Bartzi / see

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"
GNU General Public License v3.0

fsns demo problem #56

Closed Sero8139 closed 5 years ago

Sero8139 commented 5 years ago

Hi @Bartzi, I'd like to read a video stream to demo the FSNS model, but I've run into a problem. If I use a normal picture, unlike the FSNS dataset images, which consist of four pictures in a grid, the demo crashes.

Can you give me some tips on how to train the model on normal pictures, or how to use the FSNS model you provided on a normal picture?

The command and its result are below:

root@c26c18a4e2f0:/app/chainer# python3 fsns_demo.py ../models/fsns/model/ model_35000.npz ../text_detection/img_train/img_124.png ../datasets/fsns/fsns_char_map.json
Traceback (most recent call last):
  File "fsns_demo.py", line 154, in <module>
    predictions, crops, grids = network(image[xp.newaxis, ...])
  File "/app/models/fsns/model/fsns.py", line 516, in __call__
    images = F.reshape(images, (batch_size, num_channels, height, 4, -1))
  File "/usr/local/lib/python3.5/dist-packages/chainer/functions/array/reshape.py", line 98, in reshape
    y, = Reshape(shape).apply((x,))
  File "/usr/local/lib/python3.5/dist-packages/chainer/function_node.py", line 230, in apply
    self._check_data_type_forward(in_data)
  File "/usr/local/lib/python3.5/dist-packages/chainer/function_node.py", line 298, in _check_data_type_forward
    self.check_type_forward(in_type)
  File "/usr/local/lib/python3.5/dist-packages/chainer/functions/array/reshape.py", line 40, in check_type_forward
    type_check.prod(x_type.shape) % size_var == 0)
  File "/usr/local/lib/python3.5/dist-packages/chainer/utils/type_check.py", line 524, in expect
    expr.expect()
  File "/usr/local/lib/python3.5/dist-packages/chainer/utils/type_check.py", line 482, in expect
    '{0} {1} {2}'.format(left, self.inv, right))
chainer.utils.type_check.InvalidType:
Invalid operation is performed in: Reshape (Forward)
Expect: prod(in_types[0].shape) % known_size(=2592) == 0
Actual: 1296 != 0

img_124
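For readers hitting the same error: the reshape in the traceback splits the image width into 4 equal views, so it can only succeed when the width is divisible by 4. The following is a minimal NumPy sketch of that check, not the code from fsns.py; the function name `split_into_views` is hypothetical. The value `known_size(=2592)` would be consistent with a batch of 1, 3 channels and a height of 216 (1 x 3 x 216 x 4), but the thread does not state the image size, so treat those numbers as an assumption.

```python
import numpy as np


def split_into_views(images, num_views=4):
    """Split (batch, channels, height, width) images into side-by-side views.

    Mirrors the target shape from the traceback:
    (batch_size, num_channels, height, 4, -1).
    """
    batch_size, num_channels, height, width = images.shape
    if width % num_views != 0:
        # This is the situation the Chainer type check rejects.
        raise ValueError(
            "width {} is not divisible by {} views".format(width, num_views)
        )
    return images.reshape(batch_size, num_channels, height, num_views, -1)


# Assumed sizes for illustration only: height 216, width 600.
ok = split_into_views(np.zeros((1, 3, 216, 600), dtype=np.float32))
print(ok.shape)  # (1, 3, 216, 4, 150)
```

Under the same assumed sizes, a width of 602 leaves a remainder of 1296 when the total element count is divided by 2592, which matches the `Actual: 1296 != 0` line in the error.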

Then I resized img_124.png to 150x150 and concatenated copies of the same image into one file, named img_124_concat.png.

root@c26c18a4e2f0:/app/chainer# python3 fsns_demo.py ../models/fsns/model/ model_35000.npz ../text_detection/img_concat/img_124.png ../datasets/fsns/fsns_char_map.json
OrderedDict([('rr', OrderedDict([('bottom_right', (76.43720245361328, 76.42327880859375)), ('top_left', (52.23401641845703, 67.68933868408203))])), ('Rue', OrderedDict([('bottom_right', (76.43720245361328, 76.42327880859375)), ('top_left', (52.23401641845703, 67.68933868408203))])), ('', OrderedDict([('bottom_right', (76.43720245361328, 76.42327880859375)), ('top_left', (52.23401641845703, 67.68933868408203))]))])

img_124_concat
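The resize-and-concatenate step described above could be sketched roughly as follows. This is an illustration using NumPy only, assuming four copies placed side by side along the width and a simple nearest-neighbour resize; the helper names `resize_nearest` and `make_four_view_image` are hypothetical and not part of the repository. It only makes the input shape acceptable to the reshape; as the reply further down explains, the FSNS model is still unlikely to recognise such images.

```python
import numpy as np


def resize_nearest(image, size):
    """Nearest-neighbour resize of an (H, W, C) array to (size, size, C)."""
    height, width = image.shape[:2]
    # Map every output row/column back to a source row/column.
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    return image[rows[:, None], cols]


def make_four_view_image(image, view_size=150, num_views=4):
    """Resize to a square view and repeat it num_views times along the width."""
    view = resize_nearest(image, view_size)
    return np.concatenate([view] * num_views, axis=1)


# Stand-in for a loaded picture; the size 216x330 is an arbitrary assumption.
picture = np.random.randint(0, 256, size=(216, 330, 3), dtype=np.uint8)
tiled = make_four_view_image(picture)
print(tiled.shape)  # (150, 600, 3)
```

A proper image library would give better resize quality; nearest-neighbour is used here only to keep the sketch free of extra dependencies.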

thx

Bartzi commented 5 years ago

You won't have any chance of succeeding with the FSNS model, as it was trained on a completely different dataset and does not generalize well to other data manifolds. You could try a newly trained model based on the SVHN ideas, but even that won't be easy.

So I'm not 100% sure that this approach, as it is right now, is the right approach for your problem... sorry.