Bartzi / see

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"
GNU General Public License v3.0

Size error in fsns demo #48

Closed by vaibhav541 6 years ago

vaibhav541 commented 6 years ago

I was trying to run fsns_demo on an arbitrary downloaded image, but got this error:

Traceback (most recent call last):
  File "fsns_demo.py", line 153, in <module>
    predictions, crops, grids = network(image[xp.newaxis, ...])
  File "/home/nandwani_vaibhav/text-detection-ctpn/see/chainer/datasets/fsns.py", line 521, in __call__
    h = self.localization_net(images)
  File "/home/nandwani_vaibhav/text-detection-ctpn/see/chainer/datasets/fsns.py", line 206, in __call__
    lstm_prediction = F.relu(self.lstm(in_feature))
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/links/connection/lstm.py", line 309, in __call__
    lstm_in = self.upward(x)
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/links/connection/linear.py", line 129, in __call__
    return linear.linear(x, self.W, self.b)
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/functions/connection/linear.py", line 118, in linear
    y, = LinearFunction().apply(args)
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/function_node.py", line 230, in apply
    self._check_data_type_forward(in_data)
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/function_node.py", line 298, in _check_data_type_forward
    self.check_type_forward(in_type)
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/functions/connection/linear.py", line 20, in check_type_forward
    x_type.shape[1] == w_type.shape[1],
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/utils/type_check.py", line 524, in expect
    expr.expect()
  File "/home/nandwani_vaibhav/anaconda3/envs/fastai/lib/python3.6/site-packages/chainer/utils/type_check.py", line 482, in expect
    '{0} {1} {2}'.format(left, self.inv, right))
chainer.utils.type_check.InvalidType:
Invalid operation is performed in: LinearFunction (Forward)

Expect: in_types[0].shape[1] == in_types[1].shape[1]
Actual: 18144 != 3072

Is there a specific input image size we should use? If not, how can I resolve this error?

Bartzi commented 6 years ago

Yes, the input size should be 600 x 150. A typical FSNS image includes four views of the same street name sign, and each view is 150 x 150 pixels.
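
To check an image against this size before running the demo, something like the following could be used. It is only a sketch: it assumes the image is already loaded as a NumPy array in height x width x channels layout, and the helper names `check_fsns_size` and `split_views` are hypothetical, not part of the SEE code.

```python
import numpy as np

# Sizes stated in the reply above: 600 x 150 overall, four 150 x 150 views.
FSNS_WIDTH, FSNS_HEIGHT = 600, 150
NUM_VIEWS = 4


def check_fsns_size(image):
    """Raise ValueError if the array is not 150 rows by 600 columns.

    Hypothetical helper; assumes a height x width (x channels) array.
    """
    height, width = image.shape[:2]
    if (width, height) != (FSNS_WIDTH, FSNS_HEIGHT):
        raise ValueError(
            "expected a {}x{} image, got {}x{}".format(
                FSNS_WIDTH, FSNS_HEIGHT, width, height
            )
        )
    return image


def split_views(image):
    """Split a 600 x 150 image into its four 150 x 150 views (left to right)."""
    check_fsns_size(image)
    return np.split(image, NUM_VIEWS, axis=1)


if __name__ == "__main__":
    dummy = np.zeros((FSNS_HEIGHT, FSNS_WIDTH, 3), dtype=np.uint8)
    print([view.shape for view in split_views(dummy)])
```

An image of any other size fails the check up front, instead of failing deep inside the network with a shape mismatch like the one in the traceback.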

vaibhav541 commented 6 years ago

Actually, I want to use it to detect text on everyday products like grocery items. Can I use just a single view of the product? Thanks for helping.

vaibhav541 commented 6 years ago

Also, I would like to know whether I am using the right model for my purpose. If so, how can I see the detected text? At the moment I only get bounding boxes as a result.

Bartzi commented 6 years ago

If you want to use only one image, the FSNS model is not the model you are looking for. In fact, there is no pre-trained model that matches your purpose. You'll need to develop your own.

If you want to see the predicted bboxes on the image, you'll need to take them and render them on the image yourself ;) That should not be too difficult.
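
A rough sketch of rendering a box yourself, using NumPy only. It assumes each box has already been converted to integer pixel coordinates `(left, top, right, bottom)`; the thread does not say what format the demo outputs, so that conversion is left out. `draw_box` is a hypothetical helper, not a function from the SEE repository.

```python
import numpy as np


def draw_box(image, box, color=(255, 0, 0), thickness=1):
    """Return a copy of `image` with the outline of `box` drawn on it.

    Assumptions: `image` is a height x width x 3 uint8 array and `box`
    is (left, top, right, bottom) in pixel coordinates, inclusive.
    """
    out = image.copy()
    height, width = out.shape[:2]
    left, top, right, bottom = box

    # Clip the box to the image so slicing never wraps around.
    left, right = max(0, left), min(width - 1, right)
    top, bottom = max(0, top), min(height - 1, bottom)

    t = thickness
    out[top:top + t, left:right + 1] = color                          # top edge
    out[max(top, bottom - t + 1):bottom + 1, left:right + 1] = color  # bottom edge
    out[top:bottom + 1, left:left + t] = color                        # left edge
    out[top:bottom + 1, max(left, right - t + 1):right + 1] = color   # right edge
    return out


if __name__ == "__main__":
    canvas = np.zeros((150, 600, 3), dtype=np.uint8)
    boxed = draw_box(canvas, (10, 20, 50, 60))
    print(boxed[20, 10], boxed[40, 30])
```

The resulting array can then be saved or displayed with whatever image library is already in use.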

vaibhav541 commented 6 years ago

ohh okay, thanks for your help 👍

santoshmo commented 6 years ago

Since some datasets only have a single view of the image, would concatenating the same image four times horizontally and stretching it to match the 600x150 dimension of FSNS images still make for reasonable training data for the FSNS model?

Bartzi commented 6 years ago

Nope, it doesn't make sense. You would extract the same features four times and concatenate them, so you would not gain any improvement.
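
To illustrate the point with a toy example: the feature function below is a made-up stand-in (a per-channel mean), not the SEE network, and the random array stands in for a real single-view image. Tiling one view four times yields four identical views, so any deterministic per-view extractor returns the same features four times.

```python
import numpy as np


def toy_features(view):
    """Stand-in feature extractor (per-channel mean); NOT the SEE network."""
    return view.reshape(-1, view.shape[-1]).mean(axis=0)


rng = np.random.default_rng(0)
single_view = rng.random((150, 150, 3))  # placeholder for one 150 x 150 view

# Concatenate the same view four times horizontally: 150 rows x 600 columns.
tiled = np.concatenate([single_view] * 4, axis=1)

# Split back into four views and extract features from each one.
views = np.split(tiled, 4, axis=1)
features = [toy_features(view) for view in views]

# All four feature vectors are equal, so the extra views add no information.
all_identical = all(np.array_equal(features[0], f) for f in features[1:])

if __name__ == "__main__":
    print(tiled.shape, all_identical)
```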