fsns-demo doesn't work well

Bartzi / see

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"

GNU General Public License v3.0

fsns-demo doesn't work well #52

Open chunhui999 opened 5 years ago

chunhui999 commented 5 years ago

@Bartzi Hi, I ran fsns_demo.py on my own pictures, but it doesn't work well: the recognized words are wrong and the bounding boxes are incorrect. I prepared my pictures as follows:
1) I chose some goods with English words on them and took photos from 4 views (focused on the words);
2) I resized the 4 pictures to 150×150 and combined them horizontally into a new picture img (600×150);
3) I used the new picture img as the input for running fsns_demo.py.
I want to know whether my way of processing the pictures is wrong. Please give me some advice.
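
The preprocessing steps described above can be sketched roughly as follows. This is only an illustration using Pillow, not code from the SEE repository; the function name `combine_views` and the constants are made up for this example, and it assumes the four views are already loaded as Pillow images.

```python
from PIL import Image

VIEW_SIZE = 150  # side length of each resized view, as described in the post
NUM_VIEWS = 4    # number of views combined into one input image


def combine_views(views):
    """Resize each view to 150x150 and paste them side by side (600x150).

    `views` is a list of four PIL images; this helper is hypothetical and
    not part of the SEE code base.
    """
    if len(views) != NUM_VIEWS:
        raise ValueError(f"expected {NUM_VIEWS} views, got {len(views)}")

    canvas = Image.new("RGB", (VIEW_SIZE * NUM_VIEWS, VIEW_SIZE))
    for index, view in enumerate(views):
        # Note: a plain resize does not preserve the aspect ratio.
        tile = view.convert("RGB").resize((VIEW_SIZE, VIEW_SIZE))
        canvas.paste(tile, (index * VIEW_SIZE, 0))
    return canvas
```

With four photos on disk (the file names here are placeholders), something like `combine_views([Image.open(p) for p in ["v0.jpg", "v1.jpg", "v2.jpg", "v3.jpg"]]).save("img.png")` would produce the 600×150 picture that is then passed to fsns_demo.py.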

Bartzi commented 5 years ago

It should not come as a surprise to you that the model trained on the FSNS dataset does not work well on other data. This is a common problem in Deep Learning...

What do your images look like? Are they similar to the original FSNS images? If not, you cannot expect the pre-trained model to work well! The process you describe seems to be correct.

chunhui999 commented 5 years ago

They look like this. These two pictures are just for a small test; I have not prepared a real dataset yet. The blue bounding boxes are the detection result; you can ignore them. (Attached images: 0_withbbox, 1_withbbox)

chunhui999 commented 5 years ago

@Bartzi I understand what you mean. If I want it to work well, I have to fine-tune and retrain the pre-trained model using my own dataset, right?

Bartzi commented 5 years ago

Hi,

The first picture should not work, but the second actually should... Did you have a look at a feature visualization? You can do this with Visual Backprop. In order to enable this, you'll have to add the lines before the call to render_rois from this file to this file, and then it should work (hopefully), and you can see a feature visualization of which regions of the image excite the localization network the most. Maybe that helps to debug why the second image does not work.
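
For readers unfamiliar with Visual Backprop, the general idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the implementation used in the SEE repository: it assumes the feature maps of the network's layers are already available as arrays, and it swaps the deconvolution-based upscaling of the original method for plain nearest-neighbour upsampling. The function names are made up for this example.

```python
import numpy as np


def resize_nearest(mask, shape):
    """Nearest-neighbour resize of a 2D array to `shape` (height, width)."""
    rows = np.arange(shape[0]) * mask.shape[0] // shape[0]
    cols = np.arange(shape[1]) * mask.shape[1] // shape[1]
    return mask[rows[:, None], cols]


def visual_backprop(feature_maps):
    """Combine per-layer feature maps into one relevance mask.

    `feature_maps` is a list of arrays shaped (channels, height, width),
    ordered from the shallowest layer to the deepest one (an assumption
    of this sketch).
    """
    # Average every layer over its channels to get one 2D map per layer.
    averaged = [fmap.mean(axis=0) for fmap in feature_maps]

    # Start at the deepest layer and walk back towards the input,
    # upscaling the mask and multiplying it with the shallower map.
    mask = averaged[-1]
    for shallower in reversed(averaged[:-1]):
        mask = resize_nearest(mask, shallower.shape) * shallower

    # Normalise to [0, 1] so the mask can be rendered as a heat map.
    span = mask.max() - mask.min()
    if span == 0:
        return np.zeros_like(mask)
    return (mask - mask.min()) / span
```

Regions with values close to 1 in the returned mask are the ones that were active across all layers, which is what makes such a visualization useful for checking where the localization network is looking.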