Bartzi / see

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"
GNU General Public License v3.0

the choice of Chainer? #5

Closed NeuralBricolage closed 6 years ago

NeuralBricolage commented 6 years ago

Hi Christian, I've read your paper and it really makes sense. In fact, I've been looking for a similar approach, as I'm working on "text recognition in the wild". Currently I have a VGG-like architecture in Keras/TF that works, and I also have a rough PyTorch version.

My quick question: was there a particular reason for you to go with Chainer rather than, say, PyTorch? I don't know anything about Chainer, but it sounds like the two are somewhat similar, so I could just port your implementation to PyTorch, right?

Also, are there any assumptions about the size of the input images? I don't see it mentioned. I've experimented with both YOLOv2 and RetinaNet, and it's almost like you need to split the image before feeding it into the pipeline (resizing is not good, as the text might be pretty small). There are some implementations that add yet another text/no-text classifier.

Thanks much! Helena
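For readers following along, the image-splitting idea mentioned in this comment can be sketched roughly as below. This is only an illustration, not something taken from the SEE code: the function name tile_boxes and the tile and overlap values are hypothetical, and the overlap is there on the assumption that text lying on a tile border should still appear whole in at least one tile.

```python
def tile_boxes(width, height, tile=512, overlap=64):
    """Return (left, top, right, bottom) boxes that cover a width x height image.

    tile and overlap are illustrative values, not settings from the SEE repository.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")
    stride = tile - overlap

    def starts(length):
        # An image smaller than one tile gets a single box starting at 0.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        # Place the last tile flush with the image border so nothing is cut off.
        positions.append(length - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(1000, 1000)
print(len(boxes), boxes[0], boxes[-1])
```

Each box could then be cropped from the full image and passed through the network on its own, with the predicted positions shifted back by the box's left and top offsets.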

Bartzi commented 6 years ago

Thanks for your interest =)

I chose Chainer because of the following reasons:

So far, PyTorch has not convinced me in the same way, which is why I prefer Chainer, but I think PyTorch is definitely a good library.

The size assumption for the input images is hard-coded in the train files. The input size is defined by the variable image_size, and the cropped size is defined by the variable target_shape.

I think real text detection, of the kind that could be done with YOLO-like networks, is not yet possible; the detection stage needs to be improved for that.

Hope that helps ;)

NeuralBricolage commented 6 years ago

Thank you, that helps a lot! Regarding image input sizes: I see that the default is on the smaller side. I wonder if you have had a chance to experiment with text localization on a high-res image, e.g. 1000x1000 px?

Bartzi commented 6 years ago

No, I did not do any experiments on such high-res images, mainly because it will be difficult to put all data on a GPU... memory usage will explode =(
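As a rough back-of-the-envelope illustration of the memory concern, the sketch below compares the size of a single float32 input batch at two resolutions. The 200x200 baseline, the batch size of 32, and the function name input_batch_bytes are assumptions made for the example, not values taken from the train files. It counts only the input tensor; intermediate feature maps and gradients come on top of that, and for convolutional layers they grow with the pixel count in roughly the same way.

```python
def input_batch_bytes(width, height, batch_size=32, channels=3, bytes_per_value=4):
    """Bytes for one float32 input batch of shape (batch, channels, height, width)."""
    return batch_size * channels * height * width * bytes_per_value


# Hypothetical resolutions: a small baseline versus the 1000x1000 case from the thread.
small = input_batch_bytes(200, 200)
large = input_batch_bytes(1000, 1000)

print(f"200x200:   {small / 2**20:.1f} MiB")
print(f"1000x1000: {large / 2**20:.1f} MiB")
print(f"ratio:     {large / small:.0f}x")
```

Going from 200x200 to 1000x1000 multiplies the pixel count by 25, so under these assumptions every per-pixel buffer grows by about that factor.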

NeuralBricolage commented 6 years ago

sure, understood