the choice of Chainer?

NeuralBricolage commented 6 years ago

Hi Christian, I've read your paper and it really makes sense. In fact, I've been looking for a similar approach, as I'm working on "text recognition in the wild". Currently I have a VGG-like architecture in Keras/TF that works, and I also have a rough PyTorch version. My quick question: was there a particular reason for you to go with Chainer rather than, say, PyTorch? I don't know anything about Chainer, but it sounds like the two are somewhat similar, so I could just port your implementation to PyTorch, right? Also, are there any assumptions about the size of the input images? I don't see it mentioned. I've experimented with both YOLOv2 and RetinaNet, and it's almost like you need to split the image before feeding it into the pipeline (resizing is not good, as the text might be pretty small). Some implementations also add yet another text/no-text classifier. Thanks much! helena
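For illustration, the "split the image before feeding it into the pipeline" idea could look roughly like the sketch below. It only computes overlapping tile coordinates; the function name `tile_boxes`, the tile size and the overlap are assumptions made for this example and are not taken from the paper or the repository.

```python
def tile_boxes(width, height, tile, overlap):
    """Return (left, top, right, bottom) boxes that cover a width x height image.

    Neighbouring tiles share `overlap` pixels so that text lying on a tile
    border is fully contained in at least one tile (as long as the text is
    narrower than the overlap).
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap

    def starts(length):
        # Tile origins along one axis. The last tile is shifted back so that
        # it ends exactly at the image border instead of running past it.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


# Hypothetical numbers: a 1000x1000 image cut into 400x400 tiles.
boxes = tile_boxes(1000, 1000, tile=400, overlap=100)
print(len(boxes), boxes[0], boxes[-1])
```

Each box can then be cropped from the full-resolution image and passed to the network separately, which avoids downscaling small text at the cost of more forward passes.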

Bartzi commented 6 years ago

Thanks for your interest =)

I chose Chainer because of the following reasons:

- It uses a dynamic computational graph, which makes it easy to prototype new ideas and very straightforward to implement them (this makes it way better than TensorFlow, MXNet and the like). PyTorch also has a dynamic graph, which makes that library cool as well.
- Chainer is written entirely in Python/Cython and hence does not include any obscure calls to strange backends, which would make it hard to debug problems in your code.
- It is very easy to change the behaviour of the library in the way I want, as it adheres to object-oriented design principles in nearly every part of its API design (this is something I have not yet seen in any other library).
- It is very well documented and maintained.
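To make the "dynamic computational graph" point above concrete, here is a minimal toy sketch in plain Python. It is not Chainer's (or PyTorch's) API; the class `Var` and the function `forward` are made up for this example. It only shows the idea that the graph is recorded while the code runs, so ordinary Python control flow can change the graph from one call to the next.

```python
class Var:
    """A scalar that remembers how it was computed (toy autograd node)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input Var, local gradient)
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def backward(self, upstream=1.0):
        # Walk the graph that was recorded during the forward pass and
        # accumulate the gradient contribution of every path.
        self.grad += upstream
        for parent, local_grad in self.parents:
            parent.backward(upstream * local_grad)


def forward(x, w):
    # A plain Python loop decides how many multiplications end up in the
    # graph, so the graph depth depends on the input values.
    y = x * w
    while y.value < 10.0:
        y = y * w
    return y


x, w = Var(3.0), Var(2.0)
y = forward(x, w)   # the loop runs once here, so y = x * w * w
y.backward()
print(y.value, x.grad, w.grad)
```

With a static graph the loop would have to be expressed through special graph operations; here it can be inspected with a normal Python debugger or a print statement, which is the prototyping and debugging advantage described in the list.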

PyTorch has not convinced me in the same way so far, which is why I like to use Chainer, but I think PyTorch is definitely a good library.

The size assumption for the input images is hard-coded in the train files. The input size is defined by the variable image_size and the cropped size is defined by the variable target_shape.

I think real text detection, as could be done with YOLO-like networks, is not yet possible, as the detection stage needs to be improved for that.

Hope that helps ;)

NeuralBricolage commented 6 years ago

Thank you, that helps a lot! Re image input sizes: I see that the default is on the smaller side. I wonder if you had a chance to experiment with text localization on a high-res image, e.g. 1000x1000 px?

Bartzi commented 6 years ago

No, I did not do any experiments on such high-res images, mainly because it will be difficult to put all data on a GPU... memory usage will explode =(
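A rough back-of-the-envelope sketch of why memory grows so quickly with resolution: the size of a single full-resolution feature map scales with the number of pixels. The batch size, channel count and the 200x200 baseline below are hypothetical values chosen for illustration, not the settings used in the repository.

```python
def feature_map_bytes(batch, channels, height, width, bytes_per_value=4):
    """Memory for one activation tensor (float32 by default)."""
    return batch * channels * height * width * bytes_per_value


# Hypothetical setup: batch of 32 images, one layer with 64 channels.
small = feature_map_bytes(32, 64, 200, 200)
large = feature_map_bytes(32, 64, 1000, 1000)

print(f"200x200:   {small / 2**20:.1f} MiB")
print(f"1000x1000: {large / 2**20:.1f} MiB")
print(f"ratio:     {large // small}x")
```

Under these assumptions a single layer's activations grow by a factor of 25, and training has to keep many such tensors around for the backward pass, so the total quickly exceeds what one GPU can hold.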

NeuralBricolage commented 6 years ago

sure, understood

Bartzi / see

the choice of Chainer? #5