Closed: NeuralBricolage closed this issue 6 years ago
Thanks for your interest =)
I chose Chainer for a few reasons: PyTorch has not convinced me in the same way so far, which is why I prefer to use Chainer, but I think PyTorch is definitely a good library.
The size assumption for the input images is hard-coded in the train files: the input size is defined by the variable image_size, and the cropped size is defined by the variable target_shape.
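As a rough illustration, here is a minimal sketch of how such hard-coded size variables and a crop step might look in a train script. The names image_size and target_shape come from the thread; the values and the crop logic are illustrative assumptions, not the repository's actual code.

```python
import numpy as np

image_size = (200, 200)    # assumed full input size (H, W); not the repo's real value
target_shape = (150, 150)  # assumed cropped size (H, W); not the repo's real value

def random_crop(image, crop_shape, rng=np.random.default_rng(0)):
    """Crop an (H, W, C) image to crop_shape at a random offset."""
    h, w = image.shape[:2]
    ch, cw = crop_shape
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return image[top:top + ch, left:left + cw]

img = np.zeros(image_size + (3,), dtype=np.float32)
crop = random_crop(img, target_shape)
print(crop.shape)  # (150, 150, 3)
```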
I think real text detection, as could be done with YOLO-like networks, is not yet possible; the detection stage needs to be improved first.
Hope that helps ;)
Thank you, that helps a lot! Re image input sizes: I see that the default is on the smaller side. I wonder whether you have had a chance to experiment with text localization from a high-res image, e.g. 1000×1000 px?
No, I did not do any experiments on such high-res images, mainly because it would be difficult to fit all the data on a GPU... memory usage would explode =(
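A back-of-the-envelope calculation (my own assumption, not from the paper or repo) shows why activation memory explodes: early feature maps scale with the number of input pixels, so a 5× larger side means roughly 25× the memory per layer.

```python
def feature_map_bytes(height, width, channels, dtype_bytes=4):
    """Memory for one float32 feature map of shape (channels, height, width)."""
    return height * width * channels * dtype_bytes

# Hypothetical first conv block with 64 channels, compared at two resolutions.
small = feature_map_bytes(200, 200, 64)     # ~10 MB at 200x200
large = feature_map_bytes(1000, 1000, 64)   # ~256 MB at 1000x1000
print(large / small)  # 25.0 -- quadratic growth in the input side length
```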
Sure, understood.
Hi Christian, I've read your paper and it really makes sense; in fact I've been looking for a similar approach, as I'm working on "text recognition in the wild". Currently I have a VGG-like architecture in Keras/TF that works, but I also have a rough PyTorch version. My quick question: was there a particular reason for you to go with Chainer rather than, say, PyTorch? I don't know anything about Chainer, but it sounds like they are somewhat similar, so I could just port your implementation to PyTorch, right? Also, are there any assumptions about the size of the input images? I don't see them mentioned... I've experimented with both YOLOv2 and RetinaNet, and it's almost as if you need to split the image before feeding it into the pipeline (resizing is not good, as the text might be pretty small); there are some implementations that have yet another text/no-text classifier... Thanks much! helena
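The "split the image before feeding" idea can be sketched as tiling the high-res image into overlapping crops so small text survives instead of being destroyed by downscaling. This is only an illustration of the approach mentioned above; the tile size and stride are assumptions, not values from either implementation.

```python
import numpy as np

def tile_image(image, tile=256, stride=192):
    """Yield (top, left, crop) overlapping tile crops from an (H, W, C) image."""
    h, w = image.shape[:2]
    for top in range(0, max(h - tile, 0) + 1, stride):
        for left in range(0, max(w - tile, 0) + 1, stride):
            yield top, left, image[top:top + tile, left:left + tile]

# A 1000x1000 image yields a grid of overlapping 256x256 crops; detections
# per crop would then be mapped back using each crop's (top, left) offset.
img = np.zeros((1000, 1000, 3), dtype=np.float32)
tiles = list(tile_image(img))
print(len(tiles), tiles[0][2].shape)  # 16 (256, 256, 3)
```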