MichalBusta / E2E-MLT

E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text
MIT License

Function of ocr_feed_list #12

Closed: pinakinathc closed this issue 5 years ago

pinakinathc commented 5 years ago

Hi @MichalBusta, sorry for this naive question.

I want to fine-tune your model `e2e-mlt.h5` on my dataset. For this, I have various images along with the ground truth of the text in each image.

Now, `train.py` has two parameters:

- `-train_list`: points to the directory where you have images along with their ground truth
- `-ocr_feed_list`: points to the directory where you have cropped words
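For concreteness, a hypothetical invocation with placeholder paths could look like the following (the flag values here are assumptions; the exact expected format should be checked against the argument parsing in `train.py`):

```
python train.py -train_list /data/scene_images_with_gt -ocr_feed_list /data/word_crops
```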

Is having cropped words mandatory for training? And is there any way to train the model without using the cropped word images (i.e. only using scene images with the ground truth of all the text in them)?

My priority is not only better text localisation but also better text recognition, so the OCR branch needs to be trained too, but using the ground truth of the text present in the scene image rather than separate cropped word images.

MichalBusta commented 5 years ago

Hi Nath,


There are good reasons to use cropped images:

 - you can form large batches for OCR, so the training is much faster

 - OCR training on samples provided by the detector can be seen as extra data augmentation (training from scratch just on these samples may be too hard, but for fine-tuning only I have no experience, so you can try ... )

 - it is easy to generate synthetic crops to reflect your data.
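Since the question mentions already having scene images with word-level ground truth, one way to build such a feed of cropped words is to cut the annotated regions out of the scene images. Below is a minimal sketch, assuming axis-aligned `(x_min, y_min, x_max, y_max)` boxes and a simple list of `(box, transcription)` pairs; the function name, annotation format, and output list format are illustrative assumptions, not the repository's actual conventions:

```python
import os
import cv2

def generate_word_crops(image_path, annotations, out_dir, list_file):
    """Crop annotated word regions from one scene image.

    annotations: list of ((x_min, y_min, x_max, y_max), transcription) pairs.
    This annotation format is a placeholder; adapt it to your own gt files.
    Each crop is saved to out_dir and a 'path<TAB>transcription' line is
    appended to list_file, giving a simple word-crop list for OCR training.
    """
    img = cv2.imread(image_path)
    if img is None:
        raise IOError("could not read " + image_path)
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(image_path))[0]
    for i, ((x0, y0, x1, y1), text) in enumerate(annotations):
        crop = img[y0:y1, x0:x1]
        if crop.size == 0:  # skip degenerate or out-of-bounds boxes
            continue
        crop_path = os.path.join(out_dir, f"{base}_word{i}.png")
        cv2.imwrite(crop_path, crop)
        list_file.write(f"{crop_path}\t{text}\n")

# usage sketch with made-up paths, boxes, and labels
with open("word_crops_gt.txt", "w") as f:
    generate_word_crops("scene_0001.jpg",
                        [((120, 40, 260, 90), "EXIT")],
                        "word_crops", f)
```

Most scene-text ground truth (MLT included) annotates words as quadrilaterals rather than axis-aligned boxes, so in practice each quad would first be rectified with a perspective transform (e.g. `cv2.getPerspectiveTransform` plus `cv2.warpPerspective`) before saving the crop.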
