argman / EAST

A tensorflow implementation of EAST text detector
GNU General Public License v3.0
3.02k stars 1.05k forks

Suggestion How can text recognition (classification) be performed in an end-to-end manner #98

Open QaisarRajput opened 6 years ago

QaisarRajput commented 6 years ago

I have been using the model for form label field localization, i.e. fields such as "Name" and "Address". The great thing about the pipeline is that it localizes the fields very well, but the detected regions don't tell you which field each one is.

  1. One solution could be to use CRNN, but that recognizes the text itself, like OCR, so there can be misrecognition: the regions are sometimes not precise and might miss part of the text. Plus, it is not part of the pipeline, hence not suitable.
  2. A second solution could be to train a separate model for each field, which for obvious reasons is not an intelligent approach (what if a new field or some image variation comes along?).
  3. The third solution, which I think must be the optimal one, is to classify the regions. This intuition is based on the score_map layer [1x1 1]. The output from the model has scores which correspond to each geometry (region). What if we add another layer parallel to that with [1x1 n], where n is the number of classes and the activation is softmax? This approach can be part of the pipeline, and hence one model would be sufficient to detect multiple fields from a form.
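To make option 3 concrete, here is a minimal, framework-free sketch of what such a parallel head might compute. It is not code from this repo: `class_head`, `conv1x1`, the toy feature map, and the weights are all hypothetical, and a real version would be a 1x1 convolution layer attached to the same shared features that feed score_map.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def conv1x1(feature_map, weights, bias):
    """Apply a 1x1 convolution: the same linear map at every pixel.

    feature_map: H x W x C nested lists
    weights:     C x K nested lists
    bias:        K floats
    returns:     H x W x K nested lists
    """
    out = []
    for row in feature_map:
        out_row = []
        for pixel in row:
            out_row.append([
                sum(pixel[c] * weights[c][k] for c in range(len(pixel))) + bias[k]
                for k in range(len(bias))
            ])
        out.append(out_row)
    return out

def class_head(feature_map, weights, bias):
    """Hypothetical class_map head: 1x1 conv to n classes, then per-pixel softmax."""
    logits = conv1x1(feature_map, weights, bias)
    return [[softmax(px) for px in row] for row in logits]

# Toy 2x2 feature map with 3 channels and n = 2 classes (say "Name", "Address").
features = [[[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]],
            [[0.2, 0.2, 0.2], [0.9, 0.1, 0.0]]]
W = [[2.0, -2.0], [-2.0, 2.0], [0.0, 0.0]]  # made-up weights, C x K
b = [0.0, 0.0]
class_map = class_head(features, W, b)  # H x W x n class probabilities
```

At inference, one plausible way to label a detected region would be to average or majority-vote the `class_map` probabilities over the pixels inside that region; that choice is an assumption here, not something the pipeline defines.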

Please guide me on how this can be done correctly, as I am not sure how this new addition will work with score_map and geo_map at train time, or how it can be added to the loss function. If there is a better approach than this, please suggest it. I know this is out of scope for this repo, but I think this can be a great use case.
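One possible way to fold this into training, sketched under assumptions: keep the existing score and geometry terms as they are, and add a cross-entropy term for the class map that is only evaluated on ground-truth text pixels, so background pixels do not need a class label. The function names, the `lambda_cls` weight, and the toy numbers below are hypothetical and would need tuning against the real score and geometry losses.

```python
import math

def masked_class_loss(class_probs, class_labels, text_mask, eps=1e-7):
    """Mean cross-entropy over text pixels only.

    class_probs:  H x W x K per-pixel softmax outputs
    class_labels: H x W integer class ids (ignored where text_mask is 0)
    text_mask:    H x W, 1 inside ground-truth text regions, else 0
    """
    total, count = 0.0, 0
    for probs_row, label_row, mask_row in zip(class_probs, class_labels, text_mask):
        for probs, label, mask in zip(probs_row, label_row, mask_row):
            if mask:
                # Clamp to eps so log never sees zero.
                total += -math.log(max(probs[label], eps))
                count += 1
    return total / count if count else 0.0

def total_loss(score_loss, geo_loss, class_loss, lambda_geo=1.0, lambda_cls=1.0):
    """Weighted sum of the three terms; both weights are assumed hyperparameters."""
    return score_loss + lambda_geo * geo_loss + lambda_cls * class_loss

# Toy 2x2 example with 2 classes; only the left column is text.
probs = [[[0.9, 0.1], [0.5, 0.5]],
         [[0.2, 0.8], [0.5, 0.5]]]
labels = [[0, 0], [1, 0]]
mask = [[1, 0], [1, 0]]
cls_loss = masked_class_loss(probs, labels, mask)
# score_loss and geo_loss are placeholder values standing in for the existing terms.
loss = total_loss(score_loss=0.3, geo_loss=0.4, class_loss=cls_loss)
```

Masking with the text region mirrors the fact that a class label only makes sense where a field is actually present; whether to reuse the score_map ground truth as that mask is a design choice, not something confirmed by the repo.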

zxytim commented 6 years ago

I'd prefer the third one.

On Thu, Jan 25, 2018 at 6:50 PM QaisarRajput notifications@github.com wrote:
