emedvedev / attention-ocr

A TensorFlow model for text recognition (CNN + seq2seq with visual attention), available as a Python package and compatible with Google Cloud ML Engine.
MIT License
1.08k stars, 256 forks

Option to add custom character vocabulary #159

Closed (ashwath98 closed this issue 4 years ago)

ashwath98 commented 4 years ago

I think that it will be useful to have an additional parameter where the user can add their own character set for the model.

I think this can be done with some changes to the data_gen code to take characters from a file, and a custom_vocab parameter.

Do you think I should go ahead with implementing this?
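A minimal sketch of what this could look like, assuming the vocabulary file is plain UTF-8 text and that the first three indices are reserved for padding, GO, and EOS. The names `vocab_from_text`, `load_vocab`, `build_charmap`, and `encode_label` are hypothetical and not part of the existing package, and the reserved ids would need to match whatever the data_gen code actually uses:

```python
import io

# Assumed reserved slots: index 0 = padding, 1 = GO, 2 = EOS.
# These values are illustrative; they must agree with the real data_gen code.
GO_ID = 1
EOS_ID = 2
RESERVED = ['', '', '']


def vocab_from_text(text):
    """Return the unique characters of `text` in first-seen order, skipping newlines."""
    chars = []
    seen = set()
    for ch in text:
        if ch in '\r\n' or ch in seen:
            continue
        seen.add(ch)
        chars.append(ch)
    return chars


def load_vocab(path):
    """Read a custom vocabulary from a UTF-8 text file (hypothetical custom_vocab option)."""
    with io.open(path, encoding='utf-8') as f:
        return vocab_from_text(f.read())


def build_charmap(chars):
    """Prepend the reserved slots so that real characters start at index 3."""
    return RESERVED + list(chars)


def encode_label(label, charmap):
    """Map a label string to ids, wrapped in GO/EOS; fail loudly on unknown characters."""
    index = {ch: i for i, ch in enumerate(charmap) if ch}
    unknown = [ch for ch in label if ch not in index]
    if unknown:
        raise ValueError('characters not in vocabulary: %r' % unknown)
    return [GO_ID] + [index[ch] for ch in label] + [EOS_ID]
```

With something like this, a `custom_vocab` parameter would only need to pass a file path to `load_vocab` and use the resulting charmap in place of the built-in one. Raising on unknown characters is a design choice here; silently dropping them would hide labelling errors in the dataset.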

emedvedev commented 4 years ago

Of course! I’ll gladly accept a PR. If you look through the closed issues, you’ll see that people have trained with non-standard charsets before, and they might still have their code patches.

(Sent in reply to Ashwath Shetty's comment of Jan 4, 2020, 09:06 +0100.)

ashwath98 commented 4 years ago

Great!! Thanks