"validate" option should use charList from model

githubharald / SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.

https://towardsdatascience.com/2326a3487cd5

MIT License


"validate" option should use charList from model #127

Closed atsju closed 2 years ago

atsju commented 2 years ago

https://github.com/githubharald/SimpleHTR/blob/7f26b321f8b8b18e5f60cbc1b7d5e1ad202e7487/src/main.py#L177

From my point of view, validate should use the char_list from the model instead of the one from the loaded dataset. This way, it will work if the user validates on a different dataset. In the current implementation, the charList depends on the dataset, which causes errors if the new dataset does not have exactly the same charList as the training dataset.
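To illustrate the problem described above, here is a minimal sketch, not the repository's actual code. It assumes the character list is derived by sorting the unique characters of the ground-truth texts and that the training-time list is stored in a text file next to the model. The helper names and the file name `charList.txt` are hypothetical.

```python
import tempfile
from pathlib import Path


def chars_from_dataset(words):
    """Derive a sorted character list from ground-truth texts (dataset-dependent)."""
    return sorted(set("".join(words)))


def save_char_list(path, char_list):
    """Store the training-time character list next to the model."""
    Path(path).write_text("".join(char_list), encoding="utf-8")


def load_char_list(path):
    """Read the character list back, preserving the training-time order."""
    return list(Path(path).read_text(encoding="utf-8"))


train_words = ["hello", "world"]
val_words = ["hold", "zebra"]  # validation set with characters unseen in training

with tempfile.TemporaryDirectory() as tmp:
    char_file = Path(tmp) / "charList.txt"  # hypothetical file name
    save_char_list(char_file, chars_from_dataset(train_words))
    # Validation should read the list stored with the model ...
    model_chars = load_char_list(char_file)

# ... instead of rebuilding it from whatever dataset happens to be loaded.
dataset_chars = chars_from_dataset(val_words)

print(model_chars.index("d"), dataset_chars.index("d"))
```

The same character maps to a different index in the two lists, so labels decoded with the dataset-derived list would not match what the model learned.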

githubharald commented 2 years ago

makes sense 👍 Feel free to provide a PR with the change. Otherwise I'll change this, but might take some time.

atsju commented 2 years ago

I made an ugly change on my side. It's not very clean.

Maybe also consider a way to add characters during, for example, transfer learning. I am trying to recognize German text from only one person but do not have much data, so I trained on the IAM dataset and did transfer learning on part of my German text. Unfortunately, it will fail if I feed it ä or ü because they are not in the original dataset. I haven't had a deep look yet at how to improve this, as I still have to extend my GT data a bit.

Do not be surprised if I open many issues; I am just trying to document the little things I notice during my own learning process.
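The transfer-learning problem mentioned above (ä or ü missing from the original character set) can at least be detected before fine-tuning. The following is a rough sketch under stated assumptions, not a SimpleHTR feature: both helper functions are hypothetical, and the 27-character list is only a stand-in for the charset of a trained model.

```python
def unsupported_chars(texts, char_list):
    """Return the sorted characters in texts that the model's charset lacks."""
    known = set(char_list)
    return sorted({c for t in texts for c in t if c not in known})


def extended_char_list(char_list, extra):
    """Append new characters at the end so existing indices stay unchanged."""
    return list(char_list) + [c for c in extra if c not in char_list]


# Stand-in for the charset of a model trained on English text (assumption).
model_chars = list("abcdefghijklmnopqrstuvwxyz ")
german_texts = ["für", "später", "grün"]

missing = unsupported_chars(german_texts, model_chars)
new_chars = extended_char_list(model_chars, missing)

print(missing)
print(len(model_chars), len(new_chars))
```

Appending rather than re-sorting keeps the indices of the already learned characters stable. A longer character list would also require a larger output layer in the model, which this sketch does not cover.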

githubharald commented 2 years ago

ok, as I said this one I'll implement in the future, but for the rest let's see. As the name SimpleHTR suggests I want to keep it as simple as possible ;-) ... even if this means lacking some features. The repo should be considered as a foundation for further developments, on which everyone can build his or her custom stuff.

githubharald commented 2 years ago

changed the code, now in validation and inference mode the charset of the trained model is used.

YangXiaoliang213 commented 2 years ago

Hi, I wonder whether this can train a text-line model. It seems that it can only train a single-word model.

githubharald commented 2 years ago

I don't understand your question. Is this related to this issue? If so please explain in more detail.

pushpaharika3 commented 2 years ago

How to improve accuracy?

RayAd4 commented 1 month ago

After two years, I'm led to you guys. Great work, Harald! What's the latest on this program?