PellelNitram closed this issue 3 years ago
Same problem here. Did wordbeamsearch improve your performance?
Hi,
that is a typical issue with systems that are trained on data. The model learns to read the text in the training set. If the training set and the test set are completely different, such a model can break down.
The model is trained on the IAM dataset. Some things in your test image look different from IAM:
Changing the image accordingly gives the following output:
Recognized: "or work on line level" Probability: 0.21756145358085632
With neural networks, having a large amount of data with great variability is key. So if you would like a model that generalizes better, you would have to work on extending the dataset and also on the data augmentation implementation.
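To make the data augmentation suggestion concrete, here is a minimal sketch of what randomly perturbing a text-line image during training could look like. It is not the augmentation code used in SimpleHTR; the function name `augment_line_image`, its parameters, and the chosen perturbations (horizontal shift, contrast reduction, Gaussian noise) are assumptions made for illustration, and the input is assumed to be a grayscale image with dark ink on a light background.

```python
import numpy as np


def augment_line_image(img, rng, max_shift=8, contrast_range=(0.6, 1.0), noise_std=5.0):
    """Return a randomly perturbed copy of a grayscale text-line image.

    img: 2-D uint8 array, dark ink on a light background (assumption).
    rng: a numpy.random.Generator, e.g. np.random.default_rng(seed).
    All other parameters are hypothetical knobs for this sketch.
    """
    if img.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")
    out = img.astype(np.float32)

    # Random horizontal shift; columns exposed by the shift are filled with white.
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.full_like(out, 255.0)
    if shift > 0:
        shifted[:, shift:] = out[:, :-shift]
    elif shift < 0:
        shifted[:, :shift] = out[:, -shift:]
    else:
        shifted[:] = out
    out = shifted

    # Random contrast reduction: scale the distance of each pixel from white.
    contrast = rng.uniform(*contrast_range)
    out = 255.0 - contrast * (255.0 - out)

    # Additive Gaussian noise to imitate scanner or camera grain.
    out = out + rng.normal(0.0, noise_std, size=out.shape)

    return np.clip(out, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    # Synthetic stand-in for a line image: white background with a dark bar.
    demo = np.full((32, 128), 255, dtype=np.uint8)
    demo[10:20, 30:90] = 0
    augmented = augment_line_image(demo, np.random.default_rng(0))
    print(augmented.shape, augmented.dtype)
```

Applying a function like this to each training sample with fresh random parameters exposes the model to more variation in position, ink darkness, and image quality than the raw dataset contains. Which perturbations actually help would have to be checked against validation data.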
Great, thank you very much for explaining the problem and demonstrating how to fix it!
Hi,
first of all: great job on publishing your work. I appreciate the fabulous effort!
Subject of this issue: I wrote the line "or work on line level" myself and want the model to recognise the text. Unfortunately, it does not and I cannot understand why.
Details:
I am currently considering using the SimpleHTR model in an open source project. I therefore attempted to benchmark it very naively. For that, I wrote a line that reads "or work on line level" to resemble the demo data. This is the image I used (right-click to download it yourself):
I followed the README to run the demo using the text line and achieved a satisfactory output:
When I run the script with the data provided above, I obtain
Could you please help me identify my mistake? Aside from helping me, I believe this is a very naive test that the model should pass easily. Explaining it here would certainly help the community get started more easily.
Cheers!