Closed alejandro-colomar closed 4 years ago
Yes, that should work. Please put in same location/folder as the eng.traineddata.
Thank you very much!! It worked!!
It has some trouble with the punctuation, but that's not very important for me. I'm reading prices, and I can assume that there will always be two decimal places.
However, I can give you the images I'm reading if they would help make your data more accurate in the future :)
I don't know the font of my data.
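The two-decimal assumption mentioned above can be turned into a small post-processing step. The following is only a minimal sketch: `normalize_price` is a hypothetical helper, not part of Tesseract, and it assumes the OCR output contains the digits of exactly one price and that the last two digits are always the decimal places.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Tesseract): rebuild a price from raw OCR
 * text, assuming the last two digits are always the decimal places.
 * Non-digit characters (dots, commas, spaces, a stray symbol) are ignored.
 * Returns 0 on success, -1 if fewer than three digits were found or a
 * buffer is too small. */
int normalize_price(const char *ocr_text, char *out, size_t out_size)
{
    char digits[32];
    size_t n = 0;

    assert(ocr_text != NULL && out != NULL);

    /* Keep only the digits, in order. */
    for (const char *p = ocr_text; *p != '\0'; p++) {
        if (isdigit((unsigned char)*p)) {
            if (n >= sizeof(digits))
                return -1;
            digits[n++] = *p;
        }
    }

    /* Output needs n digits, one '.', and the terminating '\0'. */
    if (n < 3 || n + 2 > out_size)
        return -1;

    memcpy(out, digits, n - 2);             /* integer part */
    out[n - 2] = '.';                       /* re-inserted decimal point */
    memcpy(out + n - 1, digits + n - 2, 2); /* two decimal places */
    out[n + 1] = '\0';
    return 0;
}
```

With this, a result such as "226" (dot missed) and a result such as "2.26" (dot found) both come out as "2.26". It cannot repair a misread digit, only a missing or misplaced dot.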
I do not use this traineddata myself. It is only a sample for trying things out.
You can provide a couple of images. I can use them for testing, or try to find a font similar to that to include in a future run.
In these images, a lot of dots were missed (more or less half of them).
As expected, the € symbol was not found in any of them.
Also two numbers were wrong:
In file 2.26e.png it read "2.16".
In file 3.76e.png it read "3.16".
I have the original color images in .BMP format if you prefer them.
Just removing the € from the images, and dilating-eroding 1 pixel after that, gives 100% accuracy, including the dots :)
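For readers unfamiliar with the dilate-then-erode step (a morphological closing), here is a minimal sketch in plain C. It is not the code used in this thread: the function names are made up for illustration, and it assumes an already binarized image stored row by row with one byte per pixel, where 1 is the foreground and 0 the background. Which colour counts as foreground in the real images is not stated above, so the polarity may need to be swapped.

```c
#include <assert.h>
#include <stdlib.h>

/* One 3x3 morphological pass over a w*h binary image (values 0 or 1).
 * want = 1 performs dilation (output is 1 if any neighbour is 1);
 * want = 0 performs erosion (output is 0 if any neighbour is 0).
 * Neighbours outside the image are ignored. */
static void morph3x3(const unsigned char *src, unsigned char *dst,
                     int w, int h, int want)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            unsigned char v = (unsigned char)!want;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= h || xx < 0 || xx >= w)
                        continue;
                    if (src[yy * w + xx] == want)
                        v = (unsigned char)want;
                }
            }
            dst[y * w + x] = v;
        }
    }
}

/* Hypothetical helper: dilate by 1 pixel, then erode by 1 pixel, in place.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int close_1px(unsigned char *img, int w, int h)
{
    assert(img != NULL && w > 0 && h > 0);

    unsigned char *tmp = malloc((size_t)w * (size_t)h);
    if (tmp == NULL)
        return -1;

    morph3x3(img, tmp, w, h, 1); /* dilation: img -> tmp */
    morph3x3(tmp, img, w, h, 0); /* erosion:  tmp -> img */
    free(tmp);
    return 0;
}
```

The closing fills 1-pixel holes and gaps in the foreground while leaving the overall stroke thickness roughly unchanged. In practice an image library would normally be used for this step instead of hand-written loops.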
Could you please explain a little how I should use this data from the C API of Tesseract?
Until now I have been doing this for the default English data (eng.traineddata):
TessBaseAPIInit2(handle_ocr, NULL, "eng", OEM_LSTM_ONLY);
Should I do the following for digits.traineddata?
TessBaseAPIInit2(handle_ocr, NULL, "digits", OEM_LSTM_ONLY);
Or should I do something else?