PBahner closed this 2 years ago
Are you able to specify the font family? It could be Arial. That could also dramatically improve the accuracy.
@PBahner Pytesseract is designed to detect whole sentences, not single numbers and symbols.
https://stackoverflow.com/questions/68512226/how-to-improve-the-accuracy-of-pytesseract
Have you tried what the first answer in the linked thread suggests?
OK, I have tried it now, and parsing every single char works worse than I thought, especially since the image recognition confuses letters and numbers (e.g. B->8, I->1, O->0, g->9, l->1).
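For cells that are known to contain only numbers, one possible workaround is to map the look-alike letters back to digits after OCR. This is only a sketch: the helper name `fix_numeric_cell` is hypothetical, the mapping just uses the confusions listed above, and it would corrupt any cell that legitimately contains letters.

```python
# Hypothetical post-processing helper, not part of any OCR library.
# The pairs are the letter/digit confusions mentioned in this thread.
DIGIT_LOOKALIKES = str.maketrans({"B": "8", "I": "1", "O": "0", "g": "9", "l": "1"})


def fix_numeric_cell(text: str) -> str:
    """Replace look-alike letters with digits.

    Assumption: the caller already knows the cell should be numeric.
    """
    return text.strip().translate(DIGIT_LOOKALIKES)


print(fix_numeric_cell("I."))  # 1.
print(fix_numeric_cell("BO"))  # 80
```

This does not help with mixed cells such as B8, where the same character has to stay a letter in one position and become a digit in another.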
So EasyOCR is also not the answer? Goddamnit...
I'm actually trying to improve the recognition. Surely it will work better than PyTesseract but it won't work flawlessly...
I just tested a bit with Tesseract and noticed that you can (at least in Rust) specify the scan region. If you scanned every cell independently, that would improve the accuracy (because there are no lines in between).
@MarcelCoding I always parse the cells independently...
You extract and specify the bounds of every cell?
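For reference, scanning one cell at a time with pytesseract could look roughly like the sketch below. The function names and the `box` argument are hypothetical. `--psm 7` tells Tesseract to treat the image as a single text line (`--psm 10` would be a single character), and `tessedit_char_whitelist` restricts the allowed characters, although how well the whitelist is honored depends on the Tesseract version and engine.

```python
def cell_config(digits_only: bool = False) -> str:
    """Build a Tesseract config string for a single table cell."""
    config = "--psm 7"  # treat the cropped image as one line of text
    if digits_only:
        # Assumption: numeric cells only contain digits and a period.
        config += " -c tessedit_char_whitelist=0123456789."
    return config


def ocr_cell(image_path: str, box: tuple, digits_only: bool = False) -> str:
    """OCR one cell; box is (left, upper, right, lower) in pixels."""
    # Imported here so cell_config() can be used without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as page:
        cell = page.crop(box)  # cut out just this cell, without the grid lines
        text = pytesseract.image_to_string(cell, config=cell_config(digits_only))
    return text.strip()
```

A call such as `ocr_cell("table.png", (40, 120, 110, 150), digits_only=True)` would then read one numeric cell; the file name and coordinates here are made up.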
Here are some examples of what doesn't work:
"D", "L}" or "nn" instead of an empty table cell
"I" instead of "1."
"T." instead of "7."
"BB" instead of "B8"
"ISY" instead of "IGY"