Open zamazan4ik opened 1 year ago
Hi! Thanks for sharing the solution! The thing is, the application needs pre-processing that is as universal as possible across different types of images (because I don't want to delegate manual adjustment to end users). So, yeah, the current implementation has quite weak pre-processing for Tesseract, and I definitely plan to consider improvements in that direction.
Hi. Thanks for the project!
A few years ago I developed an in-house, ad-hoc OCR solution based on Tesseract, and I want to share some suggestions. As you probably already know, Tesseract has a very weak preprocessing phase (not-so-advanced binarization techniques, trash removal, etc.). To help with that, I developed a library of pre-recognition algorithms: https://github.com/zamazan4ik/PRLib Maybe you will find something useful there.
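To illustrate what a binarization step looks like, here is a minimal sketch of global Otsu thresholding in plain Python. It is not PRLib's API and not what Tesseract does internally; the function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be a flat list of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximises between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    # Build a 256-bin histogram of the grayscale values.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break     # all pixels are in the lower class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(pixels, t):
    """Map pixels <= t to 0 (black) and all others to 255 (white)."""
    return [0 if p <= t else 255 for p in pixels]
```

A single global threshold like this tends to break down on unevenly lit scans, which is one reason more advanced (for example, locally adaptive) binarization is worth considering before handing an image to Tesseract.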