Fine-tuning Tesseract OCR engine to recognize certain characters

Google's Tesseract OCR engine works quite well for most languages. However, it does not recognize the "«" and "»" characters (guillemets), which are used extensively in Azerbaijani texts. It is possible to fine-tune an existing model to recognize additional characters, and Google provides a detailed tutorial for this. This is an open problem, and we would love to see a solution. We are also open to collaboration, although we cannot commit to it full-time.
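Fine-tuning only helps if the training data actually contains the missing characters. The tutorial is not linked here. As an assumption, the sketch below targets the tesstrain workflow, which pairs each single-line image (e.g. `line_001.tif`) with a UTF-8 transcription file of the same stem ending in `.gt.txt`. The helper `write_ground_truth` and the sample lines are hypothetical illustrations. The helper writes those transcription files and counts how many lines contain « or », as a quick coverage check before training.

```python
import tempfile
from pathlib import Path

# Characters the stock model reportedly fails to recognize (per the issue).
TARGET_CHARS = {"«", "»"}


def write_ground_truth(pairs, out_dir):
    """Write tesstrain-style ground-truth transcriptions (assumed convention).

    `pairs` is an iterable of (image_filename, transcription) tuples; each image
    is assumed to be a single text line placed in `out_dir` separately.
    For an image named `line_001.tif`, the text goes to `line_001.gt.txt`.
    Returns how many transcriptions contain at least one target character.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    covered = 0
    for image_name, text in pairs:
        stem = Path(image_name).stem
        # One line per file, written as UTF-8 so « and » are preserved.
        (out / f"{stem}.gt.txt").write_text(text.strip() + "\n", encoding="utf-8")
        if TARGET_CHARS & set(text):
            covered += 1
    return covered


if __name__ == "__main__":
    # Hypothetical sample lines; real data would come from transcribed scans.
    samples = [
        ("line_001.tif", "«Kitab» sözü"),
        ("line_002.tif", "Bakı şəhəri"),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        covered = write_ground_truth(samples, tmp)
        print(f"{covered} of {len(samples)} lines contain « or »")
```

A low coverage count is a signal to collect more lines containing guillemets before training. The resulting directory would then feed the training step described in the tutorial. The exact command and model names depend on the setup.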

allmalab / problems

Fine-tuning Tesseract OCR engine to recognize certain characters #4