ghost closed this issue 1 year ago
Could you please attach an example image?
I don't normally use English OCR much, but I noticed this while trying the best data I just downloaded: you may run into problems with oblique text because the angle isn't picked up. For example, here I tried to OCR just the word "bikinis" and it didn't work; then I tried selecting the whole sentence, and that didn't work either.
The main problem with this image is not the rotation, but the font. Even if you fix the rotation manually, Tesseract will still not recognize the text, since the official English data was trained on more traditional-looking fonts such as Arial and Times. You will need to find third-party language data that was trained on fonts like the one in your image (or even do the training yourself, if you like).
@danpla Yes, I know. I posted this picture in a hurry; unfortunately, I couldn't find a proper example. Handling rotation is still really important, especially for those who want to OCR word by word, because with the rotation problem you may end up having to OCR the whole sentence or paragraph instead. I know it may be hard to implement, but I believe it would save time in the long run.
By its nature, the program cannot detect oblique text or words, and even when it does detect them, it recognizes them incorrectly. An angled area selection would be a nice addition.
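To make the idea a bit more concrete, here is a rough sketch of the geometry an angled selection might involve. It is not taken from the program or from Tesseract; the function names `angled_selection_corners` and `to_upright` and the center/size/angle representation of the selection are assumptions made up for illustration. The idea is that the selection is stored as a rotated rectangle, and the pixels inside it are mapped back to an upright frame before being passed to the OCR engine.

```python
import math


def angled_selection_corners(cx, cy, width, height, angle_deg):
    """Corners of a width x height rectangle centered at (cx, cy),
    rotated by angle_deg around its center.

    Hypothetical helper. The rotation uses the standard 2D rotation
    matrix; in screen coordinates (y pointing down) a positive angle
    therefore looks clockwise.
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    half_w, half_h = width / 2, height / 2
    corners = []
    for dx, dy in ((-half_w, -half_h), (half_w, -half_h),
                   (half_w, half_h), (-half_w, half_h)):
        # Rotate the offset from the center, then translate back.
        corners.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return corners


def to_upright(point, cx, cy, angle_deg):
    """Map an image point into the selection's upright frame.

    Hypothetical helper. The result is relative to the selection
    center, so points inside the selection end up within
    [-width/2, width/2] x [-height/2, height/2].
    """
    a = math.radians(-angle_deg)  # undo the selection's rotation
    cos_a, sin_a = math.cos(a), math.sin(a)
    dx, dy = point[0] - cx, point[1] - cy
    return (dx * cos_a - dy * sin_a, dx * sin_a + dy * cos_a)


# A 200x40 selection tilted by 15 degrees around (300, 120).
corners = angled_selection_corners(300, 120, 200, 40, 15)
upright = [to_upright(p, 300, 120, 15) for p in corners]
```

Mapping the corners back with `to_upright` should give the original unrotated offsets from the center, which is what an upright crop for OCR would be built from. As noted above, this only addresses the angle; it would not help with a font the language data was not trained on.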