VoxelCubes / PanelCleaner

An AI-powered tool to clean manga panels.
GNU General Public License v3.0
243 stars 18 forks

English text looks strange despite having Tesseract #109

Open Hamid1376 opened 3 months ago

Hamid1376 commented 3 months ago

First of all, thank you for creating this amazing software; it saves me a lot of time when translating manga. For translating comics, I installed Tesseract-OCR, but I can't see any option for switching to English OCR, and any comic I put in comes out as gibberish. Sorry if this is not a real issue, I'm just really not tech savvy.

VoxelCubes commented 3 months ago

You will additionally need to enable Tesseract in your current profile, in the preprocessor settings. Don't forget to hit apply after making changes in the profile.

Tesseract is optional and isn't so good with ALL CAPS, so it's off by default. There is also the problem of relying on the text detector to figure out what language a bubble is in. It can only detect Japanese and English as languages, but can still recognize Latin text, so it usually calls Spanish English. I will add a language override in the next release so you can tell it what language to use, ignoring the detected language. That way Spanish and maybe Chinese should become supported by Tesseract. (I'll work on that in September.)
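
To make the override idea concrete, here is a minimal sketch of how such a choice could work. It is not PanelCleaner's actual code: the function name `resolve_ocr_language`, the `DETECTED_TO_TESSERACT` mapping, and the detector labels are all assumptions made for illustration. Only the language codes (`eng`, `jpn`, `spa`) are standard Tesseract identifiers.

```python
from typing import Optional

# Hypothetical mapping from the text detector's labels to Tesseract
# language codes. The labels are assumed; the codes are Tesseract's own.
DETECTED_TO_TESSERACT = {"english": "eng", "japanese": "jpn"}


def resolve_ocr_language(detected: str, override: Optional[str] = None) -> str:
    """Pick the Tesseract language code for one bubble.

    A user-supplied override always wins over the detected language.
    Unknown detector labels fall back to English (an assumed default).
    """
    if override:
        return override
    return DETECTED_TO_TESSERACT.get(detected.lower(), "eng")


# A Spanish bubble that the detector labels as English:
print(resolve_ocr_language("english"))                   # no override set
print(resolve_ocr_language("english", override="spa"))   # user forces Spanish
```

The resulting code could then be passed as the `lang` argument to something like `pytesseract.image_to_string`, provided the matching Tesseract language data is installed.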

Civvic is also experimenting with visual LLMs that are remarkably good at OCR (both local and API-based), which will open the door to much better OCR in the future.

Until then, you can also manually correct OCR with the review mode, which is on by default. That's new in the latest version.

Good luck, glad to hear it's been helpful.