Open levavft opened 1 year ago
I've been thinking about trying to add additional backend options too - like adding an option to translate with DeepL instead of ChatGPT. If you're not sure how to contribute, but you have a stable fork of manga-ocr that uses pytesseract, can you upload it to a public repo? I'd be happy to take a look and provide some suggestions/support.
Alright, I'll make my version a bit more stable and clean, and upload it ^^ should take a few days at most. I think it might be a good idea to also add a google cloud ocr option for those who have a google cloud key, I'll see what I can do ;p
@levavft Just wanted to check in on this - Have you made some headway in your pytesseract fork?
Thank you very much for your enthusiasm in contributing to the project :)
Unfortunately, I have had very little time to work on this side project. But I can tell you that in the next version, I will add:
@rDarge The improvement to incorporate DeepL is almost ready. If you haven't started development yet, don't waste time on that.
@levavft A while ago, I had another project similar to this one (which I closed) that used pytesseract. I ended up abandoning it because creating an installable version with a reasonably small size was very difficult, if not impossible. It would be very interesting and a great contribution if you manage to generate a Python installable that has pytesseract as a dependency.
@K-RT-Dev Great! I'll create some additional issues for the other changes I've been working on so I can make sure we've got alignment before I put up a PR
Hey @K-RT-Dev and @rDarge, sorry for the delayed response. I've been testing what I have against less clean text, and it's awful (my personal use case is pretty clean text). Specifically, Korean manhwa text often uses bubbly lettering, which pytesseract simply can't read. So, to be honest, I feel that spending more time on tesseract might be a waste. Instead, it might be good to use Google OCR, especially since you essentially get to use it for free if you're not planning on making money from it. I have no idea whether it does better on such text (I haven't tested it at all), but at the very least it should be much easier to use and install.
The things you're currently working on sound great! I can't wait to see them in action. I'll list some ideas that I had while playing with my tesseract version, and if something catches your eye I might spend some time on it (though, like you I seem to have somewhat reduced capacity for side projects ><)
Adding an option to view a page and create a list of bounding boxes to it, similar to this: https://github.com/manisandro/gImageReader
Assuming you like the previous idea, you can run ChatGPT on the page's text as a whole, which should make the translation much more natural (especially if you specifically ask ChatGPT to make it sound natural).
Using a spell checker to rate the quality of different OCR results (from different engines / with different pre-processing steps) and choosing the best one.
Anyhow, keep us updated, and I'll post an update if I have something worth sharing ^^
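The spell-checker idea above could be sketched roughly like this. It is only a minimal illustration, not anything that exists in the project: the function names, the engine names, and the tiny vocabulary are all made up, and a real version would load a proper word list for the target language.

```python
import re


def dictionary_hit_rate(text, vocabulary):
    """Return the fraction of word tokens in `text` that appear in `vocabulary`.

    `vocabulary` is a set of lowercase words (hypothetical; supply a real
    word list for the language being recognised).
    """
    # Keep runs of letters only, so digits and punctuation split tokens.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return 0.0
    return sum(word in vocabulary for word in words) / len(words)


def pick_best_ocr_result(candidates, vocabulary):
    """Pick the (engine_name, text) pair whose text scores highest.

    `candidates` maps an engine or pre-processing label to its OCR output.
    """
    return max(
        candidates.items(),
        key=lambda item: dictionary_hit_rate(item[1], vocabulary),
    )


if __name__ == "__main__":
    vocabulary = {"i", "will", "protect", "you"}
    candidates = {
        "engine_a": "l wil1 pr0tect yau",   # typical OCR confusions
        "engine_b": "I will protect you",
    }
    print(pick_best_ocr_result(candidates, vocabulary))
```

A plain hit rate favours short outputs that happen to be all valid words, so in practice it might need to be combined with a length check or a per-engine confidence value.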
@levavft Could you share the set of images you're using to test text extraction? I have some models I could try to see their performance.
> Adding an option to view a page and create a list of bounding boxes to it, similar to this: https://github.com/manisandro/gImageReader
We are aligned in our ideas. Precisely, the second mode of operation I am planning to integrate into the system consists of this. My idea is as follows:
I have conducted manual tests using this method, and the results are incredible. When GPT has the complete text from one or more narrators, it can infer dialogue exchanges much better. Additionally, if it has context describing what is happening (for example, "People are talking while they see a landscape"), it helps in identifying pronouns and verb tenses more accurately.
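To make the whole-page approach concrete, here is a rough sketch of how the text from several boxes plus an optional scene description might be combined into one prompt. This is an assumption about how it could look, not the planned implementation: `build_translation_prompt`, its parameters, and the prompt wording are all hypothetical, and the actual call to the GPT API is left out.

```python
def build_translation_prompt(bubbles, scene_context="", target_language="English"):
    """Combine the OCR text of every box on a page into a single prompt.

    `bubbles` is a list of strings in reading order. `scene_context` is an
    optional free-text description of what is happening on the page.
    """
    lines = [
        f"Translate the following dialogue into natural-sounding {target_language}.",
        "Keep the numbering so each line can be mapped back to its text box.",
    ]
    if scene_context:
        lines.append(f"Scene context: {scene_context}")
    lines.append("")
    # Number each box so the translated lines can be matched to the originals.
    for index, text in enumerate(bubbles, start=1):
        lines.append(f"{index}. {text.strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_translation_prompt(
        ["안녕", " 잘 지냈어? "],
        scene_context="People are talking while they see a landscape",
    )
    print(prompt)
```

Numbering the lines is one simple way to map each translated sentence back to its bounding box afterwards.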
@Kromtar Sure! Here are the ones I had the most trouble with: https://github.com/levavft/manhwa-ocr-test-files
Good to see people are on the same page, this could become a very nice tool for translators or just those who want to read manhwa ^^
Hi~ I was about to embark on creating something similar for manhwa, and then I found this very nice project. Backend-wise, it should be really easy to extend this, for example using pytesseract. I would love to create anything you need for the backend. I've created a local version of manga-ocr that uses pytesseract, and it's simple enough; I just don't really know how to embed it into your project, as that is a bit more involved.
Using pytesseract, it should be easy to extend to other languages as well. Of course, it's not as good as manga-ocr's OCR (I couldn't find good databases that I could use to copy their approach), but setting pytesseract as a default when there isn't anything better should be great :)
So, basically, tell me how to contribute and I'll have a pull request ready in no time ;p
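For illustration, the "pytesseract as a default when there isn't anything better" idea might look something like the sketch below. The names `tesseract_backend` and `select_backend` and the dictionary of specialised backends are hypothetical and not part of this project; the only real library call is `pytesseract.image_to_string`, which needs the Tesseract binary and the matching language data (for example `kor`) installed.

```python
def tesseract_backend(image, lang="kor"):
    """Generic fallback OCR: run Tesseract on a PIL image or an image path."""
    # Imported lazily so the rest of the app still works when pytesseract
    # is not installed and a specialised backend is used instead.
    import pytesseract

    return pytesseract.image_to_string(image, lang=lang)


def select_backend(lang, specialised, default=tesseract_backend):
    """Return the specialised backend for `lang` if one exists, else the default.

    `specialised` maps a language code to a callable that takes an image and
    returns text (for Japanese this could wrap manga-ocr).
    """
    return specialised.get(lang, default)
```

With this shape, adding a new language only means registering another callable, and anything unregistered falls through to Tesseract.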