damishshah / comic-book-reader

MIT License

Comic Book Page Reader

Summary

Python application that identifies speech bubbles on comic book pages and reads the text inside them. The project is mainly used to collect comic book text data for training a separate machine learning model to write comic book-style speech.

Technologies

Python, Flask, Gunicorn, JavaScript, HTML/CSS, Docker, Docker Compose, Nginx

Libraries

Pytesseract for OCR (Optical Character Recognition), OpenCV for image processing
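
To make the roles of the two libraries concrete, here is a minimal sketch of how OpenCV and Pytesseract could be combined: threshold the page to find bright regions, keep the ones that are roughly bubble-sized, and run OCR on each crop. This is not the project's actual implementation. The function names, the brightness cutoff of 235, and the area fractions are all hypothetical values chosen for illustration, and the sketch assumes speech bubbles are near-white regions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def is_bubble_candidate(box: Box, page_w: int, page_h: int,
                        min_frac: float = 0.005,
                        max_frac: float = 0.25) -> bool:
    """Keep boxes whose area is a plausible fraction of the page.

    The default fractions are illustrative guesses, not tuned values.
    """
    _, _, w, h = box
    if w <= 0 or h <= 0:
        return False
    frac = (w * h) / float(page_w * page_h)
    return min_frac <= frac <= max_frac


def find_bubble_boxes(image) -> List[Box]:
    """Return bounding boxes of bright, bubble-sized regions in a BGR image."""
    import cv2  # imported here so the size filter above works without OpenCV

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Assumption: bubbles are near-white; 235 is an illustrative cutoff.
    _, mask = cv2.threshold(gray, 235, 255, cv2.THRESH_BINARY)
    # RETR_LIST also returns regions nested inside panels.
    # [-2] picks the contour list on both OpenCV 3.x and 4.x.
    contours = cv2.findContours(mask, cv2.RETR_LIST,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]
    page_h, page_w = gray.shape[:2]
    boxes = [cv2.boundingRect(c) for c in contours]
    return [b for b in boxes if is_bubble_candidate(b, page_w, page_h)]


def read_bubbles(image) -> List[str]:
    """Run OCR on each candidate region and return the non-empty strings."""
    import pytesseract  # requires the Tesseract binary to be installed

    texts = []
    for x, y, w, h in find_bubble_boxes(image):
        crop = image[y:y + h, x:x + w]
        text = pytesseract.image_to_string(crop).strip()
        if text:
            texts.append(text)
    return texts
```

With a page loaded through `cv2.imread("page.jpg")` (a placeholder path), `read_bubbles(page)` would return one string per detected region. A size filter alone will also let through other bright areas such as captions or sky, so a real pipeline would likely need further checks.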

Developer Notes

This was an excellent project for diving deep into the above technologies and into computer vision, using a subject that I greatly enjoy (comic books).

There are some conveniences when it comes to OCR for comic book pages:

There are also quite a few challenges:

Trade-offs and other considerations:

References

Dubray, David & Laubrock, Jochen. (2019). Deep CNN-based Speech Balloon Detection and Segmentation for Comic Books. https://arxiv.org/abs/1902.08137.

I wanted to give a shoutout to the research team from Cornell for their work in this area. I reached out to them when I was considering a neural network approach to this problem, and they helped answer some questions that I had. You can read their excellent research paper at the above link and check out their model here: https://github.com/DRDRD18/balloons