EasyOCR uses this code to generate its training dataset and then trains on it: https://github.com/Belval/TextRecognitionDataGenerator
From what I can guess, EasyOCR is better suited to scanned images because of the above, since that generator renders its samples from fonts. This is also similar to how Tesseract generates its synthetic data.
Basically, we can split text recognition into two classes:
So both EasyOCR & Tesseract fall under OCR, I believe. Which one is better for you depends on your own experiments. From what I've experimented with, I can qualitatively say that EasyOCR's recognition models are somewhat better than Tesseract's recognition models (but not drastically).
Note that I am not talking about the detection part. The EasyOCR library uses the CRAFT model for detection, which is DL-based, so it is generally much better than Tesseract's current classical, page-segmentation-based text detection.
Tesseract fails on scene images because it doesn't know how to binarize them. Using DB or CRAFT for detection plus Tesseract for recognition works pretty well, and is a good CPU-friendly option when you don't know in advance what your incoming images will look like.
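If you want to run that experiment yourself, a common metric is the character error rate (CER): the edit distance between the predicted and ground-truth text, divided by the ground-truth length. Below is a minimal, stdlib-only sketch, assuming you already have (ground truth, prediction) string pairs from each engine on your own images. The helper names `levenshtein`, `cer` and `mean_cer`, as well as the toy strings, are hypothetical and only for illustration.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def mean_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average CER over (ground_truth, prediction) pairs for one engine."""
    scores = [cer(ref, hyp) for ref, hyp in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up outputs from two engines, purely to show the comparison.
    ground_truth = ["INVOICE 2021", "Total: 42.00"]
    engine_a = ["INVOICE 2021", "Total: 42.0O"]
    engine_b = ["lNVOICE 202l", "Tota1 42.00"]
    print(f"engine A mean CER: {mean_cer(zip(ground_truth, engine_a)):.3f}")
    print(f"engine B mean CER: {mean_cer(zip(ground_truth, engine_b)):.3f}")
```

Lower is better. Averaging per-line CER (rather than pooling all characters) weights every crop equally, which is a design choice you may want to change if your lines vary a lot in length.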
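The DB/CRAFT + Tesseract combination is a two-stage pipeline: a deep-learning detector finds the text boxes, and Tesseract only has to read small, tightly cropped regions. Here is a minimal sketch of that wiring, assuming the detector and recognizer are passed in as plain callables. `detect_then_recognize`, `crop` and the `(x0, y0, x1, y1)` box format are hypothetical. In practice `recognize` might wrap `pytesseract.image_to_string(crop, config="--psm 7")` (page segmentation mode 7 treats the input as a single text line) and `detect` might wrap a CRAFT or DB model; neither is called here so that the sketch runs with the standard library alone.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # grayscale rows; stand-in for a real image array
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1/y1 exclusive


def crop(image: Image, box: Box) -> Image:
    """Cut the box region out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def detect_then_recognize(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],
    recognize: Callable[[Image], str],
) -> List[Tuple[Box, str]]:
    """Run a text detector over the whole image, then a recognizer on each crop."""
    results = []
    # Sort top-to-bottom, then left-to-right, as a rough reading order.
    for box in sorted(detect(image), key=lambda b: (b[1], b[0])):
        text = recognize(crop(image, box)).strip()
        if text:  # drop boxes the recognizer could not read
            results.append((box, text))
    return results


if __name__ == "__main__":
    # Toy stubs: a fixed "detector" and a "recognizer" that reports crop width.
    img = [[c + 6 * r for c in range(6)] for r in range(4)]
    fake_detect = lambda im: [(3, 0, 6, 2), (0, 0, 3, 2), (0, 2, 6, 4)]
    fake_recognize = lambda c: str(len(c[0]))
    print(detect_then_recognize(img, fake_detect, fake_recognize))
```

Keeping detection and recognition as separate callables makes it easy to swap either stage (for example CRAFT vs DB, or Tesseract vs EasyOCR's recognizer) and compare them.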
More generally, as a developer who just wants to OCR stuff, what makes this library different from Tesseract or other OCR solutions? Why should I use this library? Where does it excel?
@ColonelThirtyTwo this repo is more for scene text or general text extraction. Tesseract on its own only really works well with well-formatted, aligned documents, or with small regions cropped from scene text that isn't in cursive or unusual fonts.
So basically, based on @ghandic's and @GokulNC's comments, Tesseract works well for scanned printed documents, whereas EasyOCR works well for extracting text from general scenes / random pictures. Is that right?
That's the kind of info that I would like to know when learning about projects, so that I can see if it is appropriate for me to use it or not. I recommend putting that on your website.
EasyOCR isn't only for scanned images, is it? I ask because I know Tesseract needs pre-processing on images that aren't scanned, to make them look like scanned images, before it performs well.
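That "make it look scanned" pre-processing mostly comes down to binarization. Below is a minimal pure-Python sketch of Otsu's global threshold, which, as far as I know, is Tesseract's default binarization method. The helper names `otsu_threshold` and `binarize` are hypothetical, and the image is assumed to be a 2D list of 8-bit grayscale values. A single global threshold like this is exactly what tends to break on scene photos with uneven lighting or busy backgrounds, which is where adaptive thresholding or a learned detector helps.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit grayscale values (0-255)


def otsu_threshold(pixels: List[int]) -> int:
    """Return the threshold t that maximizes the between-class variance
    when pixels <= t form one class and pixels > t form the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    sum_bg, w_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]          # pixel count in the "<= t" class
        if w_bg == 0:
            continue
        w_fg = total - w_bg      # pixel count in the "> t" class
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: Image) -> Image:
    """Map dark pixels to 0 (ink) and light pixels to 255 (paper)."""
    t = otsu_threshold([p for row in image for p in row])
    return [[255 if p > t else 0 for p in row] for row in image]


if __name__ == "__main__":
    page = [[10, 10, 200, 200], [10, 10, 200, 200]]
    print(binarize(page))
```

On a clean, evenly lit page the histogram has two clear peaks and one threshold separates ink from paper; on a photo with shadows, the same global value can wipe out text in the dark region or merge background into text in the bright one.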