danpla / dpscreenocr

Program to recognize text on screen
https://danpla.github.io/dpscreenocr/
zlib License

Comparison with other OCR tools #28

Open cvladan opened 1 year ago

cvladan commented 1 year ago

Can you briefly explain the difference between the dpScreenOCR tool, Microsoft's Text Extractor utility from PowerToys, and the OCR tool from the open-source ShareX package?

Ideally, the README.md file could include a brief comparison of these three tools, or a table comparing their features.

Thanks

danpla commented 1 year ago

Unfortunately, I have not used these programs, so I can't say anything about the differences.

I understand that a feature comparison table would be useful. But there are other popular OCR tools besides PowerToys and ShareX. Since they all evolve over time, I would have to constantly track their features to keep the table up to date. Unfortunately, I don't have time for that.

cvladan commented 1 year ago

I found the time to try out all three tools and compare them. I'm not going to comment on the user interface, which is actually quite nice in dpScreenOCR.

I compared only the OCR recognition quality. I tested PowerToys and ShareX against dpScreenOCR on screenshots of a Serbian government website, where the font quality was good and the images were perfectly sharp. The source text is written in Cyrillic. Surprisingly, both PowerToys and ShareX did very poorly!

It's now clear to me why they underperformed: they are powered by the same engine. ShareX uses the Microsoft OCR engine locally, as confirmed by this comment from the ShareX author.

In contrast, dpScreenOCR, as well as all the other Tesseract-based tools, performed excellently on the same example.