TheJoeFin / Text-Grab

Use OCR in Windows quickly and easily with Text Grab. With optional background process and notifications.
https://www.microsoft.com/en-us/p/text-grab/9mznkqj7sl0b?cid=TextGrabGitHub
MIT License

Better OCR with option to train model on new fonts #322

Open CharlesARoy opened 1 year ago

CharlesARoy commented 1 year ago

Describe the pain point and your solution

At a high level, I'm looking for a tool that can quickly capture a selected part of the screen, accurately detect and convert any text along with its formatting, and automatically send that text to the clipboard. In my tests, Text-Grab is quick, but not very accurate.

I also often want to capture code in which indentation matters, but Text-Grab strips out any leading whitespace.
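To make the indentation request concrete, here is a minimal Python sketch of one way a tool could put leading whitespace back: if the OCR engine reports a bounding box per line, the left edge of each line can be converted into a number of spaces. The `OcrLine` type, its fields, and the character-width estimate are assumptions for illustration only, not Text Grab's actual API, and the approach only makes sense for monospaced text.

```python
from dataclasses import dataclass


@dataclass
class OcrLine:
    """Hypothetical OCR result for one line: text plus box geometry in pixels."""
    text: str
    left: float   # x coordinate of the line's left edge
    width: float  # width of the line's bounding box


def restore_indentation(lines: list[OcrLine]) -> str:
    """Rebuild leading spaces from each line's left edge.

    Assumes a monospaced font, so the character width can be estimated
    as total box width divided by total character count.
    """
    lines = [ln for ln in lines if ln.text]
    if not lines:
        return ""
    char_width = sum(ln.width for ln in lines) / sum(len(ln.text) for ln in lines)
    if char_width <= 0:
        # No usable geometry: fall back to the text without indentation.
        return "\n".join(ln.text for ln in lines)
    min_left = min(ln.left for ln in lines)
    out = []
    for ln in lines:
        # Distance from the leftmost line, measured in characters.
        indent = round((ln.left - min_left) / char_width)
        out.append(" " * indent + ln.text)
    return "\n".join(out)
```

The leftmost line is treated as column zero, so the result keeps relative indentation even when the selection starts partway across the screen.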

To improve the OCR, it would be great if you could train the model on specific fonts and/or images. With recent advances in AI, maybe there's an API out there for a tool that does OCR much better?


Describe alternatives you've tried or considered

So far, I've tested the following OCR tools: ABBYY FineReader PDF (OCR editor), ABBYY Screenshot Reader, Capture2Text, Copyfish, dpScreenOCR, Easy Screen OCR, Greenshot, NormCap, PDNob Screenshot to Text Converter, ShareX, Snagit, Snipping OCR, and TextSniper.

Of these, the tool with the best OCR is the OCR Editor that's part of the ABBYY FineReader PDF suite. You can tell it which fonts to use and even train their OCR model on new data:

image image

However, it doesn't work for my purposes because it has no quick way to send the converted text to the clipboard. The best overall tool in terms of accuracy, speed, and UI is TextSniper, but it's only available on Mac. Text-Grab is fast and has the best UI I've seen on Windows, but in my tests its OCR accuracy is in the bottom half of the above tools.

Screenshots or sketches

Here's one use case (among others) in which I might want to use a screenshot OCR tool. Say I'm working on a remote server and want to copy just the folder names (blue) to my local Windows clipboard. I can drag across the screen to select and copy the text, but then I'd also get the entire middle line, which I don't want. On the remote server, I could use tmux in copy mode to copy just that text, but then it would only be available in the remote clipboard.

image

When I select that section with the standard OCR algorithm (Windows), I get the following, which is totally unusable:

resu riantCa I le r_e. 01_10 resu riantCa Ile r_e. ing result_ARCl8ØE

When I select that section with the Tesseract OCR algorithm, I get the slightly better:

result_ARC147_ARC147G—Variant@.0@5 result_ARC18@E_ARC160QG—VariantCaLler@.01_10 result_ARC18@E_ARC160QG—VariantCaller@.01_10 debugging

but even this is pretty unusable.

TheJoeFin commented 1 year ago

Try the best Tesseract model instead of the fast one and see if that helps. Download this file: https://github.com/tesseract-ocr/tessdata_best/blob/main/eng.traineddata

and paste it into the tessdata folder where Tesseract is installed for you. That may be somewhere like C:\Program Files\Tesseract-OCR\tessdata
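If you'd rather script that step, here is a rough Python sketch. It assumes the raw-download form of the URL above (the `blob` link opens a web page rather than the file itself) and the example install folder; both are assumptions to adjust for your machine. Writing under Program Files usually needs an elevated prompt.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumed raw-download form of the tessdata_best link given above.
BEST_MODEL_URL = (
    "https://github.com/tesseract-ocr/tessdata_best/raw/main/eng.traineddata"
)

# Assumed install location (the example path above); yours may differ.
DEFAULT_TESSDATA = Path(r"C:\Program Files\Tesseract-OCR\tessdata")


def model_destination(tessdata_dir: Path = DEFAULT_TESSDATA, lang: str = "eng") -> Path:
    """Return the path where Tesseract expects the model for `lang`."""
    return Path(tessdata_dir) / f"{lang}.traineddata"


def install_best_model(tessdata_dir: Path = DEFAULT_TESSDATA) -> Path:
    """Download the 'best' English model, keeping a backup of the old file."""
    dest = model_destination(tessdata_dir)
    backup = dest.with_name(dest.name + ".bak")
    if dest.exists() and not backup.exists():
        dest.replace(backup)  # keep the original (fast) model around
    urlretrieve(BEST_MODEL_URL, str(dest))
    return dest
```

Calling `install_best_model()` performs the download; `model_destination()` only computes the target path, which is handy for checking the folder first.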

Using that model, this was the output on the terminal text above:

drwxr-xr-x 2 bioinfo domainusers 4.0K Jul 10 16:01 result ARC147 ARC147 G-Variant 0.005
drwxr-xr-x 2 bioinfo domainusers 4.0K Jul 11 03:19 result_ARCl8oE_ARCl6oQ_G-VariantCaller_o.ol_lo
drwxr-xr-x 2 bioinfo domainusers 4.0K Jul 14 10:20 result ARC180E_ARC160Q_G-VariantCaller_0.01_10_debugging

Not perfect, but pretty good if you ask me.
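For anyone who wants to put a number on "pretty good", a common measure is character error rate: the edit distance between the OCR output and the true text, divided by the length of the true text. Below is a small Python sketch. The `truth` string is my assumption of what the second folder name actually says on screen (inferred from the third line of output), so treat the comparison as illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(truth: str, ocr: str) -> float:
    """Edit distance normalized by the length of the true text."""
    if not truth:
        raise ValueError("truth must be non-empty")
    return levenshtein(truth, ocr) / len(truth)


# Assumed ground truth for the second folder name (not confirmed in this thread).
truth = "result_ARC180E_ARC160Q_G-VariantCaller_0.01_10"
# What the best model returned for that folder name, copied from above.
best_model = "result_ARCl8oE_ARCl6oQ_G-VariantCaller_o.ol_lo"

print(f"CER with best model: {char_error_rate(truth, best_model):.2%}")
```

Running the same function over the output of each tool on the same screenshot would give a like-for-like ranking instead of an eyeballed one.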

Let me know if this helps.

CharlesARoy commented 1 year ago

Thanks for the help @TheJoeFin!

With that model, the OCR results are much more accurate, which moves Text Grab into the top ~4 programs that I tried in the above list.

That said, the ABBYY Screenshot Reader is still a bit more accurate so I'll probably stick with it for now. The ABBYY OCR Editor is even better so if I can figure out how to use it more efficiently, that'll be perfect.

If you can find a way to use an OCR model that can be trained on specific fonts and updated when it makes mistakes, that would be amazing and I'd love to hear about it. Also, if you find a way to retain formatting information such as leading whitespace (so that my indented code isn't screwed up), that would be gold.
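On the whitespace point, one thing that might be worth trying outside Text Grab is calling the Tesseract command line directly with its `preserve_interword_spaces` setting and a single-block page segmentation mode. The Python sketch below only builds and runs that command. Whether leading indentation actually survives depends on the Tesseract version and the image, and I haven't verified it against the screenshots in this thread. The file name and folder in the usage line are placeholders.

```python
import subprocess
from typing import List, Optional


def tesseract_command(image_path: str, tessdata_dir: Optional[str] = None) -> List[str]:
    """Build a Tesseract CLI call that writes recognized text to stdout."""
    cmd = [
        "tesseract", image_path, "stdout",
        "-l", "eng",
        "--psm", "6",                        # treat the image as one uniform block of text
        "-c", "preserve_interword_spaces=1"  # keep runs of spaces instead of collapsing them
    ]
    if tessdata_dir:
        # Point at a specific model folder, e.g. one holding the 'best' model.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_ocr(image_path: str, tessdata_dir: Optional[str] = None) -> str:
    """Run Tesseract (must be on PATH) and return its text output."""
    result = subprocess.run(
        tesseract_command(image_path, tessdata_dir),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `run_ocr("screenshot.png", r"C:\Program Files\Tesseract-OCR\tessdata")` would run it against a saved capture.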

Thanks again!