Roadmap: OCR (Optical Character Recognition)

asweigart / pyautogui

A cross-platform GUI automation Python module for human beings. Used to programmatically control the mouse & keyboard.

BSD 3-Clause "New" or "Revised" License

Roadmap: OCR (Optical Character Recognition) #136

Open asweigart opened 7 years ago

asweigart commented 7 years ago

It would be nice if there were some sort of OCR feature for PyAutoGUI. It may be too unwieldy to integrate this into PyAutoGUI; however, it's worth investigating. This issue will track this feature. It should be noted that I personally don't have a timeline or resources committed to this in the near future.

alphaCTzo7G commented 6 years ago

I can't find the issue, but somebody mentioned Tesseract: https://github.com/tesseract-ocr/tesseract

Perhaps instead of building the functionality ourselves, we could handle it the way OpenCV is handled: if it's installed, it gets used (that's what makes the confidence parameter available). Because this is a big library, not everybody would want to install it by default. But if that increases the complexity of the code, perhaps it's fine to use Tesseract as a dependency as well.
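A minimal sketch of the "use it only if it's installed" idea, assuming pytesseract as the optional package. The helper names `ocr_available` and `require_ocr` are hypothetical and are not part of PyAutoGUI.

```python
import importlib
import importlib.util


def ocr_available(module_name="pytesseract"):
    """Return True if the optional OCR module is installed (top-level name only)."""
    return importlib.util.find_spec(module_name) is not None


def require_ocr(module_name="pytesseract"):
    """Import and return the optional OCR module, or fail with a clear message.

    Hypothetical helper: an OCR feature would call this at use time, so that
    users who never touch OCR never need the extra dependency.
    """
    if not ocr_available(module_name):
        raise NotImplementedError(
            f"OCR features need the optional package '{module_name}'; "
            "install it to enable them."
        )
    return importlib.import_module(module_name)
```

With this shape, the base install stays small and the error only appears when someone actually calls an OCR function without the dependency.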

bjanzen commented 6 years ago

I mentioned it, and if Al's going to do that, then we should just do the full Monty and call it the Sikuli replacement, because this was the last piece for me.

alphaCTzo7G commented 6 years ago

There's also Lackey: https://github.com/glitchassassin/lackey

I didn't know that SikuliX could recognize text in images. Lackey seems to be an attempt at a Python equivalent of SikuliX.

bjanzen commented 6 years ago

But sans OCR, correct?

click("Start_Button.png") and no click("Start") ? That becomes more critical for test script maintenance, especially when your UI/UX people think they're earning their paycheck by shifting a few pixels around every so often.

Ah, looks like this was looked at a while ago: https://github.com/glitchassassin/lackey/issues/24. I don't really care who does it. I have my own code to do it now, but I am concerned when updates on GitHub projects are sparse, which is why I'm on board with this one.

niltype22 commented 4 years ago

I recently worked with pytesseract and found it very user-friendly. It can locate word coordinates on the screen.

First question: Which functionality do people want wrapped? These seem like good candidates, but what else?

- pyautogui.word_position('word') for when they pretty much just need to click on text
- pyautogui.image_to_text() for when they want to just read a section of the screen to use
- pyautogui.image_to_data() for when they need all the data (pytesseract can provide a pandas DataFrame, which is convenient)
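As a rough illustration of the first candidate, here is a sketch of how a `word_position` helper could sit on top of pytesseract's `image_to_data`. Both `find_word_center` and `word_position` are hypothetical names (PyAutoGUI has no such functions), and the matching is deliberately naive: exact, case-insensitive, single words only.

```python
def find_word_center(data, word, min_conf=0.0):
    """Return the (x, y) center of the first box whose text matches `word`.

    `data` is the dict returned by
    pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT),
    i.e. parallel lists under "text", "conf", "left", "top", "width", "height".
    Returns None if no match reaches `min_conf`.
    """
    target = word.strip().lower()
    for i, text in enumerate(data["text"]):
        if text.strip().lower() != target:
            continue
        # conf may be a string or a number depending on the pytesseract version.
        if float(data["conf"][i]) < min_conf:
            continue
        x = data["left"][i] + data["width"][i] // 2
        y = data["top"][i] + data["height"][i] // 2
        return (x, y)
    return None


def word_position(word, min_conf=0.0):
    """Screenshot the screen and look for `word` (hypothetical wrapper).

    Imports are local so the pure helper above works without either package.
    Assumes screenshot pixels map 1:1 to mouse coordinates, which may not
    hold on scaled (HiDPI) displays.
    """
    import pyautogui
    import pytesseract

    screenshot = pyautogui.screenshot()
    data = pytesseract.image_to_data(
        screenshot, output_type=pytesseract.Output.DICT
    )
    return find_word_center(data, word, min_conf)
```

If a position comes back, something like `pyautogui.click(*word_position("Start"))` would then click on the text rather than on a saved image.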

Second question: How to package it all together? When I used it, I had to download Tesseract using Homebrew and then pass pytesseract the path to its executable. I am wondering what the most error-free way to do this is...
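One possible way to reduce the setup friction described above is to look for the Tesseract executable in a few places before giving up. This is only a sketch: `resolve_tesseract` and `configure_pytesseract` are hypothetical helpers, and the `TESSERACT_CMD` environment variable is an assumed convention, not something pytesseract reads on its own. The only pytesseract-specific piece is the `pytesseract.pytesseract.tesseract_cmd` setting.

```python
import os
import shutil


def resolve_tesseract(explicit_path=None):
    """Return the first usable path to the Tesseract executable, or None.

    Order: explicit argument, TESSERACT_CMD environment variable (assumed
    convention), then whatever `tesseract` resolves to on PATH.
    """
    candidates = [
        explicit_path,
        os.environ.get("TESSERACT_CMD"),
        shutil.which("tesseract"),
    ]
    for candidate in candidates:
        if candidate and os.path.isfile(candidate):
            return candidate
    return None


def configure_pytesseract(explicit_path=None):
    """Point pytesseract at the resolved executable (hypothetical helper)."""
    import pytesseract  # local import: only needed when OCR is actually used

    path = resolve_tesseract(explicit_path)
    if path is None:
        raise FileNotFoundError(
            "Tesseract executable not found; install it or pass its path."
        )
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

A Homebrew or apt install normally puts `tesseract` on PATH, so the last candidate would cover the common case and the explicit path remains available as an override.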

t3ch9 commented 4 years ago

@niltype22 Is there any test version of the added functionality you mentioned?

neisor commented 3 years ago

Hi @asweigart, out of curiosity, is there anything new regarding an OCR implementation for PyAutoGUI?

Thank you for all your efforts!

flosincapite commented 3 years ago

Hi, all, I just wanted to see what the current status of this was. Will be happy to work on a PR if no one else has taken this up.

willwade commented 2 years ago

Check this out. It's got a neat trick up its sleeve for OCR on Windows. It also supports backends for Tesseract and EasyOCR, but on Windows it's a really neat approach, as there are no dependencies.

https://github.com/wolfmanstout/screen-ocr

See in particular https://github.com/wolfmanstout/screen-ocr/blob/master/screen_ocr/_winrt.py

DanielOnGitHub17 commented 3 months ago

@asweigart, I used pytesseract for this feature. See here: https://github.com/DanielOnGitHub17/pyautogui-find-string. Users would have to install the Tesseract engine to use it.