asweigart / pyautogui

A cross-platform GUI automation Python module for human beings. Used to programmatically control the mouse & keyboard.
BSD 3-Clause "New" or "Revised" License

Screenshot functions and "find text on screen in a dialog box" #521

Open josephernest opened 3 years ago

josephernest commented 3 years ago

I have read Screenshot functions, and I know we can locate a button on screen, from a pre-made image of this button, such as this (example: calc.exe):

Question: How would it be possible to locate a UI element on screen via text and not via an image?

This is a generalization of https://github.com/asweigart/pyautogui/issues/520.

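I see three approaches:

  • Do OCR on pyautogui.screenshot(), and find the text
  • Use WinAPI to get the actual text content of a dialog box, and find the position of each text element
  • Render the text with the default Windows GUI font (Segoe UI?) to a search.png file, and then use pyautogui.locateOnScreen('search.png')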

Did someone get any success @asweigart with one of these methods, or another?

josephernest commented 3 years ago

First attempt at approach number 3, and it works :)

https://user-images.githubusercontent.com/6168083/103917063-7425ef80-510d-11eb-96ec-5118da7d9858.mp4

Open notepad.exe, open the "About Notepad" dialog, and run this:

import time

import pyautogui
from PIL import Image, ImageDraw, ImageFont

def findTextOnScreen(text, fontsize=12):
    # Render the text in black on white, using the Segoe UI font file.
    font = ImageFont.truetype("segoeui.ttf", fontsize)
    # ImageDraw.textsize() was removed in Pillow 10; getbbox() gives the text extent.
    _, _, w, h = font.getbbox(text)
    img = Image.new("RGB", (w, h), (255, 255, 255))
    ImageDraw.Draw(img).text((0, 0), text, (0, 0, 0), font=font)
    img.save('test.png')
    # The confidence argument requires OpenCV (opencv-python) to be installed.
    yield from pyautogui.locateAllOnScreen('test.png', grayscale=True, confidence=0.75)

# Move the mouse to each match in turn.
for box in findTextOnScreen('About Notepad'):
    print(box)
    pyautogui.moveTo(box.left, box.top)
    time.sleep(1)

It moves the mouse to the "About Notepad" title bar.

Problem: even with grayscale=True, it's still very sensitive to the background color. For example, it doesn't work for buttons with a grey background. Any idea @asweigart?

josephernest commented 3 years ago

Would you be interested in a pull request for something like this?

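import pyautogui
from PIL import Image, ImageDraw, ImageFont

def clickTextOnScreen(text, fontname="segoeui.ttf", fontsize=12):
    # Render the text in black on white and click the best on-screen match.
    font = ImageFont.truetype(fontname, fontsize)
    # ImageDraw.textsize() was removed in Pillow 10; getbbox() gives the text extent.
    _, _, w, h = font.getbbox(text)
    img = Image.new("RGB", (w, h), (255, 255, 255))
    ImageDraw.Draw(img).text((0, 0), text, (0, 0, 0), font=font)
    pos = pyautogui.locateCenterOnScreen(img, grayscale=True, confidence=0.75)
    if pos is not None:  # older PyAutoGUI versions return None when nothing matches
        pyautogui.click(pos)

clickTextOnScreen('About Notepad')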

It seems to work in many cases (except when the background color is not white).
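
One possible workaround for the non-white background case, as a rough sketch only: render the same text on a few candidate background colors and try each template in turn. The color values, the 0.9 confidence, and the helper names (CANDIDATE_COLORS, render_text, first_match, click_text_on_any_background) are assumptions made for illustration and are not part of pyautogui. The sketch uses font.getbbox() because ImageDraw.textsize() was removed in Pillow 10.

```python
# Candidate (background, foreground) RGB pairs. These values are guesses
# (white window, light-grey dialog/button faces), not read from the system.
CANDIDATE_COLORS = [
    ((255, 255, 255), (0, 0, 0)),
    ((240, 240, 240), (0, 0, 0)),
    ((225, 225, 225), (0, 0, 0)),
]


def render_text(text, background, foreground, fontname="segoeui.ttf", fontsize=12):
    """Render `text` on a solid background and return the PIL image."""
    # Imported lazily so the pure helper below works without Pillow installed.
    from PIL import Image, ImageDraw, ImageFont

    font = ImageFont.truetype(fontname, fontsize)
    _, _, right, bottom = font.getbbox(text)
    img = Image.new("RGB", (int(right), int(bottom)), background)
    ImageDraw.Draw(img).text((0, 0), text, foreground, font=font)
    return img


def first_match(locate, needles, not_found=()):
    """Return the first non-None result of locate(needle), or None.

    `not_found` is a tuple of exception types that mean "no match".
    """
    for needle in needles:
        try:
            result = locate(needle)
        except not_found:
            continue
        if result is not None:
            return result
    return None


def click_text_on_any_background(text, confidence=0.9):
    """Try each candidate rendering and click the first one found on screen."""
    import pyautogui  # lazy import: needs a display

    needles = [render_text(text, bg, fg) for bg, fg in CANDIDATE_COLORS]

    def locate(img):
        return pyautogui.locateCenterOnScreen(img, grayscale=True, confidence=confidence)

    # Newer PyAutoGUI versions raise ImageNotFoundException; older ones return None.
    pos = first_match(locate, needles, not_found=(pyautogui.ImageNotFoundException,))
    if pos is not None:
        pyautogui.click(pos)
    return pos
```

Calling click_text_on_any_background('About Notepad') would then behave like clickTextOnScreen, but with more than one chance to match. It does not help when the real background is a gradient or a color missing from the list.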

Avnsx commented 2 years ago

@josephernest thanks man you just saved me so much time 🙏

Also, regarding "it's still very sensitive to color background": from experience, I know that pyautogui can also search for images that contain transparent parts, and the transparency remains intact. This means that if you find a way to create the same .png with a transparent background, you could directly find the text you are searching for without the background color issue. You will most likely have to increase the confidence drastically though, since it's a transparent image you are searching for. Additionally, I advise against using grayscale.

Edit: After a couple of hours of experimenting: this is a great idea, but it is not reliable at finding text. Sometimes the confidence matching just behaves erratically.

DanielOnGitHub17 commented 2 months ago

I have read Screenshot functions, and I know we can locate a button on screen, from a pre-made image of this button, such as this (example: calc.exe):

Question: How would it be possible to locate a UI element on screen via text and not via an image?

This is a generalization of #520.

I see three approaches:

  • Do OCR on pyautogui.screenshot(), and find the text
  • Use WinAPI to get the actual text content of a dialog box, and find the position of each text element
  • Render the text with the default Windows GUI font (Segoe UI?) to a search.png file, and then use pyautogui.locateOnScreen('search.png') ?

Did someone get any success @asweigart with one of these methods, or another?

I recently thought of approaches 1 and 3 and decided to go for approach 1. Well done on approach 3, @josephernest! @asweigart, here is a repo for approach 1: https://github.com/DanielOnGitHub17/pyautogui-find-string
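
For readers who want a feel for approach 1 without opening the repo, here is a minimal sketch. It is not taken from the linked repository. It assumes the pytesseract package and a local Tesseract install, and the function names (find_phrase_boxes, find_text_on_screen) are made up for illustration. The matching step is kept separate from the screenshot step so it can be tried on a hand-written OCR result.

```python
def find_phrase_boxes(ocr, phrase):
    """Return (left, top, width, height) boxes where `phrase` appears as
    consecutive words on a single line.

    `ocr` is a dict shaped like the result of
    pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT).
    """
    wanted = phrase.lower().split()
    # Structural rows (page, block, line) carry empty text; keep word rows only.
    words = [i for i, text in enumerate(ocr["text"]) if text.strip()]
    boxes = []
    for start in range(len(words) - len(wanted) + 1):
        idx = words[start:start + len(wanted)]
        if [ocr["text"][i].strip().lower() for i in idx] != wanted:
            continue
        # All matched words must share the same block, paragraph and line.
        lines = {(ocr["block_num"][i], ocr["par_num"][i], ocr["line_num"][i]) for i in idx}
        if len(lines) != 1:
            continue
        left = min(ocr["left"][i] for i in idx)
        top = min(ocr["top"][i] for i in idx)
        right = max(ocr["left"][i] + ocr["width"][i] for i in idx)
        bottom = max(ocr["top"][i] + ocr["height"][i] for i in idx)
        boxes.append((left, top, right - left, bottom - top))
    return boxes


def find_text_on_screen(phrase):
    """OCR a screenshot and return the boxes that contain `phrase`."""
    # Imported lazily: these need a display and a Tesseract install.
    import pyautogui
    import pytesseract

    ocr = pytesseract.image_to_data(
        pyautogui.screenshot(), output_type=pytesseract.Output.DICT
    )
    return find_phrase_boxes(ocr, phrase)


# Possible usage (moves the mouse to the centre of each match):
#   for left, top, width, height in find_text_on_screen("About Notepad"):
#       pyautogui.moveTo(left + width // 2, top + height // 2)
```

Matching here is case-insensitive but otherwise exact per word, so OCR mistakes or attached punctuation will cause misses. On high-DPI displays the screenshot coordinates may also need scaling before being passed to moveTo.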