Hammerspoon / hammerspoon

Staggeringly powerful macOS desktop automation with Lua
http://www.hammerspoon.org
MIT License

Add hs.image:findInImage #397

Open cmsj opened 9 years ago

cmsj commented 9 years ago

If we can snapshot a window/screen, it would be very interesting to be able to do an image search within it. It's a nice way to locate a bit of UI to click on if it's not otherwise introspectable.

asmagill commented 9 years ago

Probably should have put it in here... For completeness:

We still have hs.screen.shotAsJPG and hs.screen.shotAsPNG, which could be converted to produce an hs.image. Separating the screen-capture code from the write-to-file code (and using hs.image.saveToFile for the writing) might help here, especially since we can set the rectangle for the screen capture.
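Restricting the capture to a rectangle also means a match location found inside the snapshot has to be translated back into screen coordinates. A minimal Python sketch, assuming the snapshot is captured at the display's backing resolution (2 pixels per point on a typical Retina display) and that both coordinate systems have a top-left origin; `Rect` and `match_to_screen_point` are hypothetical names, not Hammerspoon API:

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Capture rectangle in screen points, top-left origin (hypothetical type)."""
    x: float
    y: float
    w: float
    h: float


def match_to_screen_point(match_px, capture_rect, scale=2.0):
    """Map a match at pixel (px, py) inside a snapshot of capture_rect to
    global screen points. scale is the display's backing scale factor
    (assumed 2.0 on Retina displays, 1.0 otherwise)."""
    px, py = match_px
    return (capture_rect.x + px / scale, capture_rect.y + py / scale)


# A match at pixel (200, 60) inside a snapshot of the rect starting at (100, 50)
print(match_to_screen_point((200, 60), Rect(100, 50, 400, 300)))  # (200.0, 80.0)
```

The division by the scale factor is the same adjustment phu54321's script in this thread makes with `loc[1] // 2`.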

cmsj commented 9 years ago

(I'd note that I currently have zero idea of how to search for an image in another image, particularly with some reasonable level of fuzziness)
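One simple way to get a tunable level of fuzziness is to slide the small image over every position of the large one, score each position by the sum of absolute pixel differences (SAD), and accept the best position if its score is below a chosen threshold. A minimal pure-Python sketch on grayscale images stored as lists of rows; this is an illustration of the idea, not what any library mentioned here does, and it would be far too slow on full screenshots without vectorization:

```python
def best_match_sad(haystack, needle):
    """Slide needle over every position in haystack and return
    ((x, y), score) for the position with the lowest sum of absolute
    differences. A score of 0 means an exact match."""
    hh, hw = len(haystack), len(haystack[0])
    nh, nw = len(needle), len(needle[0])
    best_loc, best_score = None, float("inf")
    for y in range(hh - nh + 1):
        for x in range(hw - nw + 1):
            score = 0
            for dy in range(nh):
                row, nrow = haystack[y + dy], needle[dy]
                for dx in range(nw):
                    score += abs(row[x + dx] - nrow[dx])
                if score >= best_score:
                    break  # this position can no longer beat the current best
            if score < best_score:
                best_loc, best_score = (x, y), score
    return best_loc, best_score


haystack = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 7, 0],
    [0, 0, 0, 0],
]
needle = [[9, 8], [7, 6]]  # differs from the haystack by 1 in one pixel
print(best_match_sad(haystack, needle))  # ((1, 1), 1)
```

A caller would then treat `score <= threshold` as "found"; choosing that threshold is where the reasonable level of fuzziness gets decided.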

cmsj commented 9 years ago

Looks like embedding OpenCV might be the way to go here, but that feels quite heavy.

asmagill commented 9 years ago

I know that ImageMagick has a compare command, but I don't know much about it... and for OCR, I've used Tesseract OCR a little (mostly to integrate it into ownCloud), but beyond that... yeah, I'm also open to suggestions and experiences from others on this. I'd like to automate some things in Little Snitch, and I know they specifically don't allow AppleScript to manipulate the app, so image matching is an attractive alternative.

Since the request that started this was specifically about finding menubar icons: a while back I was working on a menu-walker as a way to get a list of defined command keys... I got distracted, it spewed a whole crap load of information about all kinds of AXUIElement objects, and I never did get around to identifying everything or wrapping it up in a nice, clean set of functions... Perhaps I can find the old code and we can see if it gets at the more immediate question a different way?

On Jul 14, 2015, at 1:11 AM, Chris Jones wrote:

> Looks like embedding OpenCV might be the way to go here, but that feels quite heavy

cmsj commented 9 years ago

@asmagill I'd be very surprised if that code would tell you anything about the NSStatusItem objects in the menubar. AFAICT that part of AppKit isn't hooked up to accessibility at all.

asmagill commented 9 years ago

Fair enough -- missed the note that it was an NSStatusItem, just that it was in the menubar.

cmsj commented 9 years ago

So, I did a bit of investigation of OpenCV, and a 64-bit build of their OS X framework, with minimal third-party library support, comes in at 20MB. I think we have to rule that out, since that's more than 3x the size of our entire shipping bundle at the moment.

That means we either need to find an alternate library, or shelve this issue until someone fancies implementing an image recognition algorithm!

krasnovpro commented 7 years ago

Same function in AutoHotkey (the Windows analog of Hammerspoon): https://autohotkey.com/boards/viewtopic.php?f=6&t=18719 https://autohotkey.com/docs/commands/ImageSearch.htm

cmsj commented 6 years ago

I'm going to close this because OpenCV is too heavy, and nobody seems likely to have the time to implement the relevant algorithms directly.

phu54321 commented 2 years ago

I really want this to be implemented. It would greatly simplify automating our workflows. Can this issue be reopened to accept PRs? (I'm willing to make one.)

Frankly, I don't really want a big OpenCV-like dependency. I'm okay with exact matching only.

For example, I'm using AutoHotkey's ImageSearch functionality to find where to click. I don't really need to search for similar matches; I only want exact matches for images.

https://github.com/phu54321/DavinciWinKeyHover/blob/master/TimelineClick.ahk
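An exact-match search needs no image-processing library at all: try every top-left position and compare pixels until one differs. A minimal pure-Python sketch, assuming both images are lists of rows of `(r, g, b)` tuples; `find_exact` is a hypothetical name, not an existing Hammerspoon function:

```python
def find_exact(haystack, needle):
    """Return (x, y) of the first top-left position (rows top to bottom,
    left to right) where needle matches haystack pixel for pixel,
    or None if there is no exact match."""
    hh, hw = len(haystack), len(haystack[0])
    nh, nw = len(needle), len(needle[0])
    for y in range(hh - nh + 1):
        for x in range(hw - nw + 1):
            # Row-slice comparison stops at the first differing pixel, and
            # all() stops at the first differing row.
            if all(haystack[y + dy][x:x + nw] == needle[dy] for dy in range(nh)):
                return (x, y)
    return None


W, B = (255, 255, 255), (0, 0, 0)
haystack = [
    [W, W, W, W],
    [W, B, W, W],
    [W, W, B, W],
]
print(find_exact(haystack, [[B, W], [W, B]]))  # (1, 1)
print(find_exact(haystack, [[B, B]]))          # None
```

In practice most candidate positions tend to fail on the first pixel compared, so the typical cost is much lower than the worst case of comparing the whole needle at every position.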

latenitefilms commented 2 years ago

Something like this could be helpful?

https://github.com/ameingast/cocoaimagehashing

latenitefilms commented 2 years ago

Note to self:

https://github.com/MasterFocus/AutoHotkey/blob/master/Functions/Gdip_ImageSearch/Gdip_ImageSearch.c

cmsj commented 2 years ago

@phu54321 happy to re-open it, I'd also be happy to have an exact match image search. I think we should discuss the API it would have before anyone gets too deeply stuck into the work - I suppose the first question would be, should this live in hs.image and be able to look for exact sub-images for any image, or should it live in hs.screen and only be for looking for exact sections of the currently active displays? The former is more broadly useful, the latter is probably able to be more tightly optimised.

latenitefilms commented 2 years ago

CocoaImageHashing actually looks pretty good (I know you don't like additional pods @cmsj - but it's pretty clean and simple code).

My idea would be that we add a hs.image:compare(imageToCompare) method, then we could add a new method to hs.screen that uses this new compare functionality to find matches on the screen. I have no idea how performant this would be, but I'm basically thinking you take the width and height of the imageToCompare object, then break the screenshot into components that match the same dimensions, then iterate through all the components until you find a match? Is that crazy?
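One catch with cutting the screenshot into needle-sized sections: if the sections don't overlap, the needle is only found when it happens to sit exactly on the grid. Finding it anywhere means testing every offset (a sliding window), which is what template matching does. A small Python sketch of the two sets of candidate positions, assuming non-overlapping tiles is what the proposal means:

```python
def tile_origins(width, height, tw, th):
    """Top-left corners of non-overlapping, needle-sized tiles."""
    return [(x, y) for y in range(0, height - th + 1, th)
                   for x in range(0, width - tw + 1, tw)]


def sliding_origins(width, height, tw, th):
    """Every top-left corner a tw x th needle could occupy."""
    return [(x, y) for y in range(height - th + 1)
                   for x in range(width - tw + 1)]


# A 2x2 needle on a 6x4 screenshot, actually located at (3, 1).
target = (3, 1)
print(target in tile_origins(6, 4, 2, 2))     # False: tiles start at x in {0, 2, 4}, y in {0, 2}
print(target in sliding_origins(6, 4, 2, 2))  # True
print(len(tile_origins(6, 4, 2, 2)), len(sliding_origins(6, 4, 2, 2)))  # 6 15
```

With a sliding window there are (W - w + 1) x (H - h + 1) positions, roughly one per screen pixel, so whatever comparison is done per position needs to be cheap.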

latenitefilms commented 2 years ago

FWIW - it looks like Keyboard Maestro uses OpenCV, and the binary is only 54.9MB, compared to Hammerspoon's 43.1MB.

cmsj commented 2 years ago

Yeah it is possible to build OpenCV with just the required modules available. It's cmake based though, which potentially makes our "clean build from git" goal a bit trickier to achieve.

asmagill commented 2 years ago

I won't have a chance to look at this until middle of the week, but would this be of any help? https://developer.apple.com/documentation/vision/analyzing_image_similarity_with_feature_print

latenitefilms commented 2 years ago

I think it's iOS/iPadOS/Catalyst only?

asmagill commented 2 years ago

Missed that...

I just remembered hearing about the Vision framework recently, but hadn't had a chance to dig into it very deep. Oh well, back to non-Apple solutions then!

cmsj commented 2 years ago

@latenitefilms I'm not a fan of :compare() as a name because that suggests to me that it's going to somehow report the similarity of the images, but IME what people typically want to do with these APIs is look for a specific small image within a larger one (ie a screenshot) so they can find some piece of UI that isn't discoverable through accessibility APIs.

Perhaps a good place to start would be something like hs.image:findWithin() or hs.image:containsImage()?

Edit: and yeah, I am not a fan of adding more pods, but I'm not going to enforce a blanket rule there - let's figure out the various options and try to find one that is well maintained.

phu54321 commented 2 years ago

For optimization purposes, it'd be advisable to

Meanwhile, I'm using a Python library to implement my own tool. The source looks like this (it uses OpenCV, but I'm only using it for exact matches):


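```python
# Requires: pip install opencv-python mss numpy pyautogui
from collections import namedtuple

import cv2
import mss
import numpy as np
import pyautogui

# A needle image plus the vertical offset (in points) added to the match
# position before clicking
Needle = namedtuple('Needle', ['image', 'mouseYOffset'])

needleList = [
    Needle(cv2.imread('image/edit_page_tabs_inactive.png'), 50),
    Needle(cv2.imread('image/edit_page_tabs_active.png'), 50),
    Needle(cv2.imread('image/edit_page_options_inactive.png'), 50),
    Needle(cv2.imread('image/edit_page_options_active.png'), 50),
]

# %%

def grabScreenshot():
    with mss.mss() as sct:
        monitor = sct.monitors[1]  # monitors[0] is all screens combined; [1] is the first display
        # mss returns BGRA; drop alpha so the channels match cv2.imread's BGR
        img = np.array(sct.grab(monitor))[:, :, :3]
        return np.ascontiguousarray(img)

# %%

def findNeedleInTemplate(needleList, screenshot):
    for needle in needleList:
        # matchTemplate(image, templ, method): search the screenshot for the needle
        result = cv2.matchTemplate(screenshot, needle.image, cv2.TM_SQDIFF_NORMED)
        min_val, _, min_loc, _ = cv2.minMaxLoc(result)

        # found match (with TM_SQDIFF_NORMED, 0 is perfect; min_loc is (x, y))
        if min_val < 0.05:
            return needle, min_loc

    return None, None

# ...... (excerpt from the main loop below; img is the current screenshot,
# and prevNeedle/prevLoc persist between iterations)

    # quick: check previous search site
    needle, loc = None, None
    if prevLoc:
        nH, nW = prevNeedle.image.shape[:2]
        # array_equal is also safe if the slice is clipped at the screen edge
        if np.array_equal(img[prevLoc[1]: prevLoc[1] + nH, prevLoc[0]: prevLoc[0] + nW], prevNeedle.image):
            needle, loc = prevNeedle, prevLoc
        else:
            prevNeedle, prevLoc = None, None

    # Find appropriate handle
    if not needle:
        needle, loc = findNeedleInTemplate(needleList, img)

    if needle:
        # Click timeline
        origX, origY = pyautogui.position()
        # Screenshot coordinates are pixels but pyautogui uses points:
        # divide by 2 (assumes a Retina display with 2x scaling)
        newY = loc[1] // 2 + needle.mouseYOffset
        pyautogui.mouseDown(origX, newY)
        pyautogui.moveTo(origX, origY)
        # print('mouse down at (%d, %d)' % (origX, newY))
        prevLoc, prevNeedle = loc, needle
        isMousePressed = True
```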
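For reference, `cv2.TM_SQDIFF_NORMED` scores each placement of the template as R = sum((T - I)^2) / sqrt(sum(T^2) * sum(I^2)) over the overlapping pixels, so 0 is a perfect match and the script's `min_val < 0.05` accepts near-exact matches. A pure-Python sketch of that score for single-channel images (OpenCV sums over all channels for color images; the all-black edge case below is this sketch's own choice, not necessarily OpenCV's behavior):

```python
import math


def sqdiff_normed(image, templ, x, y):
    """TM_SQDIFF_NORMED-style score for templ placed at (x, y) in image
    (lists of rows of ints):
        R = sum((T - I)^2) / sqrt(sum(T^2) * sum(I^2))
    """
    num = sum_t2 = sum_i2 = 0
    for dy, trow in enumerate(templ):
        for dx, t in enumerate(trow):
            i = image[y + dy][x + dx]
            num += (t - i) ** 2
            sum_t2 += t * t
            sum_i2 += i * i
    denom = math.sqrt(sum_t2 * sum_i2)
    if denom == 0:
        # All-black window or template: perfect only if identical (sketch's choice)
        return 0.0 if num == 0 else 1.0
    return num / denom


def match_sqdiff_normed(image, templ):
    """Return (min_score, (x, y)) over all placements, similar to calling
    cv2.minMaxLoc on the cv2.matchTemplate result."""
    h, w = len(templ), len(templ[0])
    return min(
        (sqdiff_normed(image, templ, x, y), (x, y))
        for y in range(len(image) - h + 1)
        for x in range(len(image[0]) - w + 1)
    )


image = [[10, 10, 10],
         [10, 200, 50],
         [10, 60, 90]]
templ = [[200, 50],
         [60, 90]]
print(match_sqdiff_normed(image, templ))  # (0.0, (1, 1))
```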
latenitefilms commented 2 years ago

FWIW - You can already crop an hs.image using hs.canvas. We use this technique (along with :colorAt()) to detect if Final Cut Pro X is playing or not:

https://github.com/CommandPost/CommandPost/blob/develop/src/extensions/cp/apple/finalcutpro/viewer/ControlBar.lua#L262
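Checking a few known pixels is much cheaper than a full image search when the UI element's position is already known. A Python sketch of that pixel-probe idea with a small per-channel tolerance; `get_pixel` stands in for something like `hs.image:colorAt()` and is assumed to return 0-255 `(r, g, b)` values, which is not necessarily the format Hammerspoon returns:

```python
def probe_matches(get_pixel, probes, tolerance=8):
    """Return True if every probe matches.

    get_pixel: callable (x, y) -> (r, g, b), channels 0-255 (hypothetical stand-in).
    probes: list of ((x, y), (r, g, b)) expected colors.
    tolerance: maximum allowed difference per channel.
    """
    for (x, y), expected in probes:
        actual = get_pixel(x, y)
        if any(abs(a - e) > tolerance for a, e in zip(actual, expected)):
            return False
    return True


# A made-up 2x2 crop where one indicator pixel is blue when active.
crop = [[(30, 30, 30), (30, 30, 30)],
        [(30, 30, 30), (20, 120, 250)]]


def get_pixel(x, y):
    return crop[y][x]


print(probe_matches(get_pixel, [((1, 1), (22, 118, 255))]))  # True (within tolerance)
print(probe_matches(get_pixel, [((0, 0), (22, 118, 255))]))  # False
```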

latenitefilms commented 2 years ago

> @latenitefilms I'm not a fan of :compare() as a name because that suggests to me that it's going to somehow report the similarity of the images, but IME what people typically want to do with these APIs is look for a specific small image within a larger one (ie a screenshot) so they can find some piece of UI that isn't discoverable through accessibility APIs.

I was actually thinking that hs.image:compare(imageToCompare) would use something like:

```objc
[[OSImageHashing sharedInstance] compareImageData:firstImageData to:secondImageData]
```

...so it would actually be comparing NSImages - then we'd write our own code in hs.screen to split the screenshot up into lots of sections, and then compare each section to see if we get a match.
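For a sense of what a hash-based comparison does: one of the simplest perceptual hashes, the average hash (aHash), turns an 8x8 grayscale thumbnail into 64 bits (each pixel brighter or not than the mean), and two images are compared by the Hamming distance between their hashes. A Python sketch; whether CocoaImageHashing uses this exact scheme is not assumed here, and the downscale-to-8x8 step is omitted:

```python
def average_hash(gray):
    """64-bit aHash of an 8x8 grayscale image (8 rows of 8 ints).
    Bit = 1 where the pixel is brighter than the mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits; small distances mean similar images."""
    return bin(a ^ b).count("1")


img_a = [[(x + y) * 16 for x in range(8)] for y in range(8)]  # diagonal gradient
img_b = [[p + 3 for p in row] for row in img_a]               # slightly brighter
img_c = [[255 - p for p in row] for row in img_a]             # inverted
print(hamming(average_hash(img_a), average_hash(img_b)))  # 0
print(hamming(average_hash(img_a), average_hash(img_c)))  # 56
```

Because the hash summarizes a whole image, finding a small image inside a screenshot this way would still require hashing every candidate section.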

> Perhaps a good place to start would be something like hs.image:findWithin() or hs.image:containsImage()?

This would be awesome - I'm just not sure how you'd do it without something like OpenCV; there don't seem to be many alternatives out there for macOS.

> Edit: and yeah, I am not a fan of adding more pods, but I'm not going to enforce a blanket rule there - let's figure out the various options and try to find one that is well maintained.

CocoaImageHashing is archived - so maybe it's not even worth experimenting with, if you want something that's actively maintained? I'll have a play if I get a chance regardless.

OpenCV sounds like the best route - it's just a little above my pay grade in terms of how we'd get it into Hammerspoon nicely - so not sure I'd be much help there.

Note to self:

How to build machine independent Mac OS app with OpenCV - 2016, so this is fairly outdated.

https://github.com/XiomarG/VideoFrameProcessDemo - A sample app from the blog post above, again fairly outdated.

phu54321 commented 2 years ago

I've created a simple library for searching for a 2D image inside another image, and released it under the MIT license. I've never used Objective-C, so I kept everything in C (no C++; only malloc from libc) to minimize dependencies. I'll need someone else to integrate this into Hammerspoon.

https://github.com/phu54321/findImageInImage (This only supports exact image match)

macOS supports various color profiles (P3, sRGB, Adobe RGB) for screen UI, so unlike on Windows, one should be careful when cropping the screenshot. If you crop the screenshot with software that only supports the 8-bit sRGB color space (e.g. Paintbrush), the RGB values of the image may change, and an exact match can no longer be made. This should be documented.
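To illustrate the color-profile problem: a needle whose channels were nudged by a few units no longer matches exactly, while a small per-channel tolerance still finds it. A Python sketch with made-up pixel values; `find_with_tolerance` is a hypothetical helper, and converting both images to the same color space before matching would be the more principled fix:

```python
def find_with_tolerance(haystack, needle, tolerance=0):
    """First (x, y) where every channel of every needle pixel is within
    `tolerance` of the haystack pixel; tolerance=0 means an exact match.
    Pixels are (r, g, b) tuples; images are lists of rows."""
    hh, hw = len(haystack), len(haystack[0])
    nh, nw = len(needle), len(needle[0])

    def close(p, q):
        return all(abs(a - b) <= tolerance for a, b in zip(p, q))

    for y in range(hh - nh + 1):
        for x in range(hw - nw + 1):
            if all(close(haystack[y + dy][x + dx], needle[dy][dx])
                   for dy in range(nh) for dx in range(nw)):
                return (x, y)
    return None


# Needle cropped from the screenshot, then re-saved by an sRGB-only editor
# that shifted each channel slightly (values made up for illustration).
screen = [[(0, 0, 0), (0, 0, 0), (0, 0, 0)],
          [(0, 0, 0), (250, 40, 40), (40, 250, 40)]]
resaved = [[(248, 42, 41), (43, 247, 42)]]
print(find_with_tolerance(screen, resaved))               # None: exact match fails
print(find_with_tolerance(screen, resaved, tolerance=4))  # (1, 1)
```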

phu54321 commented 2 years ago

OK, I'm forking this repo and would like to create a PR. Before proceeding: I noticed that CONTRIBUTING.md says the preferred way of adding code is to create a new extension, but I think my code belongs in the hs.image namespace. Is extending an existing extension allowed? Thanks.

latenitefilms commented 2 years ago

Yes, you can update/extend existing extensions. I’ve added a few methods to hs.image over the years.

FischLu commented 1 year ago

> OK, I'm forking this repo and would like to create a PR. Before proceeding: I noticed that CONTRIBUTING.md says the preferred way of adding code is to create a new extension, but I think my code belongs in the hs.image namespace. Is extending an existing extension allowed? Thanks.

Cool, did you send a PR? I'm interested in this feature as well, and I'd be happy to help if you need it.

phu54321 commented 1 year ago

Forgot about it, xD. I don't know how to handle the various color-space management issues. Maybe I'll try today.