apirrone / Memento

Memento is a Python app that records everything you do on your computer and lets you go back in time, search, and chat with an LLM (Large Language Model) to retrieve information about what you did.
MIT License
538 stars, 43 forks

Accessibility API #49

Open jlia0 opened 1 year ago

jlia0 commented 1 year ago

Is this connected to the accessibility API to retrieve context information (like URL, app name, etc.) yet?

apirrone commented 1 year ago

Hi, I tinkered a bit with the accessibility API to try to extract the text directly from the apps instead of using OCR, but did not achieve much. If you have any good reference material, that would be great, or you can make a PR :)

jlia0 commented 1 year ago

Here's one that cyte2 references: https://github.com/tmandry/AXSwift

However, I am not sure if there is a Python API for it. Do you mind sharing some of your tinkering code?

I believe we still need OCR to extract the text; the accessibility API is for extracting "metadata" like the URL or window context.
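
To make that split concrete, here is a rough sketch of how one captured frame could carry both the OCR text and the accessibility metadata. Everything in it is hypothetical and not taken from Memento's code: `WindowContext`, `FrameRecord`, `build_record` and `fake_provider` are made-up names, and the provider is a stand-in for whatever real accessibility query ends up being used.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class WindowContext:
    """Metadata an accessibility API might supply (hypothetical fields)."""
    app_name: Optional[str] = None
    window_title: Optional[str] = None
    url: Optional[str] = None


@dataclass
class FrameRecord:
    """One captured frame: OCR text plus optional window metadata."""
    timestamp: float
    ocr_text: str
    context: WindowContext = field(default_factory=WindowContext)


def build_record(
    timestamp: float,
    ocr_text: str,
    context_provider: Optional[Callable[[], WindowContext]] = None,
) -> FrameRecord:
    """Combine OCR text with metadata from an optional provider.

    The provider is assumed to wrap a platform accessibility call. If it
    is missing or raises (no permission, unsupported app), the record
    falls back to OCR text with empty metadata.
    """
    context = WindowContext()
    if context_provider is not None:
        try:
            context = context_provider()
        except Exception:
            context = WindowContext()
    return FrameRecord(timestamp=timestamp, ocr_text=ocr_text, context=context)


def fake_provider() -> WindowContext:
    """Stand-in for a real accessibility query; returns fixed values."""
    return WindowContext(
        app_name="Firefox",
        window_title="Accessibility API #49",
        url="https://example.com",
    )


record = build_record(0.0, "text recognized by OCR", fake_provider)
print(record.context.app_name, record.context.url)
```

With this shape, OCR stays the source of the searchable text, and the metadata is optional, so a frame is still stored when the accessibility query is unavailable.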

apirrone commented 1 year ago

Unfortunately, I don't seem to have kept my tinkering code :/ I tried to use the Orca screen reader (https://github.com/GNOME/orca), but I don't think it was the right tool.

jlia0 commented 12 months ago

https://kevinchen.co/blog/rewind-ai-app-teardown/

^^^ I think this would probably help

apirrone commented 11 months ago

Yes, this blog post was very helpful :)