OpenAdaptAI / OpenAdapt

Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models
https://www.OpenAdapt.AI
MIT License
896 stars, 117 forks

Support for Efficient Multi-Window Accessibility Tree Capture #847

Status: Open. ZeonLap opened this issue 3 months ago.

ZeonLap commented 3 months ago

Feature request

Thank you for your excellent work on OpenAdapt! Currently, OpenAdapt captures the accessibility tree (a11ytree) of only the frontmost window. This approach works well for single-window applications. However, when I modify openadapt/window/_macos.py to capture the a11ytrees of multiple windows, the process becomes extremely slow (about 10 seconds to acquire the a11ytrees of all windows). I wonder if there are ways to capture a11ytrees more efficiently.
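To illustrate why capturing every window costs more than capturing only the frontmost one, here is a minimal sketch that assumes a hypothetical `Element` type standing in for the native macOS accessibility objects. `Element`, `serialize`, `capture_windows`, and the `max_depth` parameter are illustrative names, not OpenAdapt's API. A recursive capture visits every element in every window, so each extra background window adds its whole subtree to the work; `max_depth` shows one hypothetical way to bound that cost.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Element:
    """Hypothetical stand-in for a native accessibility element."""
    role: str
    title: str = ""
    children: list["Element"] = field(default_factory=list)


def serialize(element: Element, depth: int = 0,
              max_depth: Optional[int] = None,
              stats: Optional[dict] = None) -> dict[str, Any]:
    """Recursively convert an element subtree into plain dicts.

    stats["nodes"] counts visited elements; with a real OS API each visit
    would involve one or more attribute reads (an assumption of this sketch).
    """
    if stats is not None:
        stats["nodes"] = stats.get("nodes", 0) + 1
    node: dict[str, Any] = {"role": element.role, "title": element.title}
    # Stop descending once the (optional) depth limit is reached.
    if max_depth is None or depth < max_depth:
        node["children"] = [
            serialize(child, depth + 1, max_depth, stats)
            for child in element.children
        ]
    return node


def capture_windows(windows: list[Element],
                    max_depth: Optional[int] = None) -> tuple[list[dict], int]:
    """Serialize every window; return the trees and total elements visited."""
    stats = {"nodes": 0}
    trees = [serialize(w, max_depth=max_depth, stats=stats) for w in windows]
    return trees, stats["nodes"]


# Hypothetical frontmost window and one background window.
front = Element("AXWindow", "Editor",
                [Element("AXButton", "Save"), Element("AXTextArea")])
back = Element("AXWindow", "Browser",
               [Element("AXGroup", children=[Element("AXLink", "Home")])])

_, n_front = capture_windows([front])
_, n_all = capture_windows([front, back])
print(n_front, n_all)  # 3 6
```

In this toy example, adding a single background window doubles the number of elements visited; real windows typically contain far more elements than this.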

Motivation

We may need the accessibility trees of all windows in the current state as usable data, especially for tasks that involve interactions across multiple windows.

abrichr commented 3 months ago

Thank you @ZeonLap!

> I wonder if there are ways to capture a11ytrees more efficiently.

Me too! If anyone knows of one, I hope they leave a comment. Unfortunately, I believe this is unlikely, since we are using native OS APIs for this.

> We may need the accessibility trees of all windows in the current state as usable data, especially for tasks that involve interactions across multiple windows.

I'm not sure this is the case. If any given window is part of a workflow, then at some point that window will be the active window, and its state will be saved. What is the purpose of saving the state of background windows?
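The reasoning above can be sketched as follows, assuming a hypothetical recording in which each event stores the id and accessibility tree of the window that was active at that moment. `Event` and `latest_state_per_window` are illustrative names, not part of OpenAdapt. Capturing only the active window at each event still yields a snapshot of every window that took part in the workflow.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical recorded action: the active window's id and its tree."""
    timestamp: float
    window_id: int
    a11y_tree: dict


def latest_state_per_window(events: list[Event]) -> dict[int, dict]:
    """Keep the most recent active-window snapshot for each window.

    Only the active window is captured per event, yet every window the
    user interacted with ends up in the result.
    """
    states: dict[int, dict] = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        states[event.window_id] = event.a11y_tree  # later snapshots overwrite earlier ones
    return states


events = [
    Event(0.0, window_id=1, a11y_tree={"title": "Editor", "rev": 1}),
    Event(1.0, window_id=2, a11y_tree={"title": "Browser", "rev": 1}),
    Event(2.0, window_id=1, a11y_tree={"title": "Editor", "rev": 2}),
]
states = latest_state_per_window(events)
print(sorted(states))    # [1, 2]
print(states[1]["rev"])  # 2
```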