To enhance dwell clicking (showing halos, centering the dwell click indicator, and using a UI element's bounding box as the region to dwell within to click it), we need information about the UI elements under the cursor. For web pages, this may be done through a browser extension (#27), but for native apps we'll need to use system-specific accessibility APIs, or a cross-platform one, though that would probably bring in too much cruft and cause friction.
A good strategy might be to use a high-level API for a proof of concept and then, to reduce unnecessary dependencies (and make it easier to install), try copying only the parts of the code that are needed.
But it may be easier to just use the native system APIs directly from the start. Have to vibe it out.
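Whichever API ends up providing the element information, the dwell logic itself only needs a bounding box. Here is a minimal sketch of how that could work, assuming the accessibility layer hands back a rectangle in screen pixels. `Rect`, `make_dwell_region`, and `fallback_radius` are hypothetical names for illustration, not part of any existing code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding box in screen pixels (hypothetical type)."""
    left: float
    top: float
    width: float
    height: float

    def contains(self, x, y):
        # Right and bottom edges are treated as exclusive.
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)

    def center(self):
        return (self.left + self.width / 2, self.top + self.height / 2)


def make_dwell_region(start, element_rect=None, fallback_radius=20.0):
    """Return (indicator_center, is_inside) for a dwell starting at `start`.

    If an element's bounding box is known and contains the starting point,
    the whole box becomes the dwell region and the indicator is centered
    on it. Otherwise fall back to a circle around the starting point.
    """
    if element_rect is not None and element_rect.contains(*start):
        return element_rect.center(), element_rect.contains

    start_x, start_y = start

    def within_radius(x, y):
        return math.hypot(x - start_x, y - start_y) <= fallback_radius

    return start, within_radius
```

With a box for a button, the cursor can wander anywhere inside the button without resetting the dwell, and the indicator sits at the button's center rather than wherever the cursor happened to stop.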
Windows
Windows looks like it has a good API for this. I think this covers everything I need:
"If you have no prior knowledge of the applications that your client may be used with, you can construct a subtree of all elements of interest by using IUIAutomationTreeWalker"
"To retrieve an IUIAutomationElement from screen coordinates, for example, a cursor position, use the IUIAutomation::ElementFromPoint method."
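As a rough proof of concept, `IUIAutomation::ElementFromPoint` can be called from Python through the third-party `comtypes` package. This is an untested sketch under that assumption, and it only works on Windows. `rect_to_box` and `element_under_point` are hypothetical helper names; the UI Automation properties used (`CurrentName`, `CurrentControlType`, `CurrentBoundingRectangle`) come from the `IUIAutomationElement` interface.

```python
import sys


def rect_to_box(rect):
    """Convert a left/top/right/bottom rectangle to (x, y, width, height)."""
    return (rect.left, rect.top,
            rect.right - rect.left, rect.bottom - rect.top)


def element_under_point(x, y):
    """Describe the UI Automation element at screen coordinates (x, y).

    Windows only. Assumes the `comtypes` package is installed.
    """
    if sys.platform != "win32":
        raise OSError("UI Automation is only available on Windows")

    # Imported lazily so this module still loads on other platforms.
    import comtypes.client

    # Generates (and caches) Python wrappers for the UI Automation type library.
    comtypes.client.GetModule("UIAutomationCore.dll")
    from comtypes.gen import UIAutomationClient as UIA

    automation = comtypes.client.CreateObject(
        UIA.CUIAutomation._reg_clsid_, interface=UIA.IUIAutomation
    )
    element = automation.ElementFromPoint(UIA.tagPOINT(x, y))
    return {
        "name": element.CurrentName,
        "control_type": element.CurrentControlType,  # numeric control type ID
        "box": rect_to_box(element.CurrentBoundingRectangle),
    }
```

In a real integration, the automation object would be created once and reused rather than recreated on every cursor move.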
Another idea: detect buttons by recording the screen and looking for hover states? That obviously can't work for everything. It would be an interesting hack, but I should try the proper accessibility APIs first.
There is also Cobra WinLDTP, the Windows version of LDTP, which shares its cross-platform API.
macOS
local position = currentElement:attributeValue("AXPosition")
There is also ATOMac - Automated Testing on Mac, the macOS version of LDTP.
Linux
I could use the LDTP API, or the underlying AT-SPI API.
In particular, getobjectnameatcoords, which takes some code from Accerciser.General
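Going straight to AT-SPI, a coordinate lookup could be sketched as a walk down the accessibility tree to the deepest element whose extents contain the point. This is an untested sketch assuming the `pyatspi` bindings are available; it is not the LDTP or Accerciser implementation. `deepest_at_point` and the `atspi_*` helpers are hypothetical names, and the "later children are on top" ordering is an assumption that real code would need to check against z-order and visibility states.

```python
def deepest_at_point(node, x, y, extents, children):
    """Return the deepest descendant of `node` containing (x, y), or None.

    `extents(node)` must return (x, y, width, height) in screen coordinates;
    `children(node)` must return an iterable of child nodes.
    """
    bx, by, bw, bh = extents(node)
    if not (bx <= x < bx + bw and by <= y < by + bh):
        return None
    # Assumption: later children are drawn on top, so check them first.
    for child in reversed(list(children(node))):
        hit = deepest_at_point(child, x, y, extents, children)
        if hit is not None:
            return hit
    return node


def atspi_extents(acc):
    """Screen-space extents of an AT-SPI accessible (needs pyatspi)."""
    import pyatspi
    try:
        box = acc.queryComponent().getExtents(pyatspi.DESKTOP_COORDS)
    except NotImplementedError:
        # Not every accessible implements the Component interface.
        return (0, 0, 0, 0)
    return (box.x, box.y, box.width, box.height)


def atspi_children(acc):
    # Accessibles are iterable; skip children that have gone away.
    return [child for child in acc if child is not None]


def atspi_element_at(x, y):
    """Search every window of every application on the first desktop."""
    import pyatspi
    desktop = pyatspi.Registry.getDesktop(0)
    for app in desktop:
        if app is None:
            continue
        for window in app:
            if window is None:
                continue
            hit = deepest_at_point(window, x, y, atspi_extents, atspi_children)
            if hit is not None:
                return hit
    return None
```

Keeping the tree walk separate from the AT-SPI accessors means the same hit-testing logic could be reused with another backend, or exercised with plain dictionaries when no accessibility bus is available.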