C-Loftus / sight-free-talon

Integrate Talon voice dictation commands with TTS, screen readers, braille, and more!
http://colton.place/sight-free-talon/
GNU General Public License v3.0

Give Talon access to screen reader's accessibility tree #21

Closed (zersiax closed 9 months ago)

zersiax commented 10 months ago

I do not know to what degree this is possible. Let me preface my idea with that :) I think, for me at least, the largest limitation Talon has is its reliance on Cursorless, mouse grids and essentially any variation of the concept of tagging UI elements with numbers, two-letter abbreviations, etc. Due to the sequential nature of screen readers, only one element at a time can be in focus, which means the rest of the screen is imperceptible while that happens. So unless they first focus the element in question to find its identifier (this sorta kinda works with the Cursorless browser extension, for example), screen reader users aren't going to know what to tell Talon in order to interact with that element. And if they have to focus it through some other means, they may as well just continue using the modality they were already using, if able. Mind you, this info is based on Talon as it was several months ago; I don't know if this has improved.

Dragon is, generally, able to receive a command like "Click OK" and find and interact with the right button. I believe there is a Talon OCR plugin that does something similar, but in my tests it wasn't always very reliable.

My thought was that NVDA has a rendition of the accessibility tree at any given point; that's what's used for object navigation, for example. It is also able to use OCR through the Windows 10/11 APIs, which far outstrip open-source offerings like Tesseract at present. A mode might be developed that passes this information to Talon so the user can access controls in view, similar to Dragon or possibly even better?

C-Loftus commented 10 months ago

Thanks for writing those thoughts up! Yes this is definitely doable but will require some work. I agree with you about the reliance upon visual stimuli within the Talon ecosystem. My goal is to decouple that.

I am currently testing an add-on for interprocess communication between Talon and NVDA. The only technical challenge I see with this is keeping the two synchronized correctly. That said, this is more a challenge of time and resources for me than of technical issues at the moment. If I am able to find more time and support, I have a lot of ideas.

I originally tried to keep as much as possible on the Talon side, since that makes things more screen reader agnostic and easier to maintain, but now that the project is a bit more mature I think I'm at the point where I will need to incorporate more add-on functionality. (But just to be totally clear, I still intend to have this program work fine with or without a screen reader, and with or without an add-on for it.)

I am planning on some features like blocking text input if you aren't in an editable text field, a voice command to extract text from one of the NVDA cursors, a voice command to send the image at the NVDA cursor to the OpenAI API, a command to click a button by its on-screen name as you suggest, etc.
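To make the click-by-name and editable-field ideas a bit more concrete, here is a minimal sketch in plain Python. It does not use any Talon or NVDA API: the `Element` class, `find_by_spoken_name`, `may_insert_text`, and the 0.8 similarity threshold are all hypothetical names and values chosen for illustration, and the list of elements is assumed to have already been obtained from the screen reader somehow.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for one accessible control (not an NVDA type)."""
    name: str
    role: str
    editable: bool = False


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so "Click  OK" style input compares cleanly.
    return " ".join(text.lower().split())


def find_by_spoken_name(
    spoken: str, elements: List[Element], threshold: float = 0.8
) -> Optional[Element]:
    """Return the element whose name is most similar to the spoken phrase.

    Returns None when nothing reaches the (assumed) similarity threshold,
    so a misrecognized phrase does not click an unrelated control.
    """
    target = normalize(spoken)
    best: Optional[Element] = None
    best_score = 0.0
    for element in elements:
        score = SequenceMatcher(None, target, normalize(element.name)).ratio()
        if score > best_score:
            best, best_score = element, score
    return best if best_score >= threshold else None


def may_insert_text(focused: Optional[Element]) -> bool:
    # Dictated text is only allowed through when the focused control is editable.
    return focused is not None and focused.editable
```

With a list such as `[Element("OK", "button"), Element("Cancel", "button"), Element("Search", "edit", editable=True)]`, the phrase "ok" would resolve to the OK button, and dictation would only be let through while the Search field has focus. Fuzzy matching is one possible design choice here; an exact-match lookup would be stricter but less forgiving of recognition errors.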

If you are a screen reader user and want to be involved in testing, or just want to discuss anything related to this, feel free to reach out to me through my website: https://colton.bio/contact/ I'd love to get some of your thoughts in detail on some other ideas.

Cheers

zersiax commented 10 months ago

I did mean Rango, sorry about that. They sorta do similar things and blurred together in my head. Also happy to test. Will contact you through website :)

C-Loftus commented 10 months ago

All good just wanted to make sure we're on the same page! Thanks, look forward to chatting.

C-Loftus commented 9 months ago

@zersiax Quick question on this, since I don't use Dragon: how does this functionality work with languages other than English? For instance, if your computer is set to a language other than English but the dictation engine only recognizes English, it seems like you wouldn't be able to dictate the name of the accessibility element properly.

As an update, I have almost completed the interprocess communication with NVDA. Now I am researching ways to serialize the accessibility tree. Please let me know if you happen to know whether there is an NVDA add-on API function to retrieve the entire accessibility tree. So far, I have not been able to find one. No worries if you don't know, I will continue to search regardless.
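For anyone following along, this is roughly what I mean by serializing the tree. It is only a sketch in plain Python and does not call any NVDA API: the `Node` class and the `to_dict`, `serialize`, `deserialize`, and `flatten` helpers are hypothetical, and how the nodes would actually be collected from the screen reader is exactly the open question above.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    """Hypothetical accessibility-tree node (not an NVDA object)."""
    role: str
    name: str = ""
    children: List["Node"] = field(default_factory=list)


def to_dict(node: Node, max_depth: int = 50) -> Dict[str, Any]:
    # Convert a node to plain dicts; max_depth (an assumed limit) guards
    # against very deep trees by dropping children past that depth.
    data: Dict[str, Any] = {"role": node.role, "name": node.name}
    if node.children and max_depth > 0:
        data["children"] = [to_dict(c, max_depth - 1) for c in node.children]
    return data


def from_dict(data: Dict[str, Any]) -> Node:
    # Inverse of to_dict: rebuild Node objects from plain dicts.
    return Node(
        role=data["role"],
        name=data.get("name", ""),
        children=[from_dict(c) for c in data.get("children", [])],
    )


def serialize(root: Node) -> str:
    # JSON text is easy to pass between two processes.
    return json.dumps(to_dict(root))


def deserialize(payload: str) -> Node:
    return from_dict(json.loads(payload))


def flatten(root: Node) -> List[Node]:
    # Depth-first, pre-order list of every node, handy for name lookups.
    nodes = [root]
    for child in root.children:
        nodes.extend(flatten(child))
    return nodes
```

JSON is just one convenient choice for the wire format; the receiving side could call `flatten` on the rebuilt tree to get the list of names a voice command might target.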

C-Loftus commented 9 months ago

Implemented in ead815e90e78005f2901e65cfe864b585861c3b0. Note that this may need more testing; some applications may not entirely work due to upstream errors originating from the underlying Talon Rust crates, which are outside my control. It works with Talon beta only, since it depends on dynamic list functionality.