accessibleapps / accessible_output2

Output speech and braille using a variety of screen-reading solutions
MIT License

Fixing VoiceOver Support #12

Open tbreitenfeldt opened 2 years ago

tbreitenfeldt commented 2 years ago

Hi, the existing support for VoiceOver is broken. I have done a lot of research and created a fork with my findings on this subject. I would like to open a discussion on those findings, and on possibly better solutions, before I create a pull request.

The current VoiceOver support uses a library called appscript. I was never able to get this to work myself, and a quick Google search shows that appscript is in fact deprecated:

http://appscript.sourceforge.net/

I opened a thread in the audiogames.net developers room a while back to discuss the issues I was having with the current implementation of the VoiceOver class, and my findings at the time:

https://forum.audiogames.net/topic/37573/voiceover-error-using-accessibleoutput2/

I pasted the existing implementation of the VoiceOver class below for reference:

from __future__ import absolute_import

from .base import Output

class VoiceOver(Output):
    """Speech output supporting the Apple VoiceOver screen reader."""

    name = "VoiceOver"

    def __init__(self, *args, **kwargs):
        import appscript
        self.app = appscript.app("voiceover")

    def speak(self, text, interrupt=False):
        self.app.output(text)

    def silence(self):
        self.app.output(u"")

    def is_active(self):
        return self.app.isrunning()

output_class = VoiceOver

I started doing research into alternative methods for making VoiceOver speak on Mac, and actually found this rather helpful article

https://wiki.lazarus.freepascal.org/macOS_Text-To-Speech

To summarize the article, there are five ways to get speech output on a Mac that I am aware of:

1. The say command-line tool
2. AppleScript, run through the osascript command-line tool
3. Scripting libraries that wrap AppleScript, such as appscript or ScriptingBridge
4. NSSpeechSynthesizer from AppKit
5. AVSpeechSynthesizer from AVFoundation

So, as mentioned above, I ended up settling on AppleScript for the primary output when VoiceOver is running, and NSSpeechSynthesizer when VoiceOver is not running. There is one caveat to the AppleScript solution though: VoiceOver requires the user to enable the setting that allows it to be controlled by AppleScript. In the old AO2 code, this was handled like so:

First, check if VoiceOver is running (very slow, probably because it has to enumerate all the processes):

tell application "System Events"
    (name of processes) contains "VoiceOver"
end tell

Then check whether VoiceOver can be controlled by AppleScript by throwing a command at it and returning false if the command fails (I have no idea where this command came from; it probably lives in the elusive formal documentation on controlling VoiceOver with AppleScript):

tell application "voiceover"
    try
        return bounds of vo cursor
    on error
        return false
    end try
end tell

Note: I rewrote these commands to make them more readable, since they were nested inside Python strings, so apologies for any syntax errors.

The AO2 code I am referring to is here (not sure if this is part of the existing commit history for this repo): https://raw.githubusercontent.com/frastlin/accessible_output2/master/accessible_output2/outputs/voiceover.py

The issue with this solution is the incredible lag, seen most clearly when using the Auto class. To run AppleScript from Python, you have to go through the command-line tool osascript, which means using subprocess.Popen or os.system. These functions fork a new process to run the script, so with the existing implementation we fork three separate processes every time speak is called, since is_active is called on each speak: one to check whether VoiceOver is running, one to check whether AppleScript control is enabled in VoiceOver, and finally one to speak the output.

This is incredibly expensive. When I timed the is_active method alone, it took approximately 0.3 seconds, and the speak function took a similar amount of time to execute. This meant that when Auto.output was called, the user could observe a very noticeable lag between when speech output should have started and when it actually did. I don't think the lag is all down to process forking; I saw posts hinting that AppleScript itself is not particularly fast. Regardless, a different solution is necessary if we still want to talk to VoiceOver directly.

Determining whether VoiceOver is running is not terribly difficult outside of AppleScript, since Python provides the psutil module for inspecting processes. The check for whether VoiceOver's AppleScript control is enabled, however, had to be dropped: I could not find any way to check this other than throwing a script at VoiceOver and seeing if it failed, and that is just too expensive to be doing in is_active.
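
For anyone who wants to reproduce the numbers, here is a minimal timing sketch (assuming osascript is on the PATH; this is not my exact harness):

import subprocess
import timeit

def osascript_is_running():
    # One full osascript round trip per call, similar to what is_active does
    script = 'tell application "System Events" to (name of processes) contains "VoiceOver"'
    result = subprocess.run(["osascript", "-e", script],
                            capture_output=True, text=True)
    return result.stdout.strip() == "true"

calls = 5
elapsed = timeit.timeit(osascript_is_running, number=calls)
print(f"{elapsed / calls:.3f} seconds per call")  # roughly 0.3s, as noted above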

This is my solution for talking to VoiceOver directly using AppleScript. I cleaned up the code quite a bit, and also used the NSSpeechSynthesizer object to provide a bonus method for checking whether VoiceOver is speaking. Unfortunately, that was the only NSSpeechSynthesizer functionality I found for interacting with the running instance of VoiceOver.

import subprocess, psutil

from accessible_output2.outputs.base import Output

class VoiceOver(Output):
    """Speech output supporting the Apple VoiceOver screen reader."""

    name = "VoiceOver"

    def __init__(self, *args, **kwargs):
        from AppKit import NSSpeechSynthesizer
        self.NSSpeechSynthesizer = NSSpeechSynthesizer

    def is_speaking(self):
        return self.NSSpeechSynthesizer.isAnyApplicationSpeaking()

    def run_apple_script(self, command, process="voiceover"):
        return subprocess.Popen(["osascript", "-e",
            f"tell application \"{process}\"\n{command}\nend tell"],
            stdout=subprocess.PIPE).communicate()[0]

    def speak(self, text, interrupt=False):
        # The AppleScript output command seems to interrupt by default;
        # preceding it with an empty string seems to force VoiceOver not to interrupt
        if not interrupt:
            self.silence()
        self.run_apple_script(f"output \"{text}\"")

    def silence(self):
        self.run_apple_script("output \"\"")

    def is_active(self):
        for process in psutil.process_iter():
            if process.name().lower() == "voiceover":
                return True

        return False

output_class = VoiceOver

This class is not perfect by any means, but it gets the job done. The rather confusing interrupt flag in the speak method was something I wrestled with, but as mentioned in the comment, the default behavior seems to be to interrupt, and oddly, following a spoken message with an empty string forces polite behavior, where VoiceOver finishes the first message in full before starting the second. That said, I am not sure how useful the silence method is outside the context of the speak method. This was a happy accident, since the default behavior is to interrupt and there is no clear way to disable that feature.

Also, the NSSpeechSynthesizer import looks a little odd, although I was just following the convention of the rest of AO2; this is simply the only place where I needed to access a static method, so I did not have an instance to work with. If anyone thinks this should change, perhaps by putting the import at the top of the file, I am happy to change it. I know there is concern about importing modules that only exist on certain platforms, such as AppKit, which would throw an ImportError on Windows. I don't think this is a problem, since outputs/__init__.py conditionally imports the screen reader classes for each platform, so the VoiceOver class should never get imported on Windows. If I am wrong in my assumption here, please correct me.
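
For reference, the pattern in outputs/__init__.py looks roughly like this (a simplified sketch, not the actual file; module names are illustrative):

import platform

system = platform.system()
if system == "Windows":
    from . import nvda, jaws, sapi5
elif system == "Darwin":
    # AppKit is only imported inside these modules, so Windows never sees it
    from . import voiceover, system_voiceover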

The second class I wrote uses the AppKit.NSSpeechSynthesizer object. Conveniently, there are already Python bindings for AppKit, I believe provided by Apple. I have installed so many things in the global Python installation on my Mac that I am not 100% certain, but based on my research I am pretty sure AppKit's bindings ship by default on modern macOS; if someone can confirm this, that would be great. I implemented this class the same way SAPI5 was implemented, providing access to is_speaking, speak, silence, and voice configuration.

The only thing I could not get working was the pitch. There appears to be a method to set the pitch, but not one to get it, so I left pitch out altogether. I saw how you are supposed to get the pitch through some generic object property method, but I could not get it to work in Python. Any help on this would be appreciated; otherwise, I don't think people will care much if they cannot change the pitch.

The voices dict does not hold an instance of the voice object; rather, it holds the voice identifier, since the setVoice method takes the identifier rather than the voice object. It could probably be done with voice objects if anyone has a strong argument for using objects rather than identifiers in the dict.

The last thing to note here is the name. I called the file system_voiceover.py and the class SystemVoiceOver. If anyone has a better idea for a name, I am happy to change it.

from __future__ import absolute_import
import platform
from collections import OrderedDict 

from .base import Output, OutputError 

class SystemVoiceOver(Output):
    """Default speech output supporting the Apple VoiceOver screen reader."""

    name = "VoiceOver"
    priority = 101
    system_output = True

    def __init__(self, *args, **kwargs):
        from AppKit import NSSpeechSynthesizer
        self.NSSpeechSynthesizer = NSSpeechSynthesizer
        self.voiceover = NSSpeechSynthesizer.alloc().init()
        self.voices = self._available_voices()

    def _available_voices(self):
        voices = OrderedDict()

        for voice in self.NSSpeechSynthesizer.availableVoices():
            voice_attr = self.NSSpeechSynthesizer.attributesForVoice_(voice)
            voice_name = voice_attr["VoiceName"]
            voice_identifier = voice_attr["VoiceIdentifier"]
            voices[voice_name] = voice_identifier

        return voices

    def list_voices(self):
        return list(self.voices.keys())

    def get_voice(self):
        voice_attr = self.NSSpeechSynthesizer.attributesForVoice_(self.voiceover.voice())
        return voice_attr["VoiceName"]

    def set_voice(self, voice_name):
        voice_identifier = self.voices[voice_name]
        self.voiceover.setVoice_(voice_identifier)

    def get_rate(self):
        return self.voiceover.rate()

    def set_rate(self, rate):
        self.voiceover.setRate_(rate)

    def get_volume(self):
        return self.voiceover.volume()

    def set_volume(self, volume):
        self.voiceover.setVolume_(volume)

    def is_speaking(self):
        return self.NSSpeechSynthesizer.isAnyApplicationSpeaking()

    def speak(self, text, interrupt=False):
        if interrupt:
            self.silence()

        return self.voiceover.startSpeakingString_(text)

    def silence(self):
        self.voiceover.stopSpeaking()

    def is_active(self):
        return self.voiceover is not None

output_class = SystemVoiceOver 
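
For anyone who wants to take a crack at the pitch problem, the generic property accessor I mentioned is objectForProperty:error: on NSSpeechSynthesizer, paired with setObject:forProperty:error:. In PyObjC, the NSError** out-parameter becomes an extra None argument, and the call returns a (value, error) tuple. A sketch of what I would expect the accessors to look like (untested on my end, so treat it accordingly):

from AppKit import NSSpeechSynthesizer, NSSpeechPitchBaseProperty

def get_pitch(synth):
    # PyObjC returns (value, error) for the NSError** out-parameter
    value, error = synth.objectForProperty_error_(NSSpeechPitchBaseProperty, None)
    return value

def set_pitch(synth, pitch):
    ok, error = synth.setObject_forProperty_error_(pitch, NSSpeechPitchBaseProperty, None)
    return ok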

Here is my fork of accessible_output2 with the implemented classes, plus the modification to outputs/__init__.py necessary to add system_voiceover to the outputs. I also modified the readme to include VoiceOver as an output option, with a note at the bottom about the need to enable the VoiceOver setting that allows it to be controlled by AppleScript. I also added eSpeak to the list of outputs, since it was missing; I don't know whether the eSpeak support currently works, but it was absent from the list. Perhaps someone can speak to this.

https://github.com/tbreitenfeldt/accessible_output2

Timothy Breitenfeldt

tbreitenfeldt commented 2 years ago

I went ahead and opened a pull request here:

https://github.com/accessibleapps/accessible_output2/pull/13

Timothy Breitenfeldt

cartertemm commented 2 years ago

Thank you for your work on this. We've heard something of the sort in scattered reports, but at present I don't have a machine capable of running the latest macOS, meaning that I've had a difficult time reproducing and diagnosing it, and also that you'll unfortunately have to bear with my (probably obvious) questions. Some concerns with your approach:

- Interpolating user-provided text directly into an AppleScript string looks like an invitation for script injection.
- psutil is a new dependency pulled in just to check for a running process; something like pgrep could accomplish the same thing without it.

I don't know with any certainty why the original VO class was removed, probably for the simple reason that calling out to say and/or AppleScript is as dirty as you get. I also don't particularly recall extreme lag; maybe that's something new. In either case, it's more important to have code that everyone can use. Maybe @ctoth has comments?

tbreitenfeldt commented 2 years ago

Hi, I actually did not even think about script injection. Yeah, that could be a problem. I did a little searching and found this Stack Overflow thread:

https://stackoverflow.com/questions/56827988/how-to-make-applescript-ignore-escape-characters-in-string-when-making-a-applesc

Looks like we could just use .replace to escape backslashes and quotes, or there is possibly a pure AppleScript solution.

In terms of psutil, I understand the hesitation. I was not happy about introducing a new library to the project; it was just the only way I could find to efficiently check for a running process in Python. What do you mean by using grep? It seems you would have to use subprocess.Popen to do something like that, which may not be much better performance-wise than using AppleScript. We would have to benchmark it to see whether it lags too much. With is_active getting called every time Auto.output is called, it needs to be fast.

cartertemm commented 2 years ago

It's pgrep, which should be quick in theory. Under WSL I can pgrep for a process without any noticeable overhead; timeit tells me I can get thirty calls per 200 ms. This shouldn't differ much across platforms. My guess is that osascript is to blame for the lag you are experiencing. The sanitize function from the provided SO thread seems sufficient, yes.

tbreitenfeldt commented 2 years ago

Hi, sorry about the confusion; that is what I get for not reading the command character by character. I forgot about pgrep. I am not in front of my Mac right now, but I will test this when I get home. I just wanted to compile the two solutions here for reference.

So, to sanitize a user-provided string and avoid script injection, the sanitize function is below:

def sanitize(text):
    # Escape backslashes first, then double quotes
    return text.replace("\\", "\\\\") \
               .replace("\"", "\\\"")

When adding this to the AO2 library, do you think it should be a separate method? If so, should we prefix it with an underscore to mark it as private, so it would be _sanitize?

The second solution is for checking whether VoiceOver is running in the default VoiceOver class. The command I came up with is pgrep --count --ignore-case --exact voiceover. We can run this through subprocess.Popen and check whether the returned count is not 0. It should always return 1 otherwise, since there should only be one instance of VoiceOver running, but checking for 0 is more exact. Also, I suppose the case-insensitive flag is unnecessary once we determine exactly how the VoiceOver process name is spelled.

Timothy Breitenfeldt

tbreitenfeldt commented 2 years ago

Hi, sorry about the delay in working on this. I did test the above solutions, and they work, though I have not updated the fork yet. I can confirm that pgrep works on the Mac with no lag. The flags are a bit different though: we don't have access to all the functionality that Linux has for pgrep, but it works fine. On the Mac the command is just pgrep -x VoiceOver; there is no count or case-insensitive flag. I confirmed the command works. The -x is for exact matching, so we don't get any false positives.
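
In Python, the check ends up being something like this (a sketch; pgrep exits with status 0 when at least one process matches):

import subprocess

def is_active(self):
    # pgrep -x matches the process name exactly and exits 0 on a match
    return subprocess.call(
        ["pgrep", "-x", "VoiceOver"],
        stdout=subprocess.DEVNULL,
    ) == 0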

I was getting frustrated with the osascript solution because we don't have any control over silencing speech. The silence function I wrote, which passes an empty string, just does not work well after additional testing. I assume the problem is that the delay caused by executing osascript keeps things from happening immediately. I had a comment in the original post about the speak interrupt flag operating backwards from what you would expect; that is incorrect, I guess I was just getting lucky that day, and in additional testing I am not seeing that behavior any more. I was hoping to improve efficiency by combining the silence call and the speak output when the interrupt flag is set to true, so that osascript is only invoked once when interrupting speech, but this did not help at all. The output command just seems to interrupt by default, and because of the osascript lag it interrupts badly, often letting the phrase spoken before the interrupt get a few words out before being cut off.
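
For reference, the combined call looked something like this (a sketch, not my exact code); osascript accepts multiple -e flags that build up a single script:

import subprocess

def speak_interrupting(text):
    # One osascript invocation: the empty output should cut speech off,
    # then the real text is spoken, all in a single process fork
    subprocess.Popen([
        "osascript",
        "-e", 'tell application "voiceover"',
        "-e", 'output ""',
        "-e", f'output "{text}"',  # text must be sanitized first
        "-e", "end tell",
    ]).wait()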

I did some more research trying to get my hands on documentation for that output command. I still was not able to find anything; however, I may have found a completely different solution that could work much better. By accident, I found some documentation on one of the solutions I mentioned in my original post: AVSpeechSynthesizer.

AVSpeechSynthesizer has a great property called prefersAssistiveTechnologySettings, which from my understanding is supposed to load the settings of the currently running instance of VoiceOver, if one exists, and speak with those settings. If VoiceOver is not running and the flag is set, it just uses the preset configuration, as with NSSpeechSynthesizer. In other words, if VoiceOver is on, it should speak with the user's settings like we want, and if VoiceOver is off, it will speak with the default voice and settings, which can be changed through properties. Even better, there are already Python bindings for the framework AVSpeechSynthesizer lives in. One thing to note: according to the page listing the Python bindings (linked below), AVFoundation, the framework where AVSpeechSynthesizer lives, only gained Python support as of macOS 11.3 Big Sur. Big Sur is currently the latest major release of macOS, and I am running 11.6, which I believe is its latest version. This means developers will need the latest macOS to use this VoiceOver class.

As great as this sounds, I have yet to get prefersAssistiveTechnologySettings working. Perhaps someone else will have some insight into this. There is a great page that shows the Objective-C to Python bindings here:

https://pyobjc.readthedocs.io/en/latest/notes/framework-wrappers.html

And here is the video I stumbled upon which goes over AVSpeechSynthesizer:

https://developer.apple.com/videos/play/wwdc2020/10022/

This is the documentation for AVSpeechSynthesizer:

https://developer.apple.com/documentation/avfaudio/avspeechsynthesizer

With all the important links out of the way, here is the code sample I came up with. However, the prefersAssistiveTechnologySettings property is not working as expected; I am still getting default output rather than output using my VoiceOver settings.

from AVFoundation import AVSpeechSynthesizer
from AVFoundation import AVSpeechUtterance

voiceover = AVSpeechSynthesizer.alloc().init()
utterance = AVSpeechUtterance.speechUtteranceWithString_("hello world")
# Should make the utterance use the running screen reader's settings
utterance.setPrefersAssistiveTechnologySettings_(True)
voiceover.speakUtterance_(utterance)
input()  # speech is asynchronous, so keep the process alive

I did a bit of learning Objective-C a few weeks ago when I started on this project; maybe I will see if I can get AVSpeechSynthesizer working there. I am a noob at best when it comes to Objective-C, so if we need to write our own bindings, I am going to need help.

Any ideas on this? I will continue tinkering and researching, and will report back if I find anything of note.

Timothy Breitenfeldt

TheQuinbox commented 2 years ago

Yeah, AVSpeechSynthesizer is preferred and should be used in all cases. I have some Swift code around here demoing exactly how it works, provided by Oriol.

import AVFoundation
import Foundation
import UIKit

class SpeakUtilities {
    let synth = AVSpeechSynthesizer()
    var alwaysVo: Bool = false

    var isVoRunning: Bool {
        UIAccessibility.isVoiceOverRunning
    }

    func stopSynth() {
        synth.stopSpeaking(at: .immediate)
    }

    func speakVo(_ text: String) {
        if isVoRunning {
            UIAccessibility.post(notification: .announcement, argument: text)
        } else {
            if alwaysVo { return }
            speakTTS(text)
        }
    }

    func speakVoQueued(_ text: String) {
        if isVoRunning {
            let attributedLabel = NSAttributedString(string: text, attributes: [NSAttributedString.Key.accessibilitySpeechQueueAnnouncement: true])
            UIAccessibility.post(notification: .announcement, argument: attributedLabel)
        } else {
            if alwaysVo { return }
            speakTTS(text)
        }
    }

    func speakVoDelayed(_ text: String) {
        DispatchQueue.main.asyncAfter(deadline: .now() + 0.3) {
            self.speakVo(text)
        }
    }

    func speakTTS(_ text: String) {
        let utt = AVSpeechUtterance(string: text)
        if #available(iOS 14, *) {
            utt.prefersAssistiveTechnologySettings = true
        }
        utt.rate = 0.6
        synth.speak(utt)
    }
}

let tts = SpeakUtilities()

Hope that helps.

TheQuinbox commented 2 years ago

The only issue with this is that I think the app has to have focus for VoiceOver to speak. Honestly, I haven't found any issues with the current implementation; any errors that pop up can be fixed by unchecking and rechecking the AppleScript box.

tbreitenfeldt commented 2 years ago

@TheQuinbox Thanks for the Swift code, this helps a lot; I will play with it. I will have to test your comment about only speaking when the application has focus, though honestly I am not sure that is a big problem.

It is interesting that you have had no problems with the current implementation using appscript. Still, the fact that you have to check and uncheck the box is less than ideal, and having to check it at all is not great. This library is ultimately for non-technical users, and expecting users to know how to troubleshoot this behavior is poor design. Also, as I mentioned in the initial post, appscript is deprecated; see the SourceForge link I provided for reference. So even if it is working on most machines, it is a solution we should move away from.

I see two cases I personally would like to cover: first, communicating directly with the running instance of VoiceOver, in other words speaking with the user's existing VoiceOver settings; and second, providing default speech output. This would mirror the behavior on Windows with NVDA/JAWS/etc. and SAPI. The goal here is to improve the user experience.
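
To make that concrete, the selection behavior I am after is roughly this (a sketch of the idea, not AO2's actual Auto implementation; I am assuming both classes carry a priority attribute like SystemVoiceOver's, with lower numbers preferred):

def pick_output(outputs):
    # Prefer the running screen reader, fall back to system speech,
    # mirroring NVDA/JAWS -> SAPI on Windows
    for output in sorted(outputs, key=lambda o: o.priority):
        if output.is_active():
            return output
    raise RuntimeError("no speech output available")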

Thanks,

Timothy Breitenfeldt

jbflow commented 2 years ago

I'm also having issues using this on macOS Big Sur and will try the fix, but the application I'm working on is an accessibility tool for another program, which will have focus when in use, so I am not sure this will work. I am looking at other options at the moment.

tbreitenfeldt commented 2 years ago

Hi, sorry I have not done any more research into this of late; I got very busy with work and in my personal life, and it has been difficult to find time for personal projects. I hope to be able to put more time into this though. I saw a comment on my pull request inquiring whether I was going to merge my existing changes. I need to update the repository with some of the findings mentioned above for my current solution, although, as discussed, AVSpeechSynthesizer is a better alternative than the existing usage of osascript. More research needs to be done to get AVSpeechSynthesizer working correctly in Python, such that we properly utilize the user's settings rather than always using default speech.

Frankly, I really dislike programming on a Mac, especially in Python, since VoiceOver's support for reading indentation is incredibly limited. I do intend to come back to this though. Thank you for bringing it back to my attention. If anyone wants to help in this effort, that would be very much appreciated.

tbreitenfeldt commented 2 years ago

Ok, I updated the pull request with the changes to the voiceover.py file as discussed above. I added a sanitize function to the class to sanitize incoming text for the speak function, to help prevent script injection. I also changed how we detect whether VoiceOver is running, using the pgrep solution suggested by cartertemm. As it stands, the pull request is ready to merge; however, as I mentioned before, this is an imperfect solution, since the silence function does not behave correctly. osascript takes too much time to execute, so VoiceOver has already started speaking by the time it is able to interrupt itself. I believe this is a better solution than what was there before, especially with the system_voiceover class providing default speech output when VoiceOver is not running. However, I am going to continue researching AVSpeechSynthesizer to see if there is a still better solution out there. I will leave it up to the maintainers of the repository whether this PR should be merged yet.

https://github.com/tbreitenfeldt/accessible_output2/tree/voiceover-enhancement

Thanks,

Timothy Breitenfeldt

tbreitenfeldt commented 2 years ago

Hi, I did some testing with the prefersAssistiveTechnologySettings property on the AVSpeechSynthesizer object in Swift. I was unable to get output with my custom VoiceOver user settings; no matter what I did, I got speech with the default settings. My Swift code for testing this is below.

import Foundation
import AVFoundation

if #available(macOS 10.14, *) {
    let voiceover = AVSpeechSynthesizer()
    let utterance = AVSpeechUtterance(string: "hello world!")
    if #available(macOS 11.0, *) {
        print("sets property")
        utterance.prefersAssistiveTechnologySettings = true
    }
    voiceover.speak(utterance)
    let _ = readLine()
}

This code runs for me but always speaks with Alex at the default rate and pitch, even though my VoiceOver is set to use Samantha at 100 percent rate. I have the latest version of macOS, and I can confirm that the print statement fires, so I know the version check is passing. Unless anyone has other ideas, I am just not sure the prefersAssistiveTechnologySettings property does what the documentation describes.

I was also looking into other ways of using AppleScript, to see if we can keep the current approach while getting away from osascript, which I think is the issue. It looks like PyObjC has support for a framework called ScriptingBridge, which allows talking directly to scriptable applications such as iTunes, Mail, Finder, etc. Perhaps this can be used to talk to VoiceOver too; however, I could not find any documentation on the functions to call for VoiceOver. The Python code below does run for me, so the bundle identifier is correct.

from ScriptingBridge import SBApplication
voiceover = SBApplication.applicationWithBundleIdentifier_("com.apple.voiceover")

Here are the docs for ScriptingBridge: https://developer.apple.com/documentation/scriptingbridge/?preferredLanguage=occ

I think the existing solution in accessible_output2 uses a similar approach via appscript, which, as mentioned, is deprecated. Perhaps whoever put together the appscript solution can comment on this approach. Since I could not find documentation for the VoiceOver functions to call here, I started guessing at the name of the speak function, though none of the following worked: speak, speakString, speakUtterance, output, output_. Throwing code at the compiler and hoping is no fun; documentation would be nice. I keep hitting dead ends, am running out of things to look into, and am just not sure where to go from here.
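
One idea for getting past the guessing: macOS ships an sdef command-line tool that dumps an application's scripting definition, which should list exactly which commands VoiceOver exposes to AppleScript and ScriptingBridge. Something like this (the VoiceOver.app path is an assumption; adjust as needed):

import subprocess

# Dump VoiceOver's scripting dictionary as XML; the <command> entries
# should be the names ScriptingBridge exposes
sdef_xml = subprocess.run(
    ["sdef", "/System/Library/CoreServices/VoiceOver.app"],
    capture_output=True, text=True,
).stdout
print(sdef_xml)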