brailcom / speechd

Common high-level interface to speech synthesis
GNU General Public License v2.0

Adding an even-lower priority? #879

Open sthibaul opened 8 months ago

sthibaul commented 8 months ago

Is your feature request related to a problem? Please describe.

from @eeejay:

“If you were listening to any other spoken words like a podcast and audio book, you wouldn't expect that media to stop or for the screen reader to wait until the media is done. The way I have seen other platforms deal with this is audio ducking”

Describe the solution you'd like

I don't think users really want audio ducking, since that means they don't hear what is spoken in the podcast/audio book. But if the current behavior is indeed waiting for the media to be done it can be a problem.

Among the priority categories https://htmlpreview.github.io/?https://github.com/brailcom/speechd/blob/master/doc/ssip.html#Priority-Categories we have for instance "Priority important" and "Priority message", which are able to interrupt some other priorities. Orca currently uses "Priority message", and this is probably what we want for screen reading activities. "Priority text" was rather meant for activities such as reading a whole document. When a message comes, the reading is canceled; AIUI that is because we consider that the user is stopping the reading to fix a typo or something, and thus we indeed do not want to continue reading.

For a podcast or audio book, i.e. a rather background activity, these priorities don't really match: we'd want something that can be paused by screen reading activities, but not interrupted completely, so that the background reading resumes after the screen reader action.

So we could introduce e.g. a "Priority background" (or such, wording suggestions welcome!), that is not canceled, but only paused by higher priorities.
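As a rough illustration of the difference between "canceled" and "only paused", here is a toy model. It is not speech-dispatcher code: the function name `speak_timeline`, the word-level granularity, and the `resume` flag are all assumptions made up for this sketch.

```python
def speak_timeline(words, arrivals, resume=True):
    """Return the order in which things are spoken (toy model).

    words:    the long-running reading, split into words.
    arrivals: dict mapping a word index to the screen reader messages
              that arrive just before that word would be spoken.
    resume:   True models the proposed "background" behaviour (pause, then
              continue); False models the reading being canceled.
    """
    timeline = []
    for position, word in enumerate(words):
        pending = arrivals.get(position, [])
        for message in pending:
            # The reading is silent while the message is spoken.
            timeline.append(("message", message))
        if pending and not resume:
            # Cancel semantics: the rest of the reading is dropped.
            break
        # Pause semantics: the reading picks up at the same position.
        timeline.append(("reading", word))
    return timeline


story = ["once", "upon", "a", "time"]
paused = speak_timeline(story, {2: ["button OK"]}, resume=True)
canceled = speak_timeline(story, {2: ["button OK"]}, resume=False)
```

With `resume=True` the words "a" and "time" are still spoken after "button OK"; with `resume=False` they are lost.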

sthibaul commented 8 months ago

Also from @albertotirla :

“it happened a lot of times for me, that either an app was speaking and my screenreader had to wait its turn, and if the app in question spoke a really long message I'd wait a lot with my screenreader frozen, or that the sr would speak, but then the app will not only not speak, however the message it used to speak will be lost forever, because it'll never be spoken again”

@albertotirla: which kind of application is that? Is that a self-voicing app, or are you reading it through orca? We'd need more precise descriptions, to be sure to cover the need.

albertotirla commented 8 months ago

yes, you could count it as a self-voicing application, speaking a very long message, which began speaking before orca should have. This sometimes happens with odilia as well: if it starts reading its own logs, orca can't take back control of speech, and even if it exits, spd doesn't flush its buffers for odilia, therefore we have to wait until the current messages have finished speaking. By the way, restarting orca doesn't fix it; you can either fully restart spd, or wait. The same happens if orca starts speaking something and then odilia tries to speak, and the same goes for any other self-voicing app I tried, of which there aren't many

sthibaul commented 8 months ago

If odilia is using "priority message" like orca is using, then it cannot be interrupted by orca, and vice-versa. Only "priority important" can interrupt "priority message". One can also use spd-say -S to stop whatever is currently being spoken, that'd work for important messages too.
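The interruption rules mentioned here can be written down as a small lookup table. This is a simplified sketch restricted to the three priorities discussed in this thread (notification and progress are left out), and the names `CANCELS` and `interrupts` are invented for the example rather than taken from speech-dispatcher.

```python
# For each incoming priority, the priorities of ongoing speech it cancels.
# Simplified from the SSIP priority categories; an illustration only.
CANCELS = {
    "important": {"message", "text"},
    "message": {"text"},   # message does not interrupt message
    "text": {"text"},      # text interrupts itself
}


def interrupts(incoming, current):
    """True if speech of priority `incoming` cancels speech of priority `current`."""
    return current in CANCELS.get(incoming, set())
```

In this model, two clients that both use "message" (Orca and a self-voicing application, say) can never interrupt each other, which matches the behaviour reported above.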

sthibaul commented 8 months ago

Does this situation happen with other applications than Odilia?

sthibaul commented 8 months ago

ah, you mentioned "any other self-voicing app I tried". Well, it doesn't happen with the spd-say app :)

sthibaul commented 8 months ago

I tried dasher, orca does interrupt its self-voicing. So we need a more specific example of self-voicing application that orca can't interrupt. Also, does this happen with all voices or only some?

sthibaul commented 8 months ago

I have added to the Orca bts a feature request for a shut-up action that would call stop: https://gitlab.gnome.org/GNOME/orca/-/issues/486

eeejay commented 8 months ago

“If you were listening to any other spoken words like a podcast and audio book, you wouldn't expect that media to stop or for the screen reader to wait until the media is done. The way I have seen other platforms deal with this is audio ducking”

To clarify, I was making the argument that background spoken word is common and that there are certain kinds of media where scheduled speaking is not practical. If it is recorded human speech (like a podcast or audiobook) we cannot interleave synthesized text with it. The only option is ducking.

eeejay commented 8 months ago

“I tried dasher, orca does interrupt its self-voicing. So we need a more specific example of self-voicing application that orca can't interrupt. Also, does this happen with all voices or only some?”

Firefox uses a "message" priority. This means that if a user chooses to use the "narrate" feature in reader view, then once they press play, Orca becomes unresponsive and only speaks in between paragraphs. For example, if you tab around or switch apps, Orca will only tell you about it when Firefox is done speaking a paragraph.

It could be argued that we should use the "text" priority, but that would mean the paragraphs will be skipped if the user touches the keyboard and makes orca speak.

sthibaul commented 8 months ago

“If you were listening to any other spoken words like a podcast and audio book, you wouldn't expect that media to stop or for the screen reader to wait until the media is done. The way I have seen other platforms deal with this is audio ducking”

“To clarify, I was making the argument that background spoken word is common and that there are certain kinds of media where scheduled speaking is not practical. If it is recorded human speech (like a podcast or audiobook) we cannot interleave synthesized text with it. The only option is ducking.”

I'm not sure what you mean by "scheduled speaking", exactly?

I also don't understand why you say both that "we cannot interleave synthesized text with it" and that "the only option is ducking". Do you mean with what spd currently provides, or what we want in the end?

I mean, what I proposed is exactly to let the "background" priority be interleaved with screen reader actions, so that a user can listen to a podcast, get some notification, do some key presses, all while the podcast keeps playing, without mixing things, so that everything remains intelligible, just interleaved. Is that not practical? It seems less of a problem than audio ducking, which to me seems likely to make users miss pieces of the podcast.

sthibaul commented 8 months ago

“I tried dasher, orca does interrupt its self-voicing. So we need a more specific example of self-voicing application that orca can't interrupt. Also, does this happen with all voices or only some?”

“Firefox uses a "message" priority. This means that if a user chooses to use the "narrate" feature in reader view. Once they press play Orca will become unresponsive and only speak in between paragraphs. For example, if you tab around or switch apps Orca will only tell you about it when Firefox is done speaking a paragraph.

It could be argued that we should use the "text" priority, but that would mean the paragraphs will be skipped if the user touched the keyboard and makes orca speak.”

Again, that's why in what I propose here, the "background" priority would not skip the paragraph, just pause the audio feedback while the orca messages get spoken.

albertotirla commented 7 months ago

I don't know what dasher is, but the same issue happens with long messages in mumble and teamtalk; possibly a qt implementation issue?

sthibaul commented 7 months ago

mumble uses speechd directly, and is using message priority, so yes that wouldn't be interruptible. It should be using text priority to be interruptible.

Where can the source of teamtalk be downloaded?

Nardol commented 7 months ago

TeamTalk is at https://github.com/bearware/TeamTalk5 and it uses Qt Speech to make announcements or, alternatively, it calls the notify-send command line.

sthibaul commented 7 months ago

qtspeech also uses priority message, so again it is indeed not interruptible. It should introduce some parameter to let callers decide between message (non-interruptible) and text (interruptible).
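To make the suggested parameter concrete, here is a minimal sketch of what such a choice amounts to at the SSIP level. The function `ssip_speak_lines` and its `interruptible` parameter are hypothetical; they are not part of Qt Speech or speech-dispatcher. The sketch only builds the lines a client would send (`SET SELF PRIORITY`, `SPEAK`, the text, and the terminating dot); a real client would also terminate each line with CR LF and wait for the server's reply after each command.

```python
def ssip_speak_lines(text, interruptible):
    """Build the SSIP lines for one utterance (sketch, no networking).

    interruptible=True  -> priority "text"    (canceled by incoming messages)
    interruptible=False -> priority "message" (not interrupted by other messages)
    """
    priority = "text" if interruptible else "message"
    lines = [f"SET SELF PRIORITY {priority}", "SPEAK"]
    for line in text.splitlines():
        # SSIP ends the data with a single dot on its own line, so a data
        # line that starts with a dot gets an extra leading dot.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")
    return lines


announcement = ssip_speak_lines("User joined the channel", interruptible=True)
```

A caller reading out a long chat history could pass `interruptible=True`, while a short status announcement could keep the current non-interruptible behaviour.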

sthibaul commented 7 months ago

Thinking about the original question, something occurs to me: currently, an interruption entirely cancels the ongoing speech. That's really a problem for self-voicing applications, which are not aware that the screen reader asked to say something and wouldn't know that they have to start over. So for these we'd tend to want speech to just resume after the screen reader speaks. That changes the semantics of interruption quite a bit. But probably for the better? As in better isolation of different clients. We can keep the hard-interrupt semantics when the new utterance comes from the same client, so as to keep the documented semantics intra-client.
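The rule sketched in this comment fits in a few lines. This is only a model of the idea, with hypothetical names (`on_interrupt`, the action strings), not a description of how speech-dispatcher is or will be implemented.

```python
def on_interrupt(speaking_client, incoming_client):
    """Decide what happens to ongoing speech when an interrupting utterance arrives.

    Same client: keep the documented hard interrupt (the old utterance is dropped).
    Different client: pause the ongoing speech and resume it afterwards.
    """
    if speaking_client == incoming_client:
        return "cancel"
    return "pause_and_resume"


same_client = on_interrupt("firefox", "firefox")
other_client = on_interrupt("firefox", "orca")
```

Here Firefox interrupting its own narration cancels the earlier utterance, while Orca interrupting Firefox only pauses it.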

sthibaul commented 7 months ago

That being said, having a priority that doesn't interrupt itself can be useful, e.g. to queue several utterances and let them flow. Ideas for names other than background welcome!

jpwhiting commented 7 months ago

how about enqueue?


sthibaul commented 7 months ago

enqueue seems too generic to me: we already have the notion of queue for each priority level.