Add the ability to set the voice speed

marisademeglio commented 5 months ago

From recent tester feedback

bertfrees commented 5 months ago

We need to think about a user interface for specifying user style sheets (as well as lexicons).

In the current version each TTS script has a "style sheets" option, but the usability of this option could be improved:

Because the option has type anyURI, it does not result in a file picker in the UI.
It might be useful to have a global setting for the style sheet, for the same reason why the voice configuration is global.
Some users might prefer setting options (that apply to the whole document) over writing CSS. Voice speed would be one of those options.

marisademeglio commented 5 months ago

Is there a global setting for TTS stylesheets via /properties API?

bertfrees commented 5 months ago

No, there isn't. The thing is that different style sheets might be needed for different scripts and input formats. The approach I was thinking of is a templating feature in the UI that would deal with this in a generic way.

marisademeglio commented 5 months ago

Is the TTS voice speed set via the properties API or via stylesheets? Or both?

bertfrees commented 5 months ago

Currently it can only be set via style sheets.

marisademeglio commented 4 months ago

Ok so are you thinking of a widget-based form that a stylesheet can be generated from behind the scenes, and the scripts just get that generated stylesheet? Much like what we did (do) for ttsConfig (though we are replacing that with properties soon enough).

If so, what controls do you want to expose, other than rate? I'm not familiar with aural CSS.

bertfrees commented 4 months ago

are you thinking of a widget-based form that a stylesheet can be generated from behind the scenes, and the scripts just get that generated stylesheet

That is indeed what I was thinking of when I said "Some users might prefer setting options". But I don't know if that is the way to go. It does make sense for the default voice speed. But I don't know if there are many other settings for which it would make sense to set them globally. The supported CSS properties are listed here (but note that not all voices support all of them!).

marisademeglio commented 4 months ago

We can do something simple and have a way to control speech rate globally.

If we wanted to add more properties: I don't know how to deal with the properties that are unsupported by some voices; that's confusing for users. Do we say "Supported by XYZ" or "Supported in some engines"? (Is it voices or engines?)

And here is another issue: if this interface replaces the CSS URL fields, users who want to use their own CSS wouldn't have a way to do so.

ways2read commented 4 months ago

Could the global speech rate be considered as an adjustment to what would happen as default or via the CSS? If this is a slider, in the middle everything is the same, a lower value slows the speech and a higher value increases the speed. This would be a little like the relative speech rate in MathCAT which can slow down the announcement of math expressions in NVDA.

bertfrees commented 4 months ago

I don't know how to deal with the properties that are unsupported by some voices

I don't know how to deal with that either. Note that this problem affects the speech rate control too. I don't think all voices will have controllable rate. I could add attributes to the voices XML from the /voices call to indicate which of the CSS properties each voice supports. Something like this:

<voices href="http://localhost:49152/ws/voices">
  <voice engine="google" gender="female-adult" lang="nl-BE" name="nl-BE-Standard-A" supports-css="speech-rate"/>
  ...
</voices>
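
A minimal sketch of how a client might read such an attribute, assuming the proposed supports-css attribute holds a space-separated list of CSS property names. The attribute is only a proposal at this point in the thread, and the second voice in the sample and the function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample shaped like the proposal above. "supports-css" is a proposed
# attribute, and the second voice is invented to show the negative case.
VOICES_XML = """
<voices href="http://localhost:49152/ws/voices">
  <voice engine="google" gender="female-adult" lang="nl-BE" name="nl-BE-Standard-A" supports-css="speech-rate"/>
  <voice engine="example" gender="male-adult" lang="en-US" name="example-voice"/>
</voices>
"""


def voices_supporting(xml_text, css_property):
    """Return the names of voices whose supports-css list contains css_property."""
    root = ET.fromstring(xml_text.strip())
    names = []
    for voice in root.iter("voice"):
        # Assumption: the attribute is a space-separated list of property names.
        supported = voice.get("supports-css", "").split()
        if css_property in supported:
            names.append(voice.get("name"))
    return names


print(voices_supporting(VOICES_XML, "speech-rate"))
```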

Could the global speech rate be considered as an adjustment to what would happen as default or via the CSS?

Yes, a relative value sounds the most user-friendly indeed.

Relative values (i.e. percentages) within the CSS could possibly be made relative to the global value, rather than relative to the value medium.
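
As a rough illustration of that idea, the sketch below multiplies a percentage from the CSS by the global percentage. The function names are hypothetical, and whether the engine would actually combine the two values this way is an assumption, not something settled in this thread.

```python
def parse_percentage(value):
    """Turn a string such as "150%" into the factor 1.5."""
    text = value.strip()
    if not text.endswith("%"):
        raise ValueError(f"not a percentage: {value!r}")
    return float(text[:-1]) / 100.0


def effective_rate(global_rate, css_rate):
    """Interpret css_rate relative to global_rate instead of relative to medium."""
    factor = parse_percentage(global_rate) * parse_percentage(css_rate)
    return f"{round(factor * 100)}%"


# A global setting of 150% combined with a CSS rule of 50%.
print(effective_rate("150%", "50%"))
```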

marisademeglio commented 4 months ago

I think it's a lot of information to fit in if we start listing the supported properties of each voice. Maybe that belongs in a reference document instead?

I see that speech rate can be

A number, “x-slow”, “slow”, “medium”, “fast” or “x-fast”

So to have this idea of relative values, the number value for speech rate would be a percentage relative to the global value, which comes from where?

bertfrees commented 4 months ago

I think it's a lot of information to fit in if we start listing the supported properties of each voice

I was thinking the GUI would only use the information that it can use. I wasn't necessarily thinking of including all the information in the voices table. You could e.g. use it to create a warning message next to the speech rate slider if the rate can not be controlled for some of the selected TTS engines.

to have this idea of relative values, the number value for speech rate would be a percentage relative to the global value, which comes from where

In this case the global value can not come from a base style sheet, like we discussed before, but it would have to be set through a property.

marisademeglio commented 4 months ago

Can multiple stylesheets be applied? E.g. we have global.css for the default speed, set in the settings dialog, and then the user can optionally pass their own stylesheet with more rules specific to their content. So the script form has a file picker and also there's a slider widget in the settings dialog.

bertfrees commented 4 months ago

Sure, multiple style sheets can be applied. Whether or not we generate some CSS behind the scenes, the user should always remain able to specify their own style sheets.

The style sheet options take a space separated list of URIs, absolute or relative to the input. But for simplicity I think it will be fine to limit it to a single absolute file URI, so that we can indeed use a file picker.
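
For illustration, a space-separated list of style sheet URIs could be resolved against the input document roughly as follows. This is a sketch using Python's standard urljoin, not the engine's own resolution code; the function name and the example paths are made up.

```python
from urllib.parse import urljoin


def resolve_stylesheets(option_value, input_uri):
    """Split a space-separated URI list and resolve relative entries against the input."""
    return [urljoin(input_uri, uri) for uri in option_value.split()]


# One URI relative to the input document, one absolute file URI.
resolved = resolve_stylesheets(
    "tts.css file:///home/user/global.css",
    "file:///home/user/book/input.xhtml",
)
print(resolved)
```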

marisademeglio commented 4 months ago

Global speed will come from a property called org.daisy.pipeline.tts.speech-rate

marisademeglio commented 4 months ago

UI should be a range slider with steps for these targets:

x-slow => 40% / 44%
slow => 60% / 67%
medium => 100%
fast => 150% / 167%
x-fast => 250% / 278%
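
A small sketch of how a slider value could be snapped to these targets. The list above gives two percentages per keyword without saying what distinguishes them; the sketch arbitrarily uses the first one, and the function name is hypothetical.

```python
# Assumption: the first percentage of each pair in the list above is the target.
STEPS = [
    ("x-slow", 40),
    ("slow", 60),
    ("medium", 100),
    ("fast", 150),
    ("x-fast", 250),
]


def snap_to_step(percent):
    """Return the (keyword, percentage) step closest to the slider position."""
    return min(STEPS, key=lambda step: abs(step[1] - percent))


print(snap_to_step(110))
```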

marisademeglio commented 4 months ago

I'm assuming the default value here is 100%. @bertfrees what's the format here, e.g. 50, 100, 200? Or 0.5, 1.0, 2.0? Or "50%", "100%", etc.?

bertfrees commented 4 months ago

"50%", "100%", etc.

marisademeglio commented 4 months ago

WIP see 01a91e2246165dd1fd212aba4cb068e038865ff7

marisademeglio commented 4 months ago

WIP see 880a65b93b22b13ce40df5cc8e8c1f7db484a774

marisademeglio commented 3 months ago

@bertfrees did you end up implementing a way to find which voices support speech-rate? I checked the /voices endpoint for a supports-css attribute like what you mentioned but I didn't find anything. My current engine version is 1.14.18-SNAPSHOT.

bertfrees commented 3 months ago

Oh, I didn't end up implementing that. Do you think it will be useful? It could still be done, shouldn't be much work.

bertfrees commented 3 months ago

By the way I also still have to implement the org.daisy.pipeline.tts.speech-rate property.

marisademeglio commented 3 months ago

Oh, I didn't end up implementing that. Do you think it will be useful? It could still be done, shouldn't be much work.

It would be for reporting which engines support speech rate. Unless you just want to tell me where it's supported?

bertfrees commented 3 months ago

@marisa In the latest version of the engine (https://github.com/daisy/pipeline-ui/pull/211) I have now included both the org.daisy.pipeline.tts.speech-rate property and a new API endpoint, /tts-engines, which lists the working engines. If an engine supports changing the voice speed, it has an attribute features with value speech-rate.
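
A hedged sketch of how a client might pick out the engines that report this feature. Only the features attribute and its speech-rate value come from the comment above; the element names, the name attribute, the sample engines, and the assumption that features is a space-separated list are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical response shape: element names and the "name" attribute are
# assumptions. Only features="speech-rate" is described in the thread.
ENGINES_XML = """
<tts-engines>
  <tts-engine name="azure" features="speech-rate"/>
  <tts-engine name="example-engine"/>
</tts-engines>
"""


def engines_with_feature(xml_text, feature):
    """Return the names of engines whose features attribute lists the feature."""
    root = ET.fromstring(xml_text.strip())
    return [
        engine.get("name")
        for engine in root.iter("tts-engine")
        if feature in engine.get("features", "").split()
    ]


print(engines_with_feature(ENGINES_XML, "speech-rate"))
```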

NPavie commented 3 months ago

I added the new endpoint in 901ad80e2945947d81c28b95b5d0dc9e64a3fe18

The engine features set (well, for now only "speech-rate") is loaded into the app state after loading the voices (both at startup and on connecting to a remote TTS engine).

bertfrees commented 2 months ago

The speech rate setting does not seem to be persisted, so I can't test it.

Another issue is that it currently says:

Setting the speech rate is currently supported on and azure engines on your system.

This would be better:

Setting the speech rate is currently supported on azure voices.

I replaced "engines" with "voices", removed the stray "and", and dropped "on your system" because it seemed unneeded.

NPavie commented 2 months ago

I replaced "engines" with "voices", removed the stray "and", and dropped "on your system" because it seemed unneeded.

Completely forgot the one-engine case when doing the message composition >< (got 2 engines reported with support on my PC)
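
For reference, a sketch of message composition that covers the one-engine case as well as several engines. It is not the app's actual code: the function name is hypothetical and the wording for the empty case is invented.

```python
def speech_rate_message(engines):
    """Compose the support notice for zero, one, or several engine names."""
    if not engines:
        # Assumption: wording for the case where no engine reports support.
        return "Setting the speech rate is not supported by any available voices."
    if len(engines) == 1:
        listing = engines[0]
    else:
        # Join all but the last name with commas, then add "and" before the last.
        listing = ", ".join(engines[:-1]) + " and " + engines[-1]
    return f"Setting the speech rate is currently supported on {listing} voices."


print(speech_rate_message(["azure"]))
print(speech_rate_message(["azure", "google"]))
```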

marisademeglio commented 2 months ago

Can we close this one? It's working for me.

bertfrees commented 2 months ago

It is working for me too.

daisy / pipeline-ui

Add the ability to set the voice speed #191