Closed by torchtrust 10 months ago
Interesting! Let's ask @bertfrees if he has any thoughts.
I ran a test using just the Pipeline engine from the command line, passing this XML file as tts-config:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<config>
<voice engine="azure" name="en-AU-NatashaNeural" lang="en-AU" gender="female-adult" priority="1"/>
<property key="org.daisy.pipeline.tts.azure.key" value="*****" />
<property key="org.daisy.pipeline.tts.azure.region" value="westus" />
<css href="aural.css"/>
</config>
Here, aural.css contained the CSS that you wrote above.
I did not notice any change in speech rate, not even when I made the pauses quite large (300ms).
If you're editing ttsConfig.xml directly, there is a risk that it gets overwritten by the Pipeline UI, because that file is regenerated. That's why I ran the test directly on the command line.
@torchtrust Can you tell me which voice you were using?
@torchtrust It appears that Azure interprets the numeric value not as an absolute value (words per minute) but as a relative value (e.g. 1.5 means 50% faster). See https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-voice#adjust-prosody for more info.
Unfortunately this does not match how Pipeline interprets the values. It won't accept values with a decimal point.
I've done a local fix that normalizes numeric values to what Azure expects, by dividing the number by 200. That seems to work.
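For illustration, the normalization described above can be sketched roughly as follows. This is not the actual Pipeline patch: the function names are hypothetical, and the baseline of 200 words per minute is simply the divisor mentioned above, treated here as "normal speed".

```python
# Hypothetical sketch of the normalization; not the real Pipeline code.
# Assumption: 200 words per minute corresponds to Azure's default rate (1.0).
ASSUMED_BASELINE_WPM = 200.0


def wpm_to_relative_rate(wpm: float, baseline: float = ASSUMED_BASELINE_WPM) -> float:
    """Convert an absolute rate (words per minute) to a relative multiplier."""
    if wpm <= 0 or baseline <= 0:
        raise ValueError("rates must be positive")
    return wpm / baseline


def prosody_open_tag(wpm: float) -> str:
    """Format the multiplier as the rate attribute of an SSML prosody element."""
    return f'<prosody rate="{wpm_to_relative_rate(wpm):.2f}">'
```

Under that assumption, a speech-rate of 300 maps to a multiplier of 1.5 (50% faster) and 184 maps to 0.92, which is roughly the "-8%" slowdown discussed in this thread.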
@bertfrees The Pipeline UI seems to accept -8%, for instance, so I am using that. Maybe the documentation just needs changing for the Azure voices. Thanks.
DAISY Pipeline 1.2.7-RC1, DTBook to DAISY 3, using CSS in the config:
We want the speech-rate just a bit slower than normal. I have tried all sorts of numerical values, but they all produce the same speed, as confirmed by the length of the MP3 file. I tried slow, but it was too slow. Any numerical value is too fast. Any ideas? Thanks, Paul