daisy / pipeline

Super-project that aggregates all Pipeline-related code, provides a common tracker for Pipeline-related issues, and holds the Pipeline website.
http://daisy.github.io/pipeline

TTS speech rate: numeric values always result in fast speech #751

Closed: torchtrust closed this issue 10 months ago

torchtrust commented 11 months ago

DAISY Pipeline 1.2.7-RC1, DTBook to DAISY 3, with the following CSS in the config:

    h1 { pause: 30ms; volume: x-loud } 
    h2 { pause: 30ms 40ms; volume: loud } 
    p { pause: 20ms; speech-rate: 145 } 
    p.quotation { pitch: high }

We want the speech rate just a bit slower than normal. I have tried all sorts of numeric values, but they all produce the same speed, as confirmed by the length of the MP3 file. I tried slow, but it was too slow; any numeric value is too fast. Any ideas? Thanks, Paul

marisademeglio commented 11 months ago

Interesting! Let's ask @bertfrees if he has any thoughts.

I ran a test using just the Pipeline engine via the command line, passing this XML file as the tts-config:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <config>
      <voice engine="azure" name="en-AU-NatashaNeural" lang="en-AU" gender="female-adult" priority="1"/>
      <property key="org.daisy.pipeline.tts.azure.key" value="*****" />
      <property key="org.daisy.pipeline.tts.azure.region" value="westus" />
      <css href="aural.css"/>
    </config>

Here, aural.css contained the CSS you posted above.

I did not notice any change in speech rate, not even when I made the pauses quite large (300ms).

marisademeglio commented 11 months ago

If you're editing ttsConfig.xml directly, there is a risk that it will be overwritten by the Pipeline UI, because that file gets regenerated. That's why I ran the test directly from the command line.

bertfrees commented 11 months ago

@torchtrust Can you tell me which voice you were using?

torchtrust commented 11 months ago
bertfrees commented 11 months ago

@torchtrust It appears that Azure interprets the numeric value not as an absolute value (words per minute), but as a relative value (e.g. 1.5 means 50% faster). See https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-synthesis-markup-voice#adjust-prosody for more info.
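A minimal sketch of the interpretation described above, assuming the two forms summarized from the linked Azure documentation: a bare number acts as a multiplier of the voice's default rate, and a signed percentage such as -8% is a relative change. The helper name `azure_rate_multiplier` is hypothetical (it is not part of Pipeline or any Azure SDK), and rate keywords such as slow are deliberately not handled.

```python
import re

# Hypothetical helper mirroring how the Azure docs describe the SSML
# <prosody rate="..."> attribute. Keywords (x-slow ... x-fast) are not handled.
_PERCENT = re.compile(r"^([+-]?\d+(?:\.\d+)?)%$")
_NUMBER = re.compile(r"^\d+(?:\.\d+)?$")


def azure_rate_multiplier(value: str) -> float:
    """Return the rate multiplier implied by an SSML rate value."""
    value = value.strip()
    match = _PERCENT.match(value)
    if match:
        # "-8%" means 8% slower than the default, i.e. a multiplier of 0.92.
        return 1.0 + float(match.group(1)) / 100.0
    if _NUMBER.match(value):
        # A bare number is a multiplier of the default, not words per minute.
        return float(value)
    raise ValueError(f"unsupported rate value: {value!r}")


if __name__ == "__main__":
    for raw in ["1.5", "145", "-8%"]:
        print(raw, "->", round(azure_rate_multiplier(raw), 3))
```

Under this reading, the speech-rate: 145 from the original CSS asks for 145 times the default rate, which would be consistent with every numeric value sounding equally fast (presumably because the service caps the rate at some maximum).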

Unfortunately, this does not match how Pipeline interprets the values: Pipeline won't accept values with a decimal point.

I've made a local fix that normalizes numeric values to what Azure expects by dividing the number by 200. That seems to work.
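The fix is only described in prose here, so the following is a minimal sketch of the same idea, not the actual Pipeline patch. It assumes a CSS speech-rate number is in words per minute, divides it by 200 as described to get the multiplier Azure expects, and wraps text in an SSML prosody element. The names `css_rate_to_azure` and `prosody_ssml` are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

# Divisor taken from the comment above: a CSS speech-rate in words per minute
# is divided by 200 to obtain Azure's relative rate multiplier.
WPM_DIVISOR = 200


def css_rate_to_azure(wpm: float) -> str:
    """Convert a CSS speech-rate (words per minute) to an Azure multiplier string."""
    if wpm <= 0:
        raise ValueError("speech-rate must be positive")
    # :g drops trailing zeros, so 200 wpm becomes "1" and 145 becomes "0.725".
    return f"{wpm / WPM_DIVISOR:g}"


def prosody_ssml(text: str, wpm: float) -> str:
    """Wrap text in an SSML prosody element using the normalized rate."""
    rate = quoteattr(css_rate_to_azure(wpm))
    return f"<prosody rate={rate}>{escape(text)}</prosody>"


if __name__ == "__main__":
    print(prosody_ssml("Chapter 1 & more", 145))
```

With this mapping, the speech-rate: 145 from the original CSS becomes a multiplier of 0.725, i.e. somewhat slower than the default rather than far faster.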

torchtrust commented 11 months ago

@bertfrees The Pipeline UI seems to accept percentages such as -8%, so I am using that. Maybe the documentation just needs updating for the Azure voices. Thanks
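Since the Pipeline UI accepts a percentage, it can help to see which percentage corresponds to a given words-per-minute target. A minimal sketch, assuming the 200 wpm baseline implied by the divide-by-200 fix mentioned earlier in the thread (an assumption, not documented Azure behavior); `wpm_to_percent` is a hypothetical helper.

```python
# Assumed baseline: the divisor from the fix mentioned earlier in the thread,
# treated here as the default rate in words per minute.
BASELINE_WPM = 200


def wpm_to_percent(wpm: float) -> str:
    """Express a words-per-minute target as a signed percentage change."""
    change = round((wpm / BASELINE_WPM - 1) * 100, 1)
    # +g keeps the sign and drops a trailing ".0" (e.g. "-8%" rather than "-8.0%").
    return f"{change:+g}%"


if __name__ == "__main__":
    for target in (184, 145, 200):
        print(target, "wpm ->", wpm_to_percent(target))
```

Under this assumption, the -8% used as a workaround corresponds to roughly 184 wpm.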