Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

Why is there no speed control with respect to using SpeakTextAsync? #1932

Closed by jtsoftware 1 year ago

jtsoftware commented 1 year ago

Google has it. The old .NET had it. Is using SSML the only option, which apparently increases the cost because of the extra characters? I'm doing a language study web app where I would like to have slower speaking as an option. What would be a minimal SSML element that just sets the speed? I tried something like:

        text = "<speak><prosody rate=\""
            + speakingRate.ToString()
            + "%\">" + text + "</prosody></speak>";

but it returned an error.

ralph-msft commented 1 year ago

The minimal SSML needed to slow down the speech rate by, e.g., 90% would be:

<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:mstts='http://www.w3.org/2001/mstts' xml:lang='en-US'>
  <voice name='en-US-CoraNeural'>
    <prosody rate='-90%'>Hi there! How can I help you?</prosody>
  </voice>
</speak>
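
For the original C# snippet, a rough sketch of building this document from a text and a rate could look like the following. `SsmlHelper` and `BuildSsml` are hypothetical names, not part of the Speech SDK, and the default voice and language are just the ones from the example above. The snippet in the question most likely failed because it omitted the `version`, `xmlns`, and `xml:lang` attributes and the `voice` element. This version also writes the rate with an explicit sign and XML-escapes the text, on the assumption that user-supplied text may contain characters such as `&` or `<`.

```csharp
using System;
using System.Globalization;
using System.Security;

// Hypothetical helper (not part of the Speech SDK) that wraps plain text
// in the minimal SSML document shown above.
public static class SsmlHelper
{
    public static string BuildSsml(
        string text,
        int ratePercent,                      // relative change, e.g. -20 for 20% slower
        string voice = "en-US-CoraNeural",    // assumption: voice taken from the example above
        string lang = "en-US")
    {
        // Format the rate with an explicit sign: -20 -> "-20%", 10 -> "+10%".
        string rate = ratePercent.ToString("+0;-0;+0", CultureInfo.InvariantCulture) + "%";

        // Escape &, <, >, and quotes so the text cannot break the XML.
        string escaped = SecurityElement.Escape(text);

        return "<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' "
             + "xmlns:mstts='http://www.w3.org/2001/mstts' "
             + $"xml:lang='{lang}'>"
             + $"<voice name='{voice}'>"
             + $"<prosody rate='{rate}'>{escaped}</prosody>"
             + "</voice></speak>";
    }
}
```

The resulting string would then be passed to `SpeakSsmlAsync` instead of `SpeakTextAsync`, for example `await synthesizer.SpeakSsmlAsync(SsmlHelper.BuildSsml(text, -20));`, where `synthesizer` is an already configured `SpeechSynthesizer`.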

You can find out more information about the prosody element here: https://learn.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup-voice#adjust-prosody

And more information on SSML in general here: https://learn.microsoft.com/azure/cognitive-services/speech-service/speech-synthesis-markup

You can find out more information on what is considered a billable character here: https://learn.microsoft.com/azure/cognitive-services/speech-service/text-to-speech#pricing-note