Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

TTS: Excessive silence at the end of audio generated using gu-IN-DhwaniNeural voice #2510

Open luzhanov opened 3 months ago

luzhanov commented 3 months ago

Describe the bug

Audio generated for the gu-IN locale with the gu-IN-DhwaniNeural voice contains about 3 seconds of silence at the end of the file. The same text synthesized with the gu-IN-NiranjanNeural voice produces a normal file without the long silence (see the attached samples and screenshot).

Here is the length difference between the gu-IN-NiranjanNeural voice (shorter) and the gu-IN-DhwaniNeural voice (longer) for the same text: [image]

Audio files generated: gu-audios.zip
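
To put a number on the trailing silence rather than relying on the screenshot, something like the following sketch could be run against both files. It is not part of the Speech SDK. It uses only `javax.sound.sampled` from the JDK and assumes the audio was saved as 16-bit little-endian mono PCM WAV. The class name `TrailingSilence` and the amplitude threshold are hypothetical choices for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper (not an SDK class): measures silence at the end of PCM audio.
final class TrailingSilence {

    // Returns the trailing silence in milliseconds for 16-bit little-endian mono PCM.
    // A sample counts as silent when its absolute value is <= threshold.
    static long trailingSilenceMs(byte[] pcm, int sampleRate, int threshold) {
        int totalSamples = pcm.length / 2;
        int lastLoud = -1; // index of the last non-silent sample, -1 if none
        for (int i = totalSamples - 1; i >= 0; i--) {
            int lo = pcm[2 * i] & 0xFF;   // low byte, unsigned
            int hi = pcm[2 * i + 1];      // high byte, keeps the sign
            int sample = (hi << 8) | lo;
            if (Math.abs(sample) > threshold) {
                lastLoud = i;
                break;
            }
        }
        long silentSamples = totalSamples - 1 - lastLoud;
        return silentSamples * 1000L / sampleRate;
    }

    // Reads a WAV file and measures its trailing silence.
    // Assumption: the file is 16-bit signed little-endian mono PCM.
    static long trailingSilenceMs(File wav, int threshold)
            throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat f = in.getFormat();
            if (!AudioFormat.Encoding.PCM_SIGNED.equals(f.getEncoding())
                    || f.getSampleSizeInBits() != 16
                    || f.getChannels() != 1
                    || f.isBigEndian()) {
                throw new IllegalArgumentException("Expected 16-bit LE mono PCM, got " + f);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
            return trailingSilenceMs(out.toByteArray(), (int) f.getSampleRate(), threshold);
        }
    }
}
```

Calling `TrailingSilence.trailingSilenceMs(new File("dhwani.wav"), 100)` (file name hypothetical) on each voice's output would give comparable figures in milliseconds. The threshold of 100 is an arbitrary guess and may need tuning if the tail contains low-level noise rather than digital silence.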

To Reproduce

Use the following SSML for audio generation:

<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts"
 version="1.0" xml:lang="gu-IN">
  <voice name="gu-IN-DhwaniNeural">
    <mstts:silence type="Leading-exact" value="0ms"/>ઉનાળો મારી પ્રિય મોસમ છે.<mstts:silence type="Tailing-exact" value="0ms"/>
  </voice>
</speak>

Expected behavior

The gu-IN-DhwaniNeural voice should generate audio without a long (~3 sec) silence at the end when the SSML contains <mstts:silence type="Tailing-exact" value="0ms"/>.

Version of the Cognitive Services Speech SDK

Java SDK 1.36.0

Platform, Operating System, and Programming Language

github-actions[bot] commented 2 months ago

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.