Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

SSML bookmarks broken when using the <lang> tag with multilingual voice #1802

Open sibbl opened 1 year ago

sibbl commented 1 year ago

Describe the bug

When synthesizing SSML with the voice en-US-JennyMultilingualNeural using the .NET SDK, the bookmark events are fired with unusable data or are not fired at all.

To Reproduce

Synthesizing sample 1...

<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
    <voice name='en-US-JennyMultilingualNeural' xml:lang="en-US">
        <bookmark mark="One"/>
        <lang xml:lang="en-US"> A </lang>
        <bookmark mark="Two"/>
        <lang xml:lang="en-US"> B </lang>
        <bookmark mark="Three"/>
        <lang xml:lang="en-US"> C </lang>
        <bookmark mark="Four"/>
    </voice>
</speak>

...returns...

Synthesis started: SynthesizingAudioStarted
Bookmark reached at 0: 児ࡓ网
Bookmark reached at 0: 樐࡜网
Bookmark reached at 0: ࡕ网
Bookmark reached at 0: Four
Synthesis completed

...with the first 3 bookmark names being different characters each time you try it.

Synthesizing sample 2...

<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
    <voice name='en-US-JennyMultilingualNeural' xml:lang="en-US">
        <lang xml:lang="en-US">
            <bookmark mark="One"/>
            A
            <bookmark mark="Two"/>
            B
            <bookmark mark="Three"/>
            C
            <bookmark mark="Four"/>
        </lang>
    </voice>
</speak>

...returns no bookmarks at all:

Synthesis started: SynthesizingAudioStarted
Synthesis completed
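To make the mismatch easy to check automatically, here is a minimal sketch using only the Python standard library. It is not part of the original demo application and does not call the Speech SDK. The helper names `expected_marks` and `diff_marks` are hypothetical. The sketch lists the bookmark names an SSML document declares and compares them with the names actually received from the bookmark events, which for sample 2 is an empty list.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the <speak> element in the samples.
SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Sample 2 from this report, copied verbatim.
SAMPLE_2 = """
<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
    <voice name='en-US-JennyMultilingualNeural' xml:lang="en-US">
        <lang xml:lang="en-US">
            <bookmark mark="One"/>
            A
            <bookmark mark="Two"/>
            B
            <bookmark mark="Three"/>
            C
            <bookmark mark="Four"/>
        </lang>
    </voice>
</speak>
""".strip()


def expected_marks(ssml):
    """Return the mark attribute of every <bookmark> element, in document order."""
    root = ET.fromstring(ssml)
    return [el.get("mark") for el in root.iter("{%s}bookmark" % SSML_NS)]


def diff_marks(expected, received):
    """Return (missing, unexpected): declared names never received, and
    received names that the SSML never declared (for example garbled ones)."""
    missing = [m for m in expected if m not in received]
    unexpected = [m for m in received if m not in expected]
    return missing, unexpected


declared = expected_marks(SAMPLE_2)
print(declared)                  # ['One', 'Two', 'Three', 'Four']
print(diff_marks(declared, []))  # (['One', 'Two', 'Three', 'Four'], [])
```

Passing the names collected from the bookmark event handler as `received` would flag both failure modes described here: garbled names show up as unexpected, and dropped bookmarks show up as missing.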

Expected behavior

Synthesizing the working sample...

<speak xmlns="http://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
    <voice name='en-US-JennyMultilingualNeural' xml:lang="en-US">
        <prosody rate="+15%">
            <bookmark mark="One"/>
            A
            <bookmark mark="Two"/>
            B
            <bookmark mark="Three"/>
            C
            <bookmark mark="Four"/>
        </prosody>
    </voice>
</speak>

...returns...

Bookmark reached at 0: One
Bookmark reached at 4250000: Two
Bookmark reached at 10500000: Three
Bookmark reached at 17500000: Four

...as expected. Adding prosody or phoneme tags does work as expected, as well.
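For readers comparing these offsets with audio playback, a small sketch follows. It assumes the numbers are the SDK's audio offset in ticks of 100 nanoseconds, and that the log lines follow the "Bookmark reached at N: name" format printed by the demo application in this report. The function name `parse_bookmark_log` is hypothetical.

```python
import re

# Assumption: one tick is 100 ns, so 10,000 ticks make one millisecond.
TICKS_PER_MS = 10_000

# Matches the log format used in this report, e.g. "Bookmark reached at 4250000: Two".
LINE = re.compile(r"^Bookmark reached at (\d+): (.*)$")


def parse_bookmark_log(lines):
    """Return (name, offset_in_ms) for every bookmark line; other lines are skipped."""
    result = []
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            result.append((match.group(2), int(match.group(1)) / TICKS_PER_MS))
    return result


LOG = [
    "Bookmark reached at 0: One",
    "Bookmark reached at 4250000: Two",
    "Bookmark reached at 10500000: Three",
    "Bookmark reached at 17500000: Four",
]

print(parse_bookmark_log(LOG))
# [('One', 0.0), ('Two', 425.0), ('Three', 1050.0), ('Four', 1750.0)]
```

Under that assumption the working sample places the bookmarks at 0 ms, 425 ms, 1050 ms and 1750 ms, whereas the failing sample 1 reports every bookmark at offset 0.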

Version of the Cognitive Services Speech SDK

1.24.2

Platform, Operating System, and Programming Language

Please let me know if providing the demo application we used for testing would help. Thanks!

yulin-li commented 1 year ago

Thanks for reporting this issue. We are investigating.

jiajzhan commented 1 year ago

Thanks for reporting this issue. I will investigate this issue.

pankopon commented 1 year ago

@jiajzhan Please update with the latest status.

jiajzhan commented 1 year ago

Hi @pankopon, a fix for this issue is in progress.

pankopon commented 1 year ago

Presumably this is tracked by the same internal work item (ref. 4681629) as another case. @jiajzhan, please update if it is a different one.

sibbl commented 1 year ago

Half a year later: is there any update on that work item?

dargilco commented 1 year ago

@jiajzhan any update on this?

Kerry-LinZhang commented 1 year ago

We are still tracking this feedback. There is currently no clear ETA for the issue. I will follow up with @jiajzhan on further plans. @dargilco @sibbl, we will keep you updated.

pankopon commented 6 months ago

Closing the item as unsupported because there has been no update since September. @Kerry-LinZhang @yulin-li FYI

sibbl commented 6 months ago

Just for confirmation: the multilingual voice continues to select the wrong languages for pronouncing words, and you won't support API users correcting the selected language with the SSML syntax described above?

Zeouterlimits commented 6 months ago

Yeah, this is a frustrating issue that will likely prevent us from using Multilingual voices, can this be re-opened please?

pankopon commented 5 months ago

@Kerry-LinZhang @yulin-li @jiajzhan Is there any plan for a fix in the near future / this year / later? If this is a critical issue then please provide updates on the status.

Kerry-LinZhang commented 5 months ago

Hi @pankopon, I am continuing to track the problem. We will update the status as soon as possible.

meetakshay99 commented 4 months ago

Any update on the timeline for a fix would be helpful. Thanks.

trulience commented 4 months ago

We would very much like to use multilingual voices in our project but this bug is preventing us from being able to. Are you able to confirm if the bug has been re-opened please? I am not sure whether "tracking the problem" means that you acknowledge it as a problem that you intend to try and fix, or that you might just track it indefinitely. Thank you

Kerry-LinZhang commented 3 months ago

Assigning @jiajzhan to follow up.