Speech Translation with language detection

Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK

MIT License


Speech Translation with language detection #2000

Open igarridowideum opened 12 months ago

igarridowideum commented 12 months ago

Hi,

We are using Speech Translation to translate audio from one language to another. Everything works fine when we set the source language using SpeechRecognitionLanguage.

The problem appears when we use language detection. The translation works, but we do not receive the synthesized audio. We configured language detection with continuous recognition as described in the documentation here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/language-identification?tabs=continuous&pivots=programming-language-csharp#speech-translation

We also tried the sample shown there, with the same result. The Recognizing and Recognized events are raised and we get the translation, but the Synthesizing event is never raised, so no audio is received.
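For reproducing the behavior, here is a minimal sketch of the setup described above. The thread links to the C# docs; this uses the SDK's Python package, whose class names largely match the C# API. The `wss://<region>.stt.speech.microsoft.com/speech/universal/v2` endpoint and the `Continuous` language-ID mode follow the pattern of the docs' continuous language-identification samples. The key/region, the candidate languages, the target language `de`, the voice `de-DE-KatjaNeural` and the `EventLog` helper are all assumptions for illustration. `EventLog` only records which events fire.

```python
"""Reproduction sketch: continuous language ID + translation, logging which events fire."""
from dataclasses import dataclass, field

try:
    import azure.cognitiveservices.speech as speechsdk
except ImportError:  # the pure helpers below still work without the SDK installed
    speechsdk = None


def universal_v2_endpoint(region: str) -> str:
    """Endpoint format used by the docs' continuous language-ID samples (assumption)."""
    return f"wss://{region}.stt.speech.microsoft.com/speech/universal/v2"


@dataclass
class EventLog:
    """Hypothetical helper: records final translations and synthesized audio sizes."""
    translations: list = field(default_factory=list)
    synthesizing_events: int = 0
    audio_bytes: int = 0

    def on_recognized(self, evt) -> None:
        # evt.result.translations maps each target-language code to its translated text.
        self.translations.append(dict(evt.result.translations))

    def on_synthesizing(self, evt) -> None:
        # evt.result.audio is one chunk of synthesized audio; a zero-length chunk ends the utterance.
        self.synthesizing_events += 1
        self.audio_bytes += len(evt.result.audio)


def build_lid_recognizer(key: str, region: str, log: EventLog):
    """Translation recognizer with continuous source-language identification."""
    config = speechsdk.translation.SpeechTranslationConfig(
        subscription=key, endpoint=universal_v2_endpoint(region))
    config.set_property(speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode, "Continuous")
    config.add_target_language("de")         # placeholder target language
    config.voice_name = "de-DE-KatjaNeural"  # requests built-in (event-based) synthesis
    auto_detect = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
        languages=["en-US", "es-ES", "fr-FR"])  # placeholder candidate languages
    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=config,
        audio_config=speechsdk.audio.AudioConfig(use_default_microphone=True),
        auto_detect_source_language_config=auto_detect)
    recognizer.recognized.connect(log.on_recognized)
    recognizer.synthesizing.connect(log.on_synthesizing)
    return recognizer
```

To run it, call `recognizer.start_continuous_recognition()`, speak for a while, then call `recognizer.stop_continuous_recognition()` and inspect the log. According to the report above, `log.translations` fills up while `log.synthesizing_events` stays at 0 when language detection is enabled.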

Is there a reason why we are not receiving the audio? Does speech synthesis not work with language identification?

We are using SDK v1.28.0.

Regards

pankopon commented 11 months ago

Hi, synthesized audio is indeed missing in this case. I'm checking with the service team whether this is intentional and will update here once it's confirmed.

pankopon commented 11 months ago

Just to note, the issue is still under discussion, but in the meantime you could try "manual synthesis" as documented in https://learn.microsoft.com/azure/cognitive-services/speech-service/how-to-translate-speech?tabs=terminal&pivots=programming-language-csharp#manual-synthesis, that is, use the translated text as input to a separate SpeechSynthesizer instance. This may add some latency, so whether it is acceptable depends on your use case.
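Here is a minimal sketch of the manual-synthesis workaround in Python. On each final `recognized` event, it takes the translated text for the target language and passes it to a standalone `SpeechSynthesizer`. The key/region, the target-code-to-voice table and the helper names (`pick_voice`, `make_synthesizer`, `make_recognized_handler`) are hypothetical. Check the voice list available in your region before relying on the table.

```python
try:
    import azure.cognitiveservices.speech as speechsdk
except ImportError:  # pick_voice() below still works without the SDK installed
    speechsdk = None

# Hypothetical table: translation target code -> neural voice used for playback.
VOICES = {
    "de": "de-DE-KatjaNeural",
    "en": "en-US-JennyNeural",
    "es": "es-ES-ElviraNeural",
    "fr": "fr-FR-DeniseNeural",
}


def pick_voice(target_language: str) -> str:
    """Return the voice for a target code such as 'de' or 'de-DE'."""
    base = target_language.split("-")[0].lower()
    if base not in VOICES:
        raise ValueError(f"no voice configured for {target_language!r}")
    return VOICES[base]


def make_synthesizer(key: str, region: str, target_language: str):
    """Standalone synthesizer, independent of the TranslationRecognizer."""
    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_synthesis_voice_name = pick_voice(target_language)
    # audio_config=None keeps the synthesized audio in memory instead of playing it.
    return speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)


def make_recognized_handler(synthesizer, target_language: str, on_audio):
    """Callback for TranslationRecognizer.recognized that synthesizes each final translation."""
    def handler(evt):
        if evt.result.reason != speechsdk.ResultReason.TranslatedSpeech:
            return
        text = evt.result.translations.get(target_language, "")
        if not text:
            return
        result = synthesizer.speak_text_async(text).get()  # blocks until synthesis finishes
        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            on_audio(result.audio_data)
    return handler
```

Wire it up with `recognizer.recognized.connect(make_recognized_handler(make_synthesizer(key, region, "de"), "de", play))`, where `play` is your own audio sink. The `.get()` call blocks the SDK's callback thread while each sentence is synthesized, which contributes to the extra latency mentioned above. For long sessions, consider handing the text to a worker queue instead.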

igarridowideum commented 11 months ago

Hi, thanks for the recommendation. As a temporary measure while the issue is being discussed, and because our problem only occurs with language detection, we have added a language selector before starting the translation so users can set their source language. It is an extra step for the user, but with this solution we receive the synthesized audio and also avoid the possible additional latency.
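The selector workaround amounts to setting the recognition language explicitly and not passing an auto-detect configuration. With that setup, the built-in synthesis requested through `voice_name` behaves as in the non-detection case. The following minimal Python sketch assumes a hypothetical list of selectable locales (`SELECTABLE_LOCALES`), a single target language `de` and the voice `de-DE-KatjaNeural`. `normalize_locale` is an illustrative helper, not part of the SDK.

```python
try:
    import azure.cognitiveservices.speech as speechsdk
except ImportError:  # normalize_locale() below still works without the SDK installed
    speechsdk = None

# Hypothetical locales offered in the UI selector.
SELECTABLE_LOCALES = ("en-US", "es-ES", "fr-FR", "it-IT")


def normalize_locale(choice: str) -> str:
    """Validate the user's selection and return it in canonical 'll-CC' form."""
    lang, _, country = choice.strip().replace("_", "-").partition("-")
    locale = f"{lang.lower()}-{country.upper()}"
    if locale not in SELECTABLE_LOCALES:
        raise ValueError(f"unsupported source language: {choice!r}")
    return locale


def build_fixed_language_recognizer(key: str, region: str, source_choice: str):
    """Translation recognizer whose source language comes from the user's selection."""
    config = speechsdk.translation.SpeechTranslationConfig(subscription=key, region=region)
    config.speech_recognition_language = normalize_locale(source_choice)
    config.add_target_language("de")         # placeholder target language
    config.voice_name = "de-DE-KatjaNeural"  # requests built-in (event-based) synthesis
    audio = speechsdk.audio.AudioConfig(use_default_microphone=True)
    # No auto_detect_source_language_config: the source language is fixed by the user.
    return speechsdk.translation.TranslationRecognizer(
        translation_config=config, audio_config=audio)
```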

pankopon commented 11 months ago

Internal work item ref. 5493198.

pankopon commented 11 months ago

There has been no resolution on this issue in discussions with the service side so far, but I created a work item for internal follow-up, as there is no good reason why synthesized audio in the target language couldn't be available regardless of input language detection. There is currently no ETA; we will update this issue when we know the schedule for the fix.