Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

Speech recognition quality got worse? #2101

Closed Trunksome closed 5 months ago

Trunksome commented 1 year ago

In the recent 1-2 weeks many of my users are reporting a decrease in speech detection quality, and I am struggling to understand what that could be. I noticed an increase in "SpeechNotRecognized" events, the wrong words being picked up, and the custom phrase lists seem to have less impact.

I know this is not exactly a precise bug report, but maybe there were some updates made to the cloud service that could have caused this. Can you tell me if there were any changes deployed that could have impacted speech recognition quality recently? Maybe some other devs here share the same problem...

rhurey commented 1 year ago

First, thanks for using the Speech Service and letting us know about how things are going.

The Speech Service does update the SR models over time and constantly evaluates the quality of model changes. I've already passed your feedback to the team that owns recognition quality, but it would be helpful to know which region(s) these reports come from.

Trunksome commented 1 year ago

Thank you for replying and passing it on! We are on West Europe. I noticed it primarily in German: I am a native speaker and still ran into a lot of SpeechNotRecognized responses, which I had never experienced to this extent before. I did get reports of problems with Swedish speech recognition too.

devkya commented 1 year ago

Hello. We are also operating our app service using speechsdk. We are experiencing similar issues as the questioner. Recently, the accuracy of transcribed text has been decreasing, and the word "overlap" frequently appears at the end of sentences. We are using the Korean language. The SDK version we are using is 1.30.0.

Trunksome commented 1 year ago

I got the impression that it sometimes works fine, and then other times fails miserably (for the whole duration of the session).

In this video you can see such a failed case, using the Unity Speech SDK with German as the set language. Every time a "?" is displayed and the character asks the player to repeat themselves, a SpeechNotRecognized response was received from Cognitive Services.

https://github.com/Azure-Samples/cognitive-services-speech-sdk/assets/16858868/ffae21c4-119a-4602-b2ee-7db1438e6dfb

In the video the phrase "Ich heiße Linn" was added to the phrase list, yet it still refuses to pick up on it. Yes, there is an accent, but that was never a problem a few weeks ago, running the same code. Afterwards I tried to say some things myself (native German), and it still wouldn't pick up on anything (or just one or two words).

Did you get further in figuring out what can be the issue here? // this video was recorded on the 28th of October
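
For anyone trying to reproduce this outside Unity, the setup described above (a phrase list entry, then checking whether the result comes back unrecognized) can be sketched in Python roughly as follows. This is a minimal sketch, not the code from the video. It assumes the `azure-cognitiveservices-speech` package, and it assumes that the "SpeechNotRecognized" responses mentioned above correspond to the SDK's `ResultReason.NoMatch`. The helper name `recognize_once_with_phrases` and its return format are made up for illustration. The SDK module is passed in as a parameter so the helper can be exercised without a microphone or network access.

```python
def recognize_once_with_phrases(speechsdk, key, region, language, phrases):
    """Run one single-shot recognition with a phrase list and classify the result.

    `speechsdk` is expected to be the module `azure.cognitiveservices.speech`
    (passed in rather than imported here, which is a choice of this sketch).
    Returns a (status, detail) tuple; that format is hypothetical.
    """
    config = speechsdk.SpeechConfig(subscription=key, region=region)
    config.speech_recognition_language = language

    # With no audio config given, the recognizer uses the default microphone.
    recognizer = speechsdk.SpeechRecognizer(speech_config=config)

    # Phrase lists are attached to the recognizer, one phrase at a time.
    grammar = speechsdk.PhraseListGrammar.from_recognizer(recognizer)
    for phrase in phrases:
        grammar.addPhrase(phrase)

    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return ("recognized", result.text)
    if result.reason == speechsdk.ResultReason.NoMatch:
        # The no-match details carry a reason that is worth logging.
        return ("no_match", str(result.no_match_details.reason))
    return ("other", str(result.reason))


# Real use (requires a key and a microphone):
# import azure.cognitiveservices.speech as speechsdk
# print(recognize_once_with_phrases(
#     speechsdk, "<your-key>", "westeurope", "de-DE", ["Ich heiße Linn"]))
```

Logging the no-match detail for each failed attempt, together with the region and language, would give the service team something more concrete than a video to compare against.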

ralph-msft commented 12 months ago

We are still investigating this but unfortunately don't have any updates to share at this time. (I-437395834)

devkya commented 12 months ago

@Trunksome Hello. I was a user who had the same issues as you. It seems that most of the problems I experienced have disappeared since yesterday. I would like to ask if your problems have disappeared as well

oscarhinde commented 12 months ago

I'd like to add our voice to this issue. Since roughly a month ago we've seen a drastic increase in speech recognition issues, mostly audio with clearly uttered words that are not being transcribed. We use SR to transcribe phone calls and the number of events in which no text was provided has almost doubled. This is mostly in Spanish, but we're also seeing it with our English speaking clients. Nothing in our codebase has changed and the type of audio we're processing has also remained consistent, so we don't think it has anything to do with our end of things. Any information would be much appreciated.
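
A claim like "the number of events with no text has almost doubled" is easier to act on when it is backed by a per-language rate. A minimal sketch of that bookkeeping in plain Python is below. The function name, the tuple layout, and the sample data are all assumptions made for illustration, not measurements from this thread.

```python
from collections import defaultdict


def empty_result_rate(events):
    """Return {(period, language): share of recognitions that produced no text}.

    `events` is an iterable of (period, language, text) tuples, where `text`
    is the transcript for one utterance ("" or None when nothing came back).
    The tuple layout is an assumption of this sketch.
    """
    totals = defaultdict(int)
    empty = defaultdict(int)
    for period, language, text in events:
        key = (period, language)
        totals[key] += 1
        # Treat None, "" and whitespace-only transcripts as "no text".
        if not (text or "").strip():
            empty[key] += 1
    return {key: empty[key] / totals[key] for key in totals}


# Made-up illustration data, NOT real numbers from any service.
sample = [
    ("before", "es-ES", "hola"),
    ("before", "es-ES", ""),
    ("before", "es-ES", "sí"),
    ("before", "es-ES", "gracias"),
    ("after", "es-ES", ""),
    ("after", "es-ES", None),
    ("after", "es-ES", "vale"),
    ("after", "es-ES", "no"),
]
rates = empty_result_rate(sample)
print(rates[("before", "es-ES")], rates[("after", "es-ES")])  # prints: 0.25 0.5
```

Splitting the rate by language and by time period makes it possible to say when the change started and whether it affects all languages equally.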

Trunksome commented 12 months ago

> @Trunksome Hello. I was a user who had the same issues as you. It seems that most of the problems I experienced have disappeared since yesterday. I would like to ask if your problems have disappeared as well

Just today I got another message from a user saying "For some reason i had to repeat myself multiple times because the app couldn’t recognize what I was saying" -- she was using German.

In my experience it does work alright sometimes, but other times it just fails completely. In these past few weeks, German in our app went from one of the most popular languages to one of the least popular, and it's not hard to guess why that is...

devkya commented 12 months ago

I have also been hearing complaints from numerous clients about the deterioration of recognition quality. It might have felt like it improved temporarily. I will continue to monitor the issue and provide updates...

vkjambit commented 11 months ago

It is really interesting to read about these problems here. In recent weeks we have had a lot of feedback from users complaining that short yes/no answers (in German, ja/nein) to the questions our app asks are not understood at all.

oscarhinde commented 11 months ago

> It is really interesting to read about these problems here. In recent weeks we have had a lot of feedback from users complaining that short yes/no answers (in German, ja/nein) to the questions our app asks are not understood at all.

Yes, this is our experience as well: the issues are especially noticeable with short monosyllabic utterances.

vkjambit commented 11 months ago

So I found a flag, .enableDictation(), and just tried it. It seems to solve my issues completely!
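
For readers who want to try the same flag from Python, a minimal sketch follows. It assumes the `azure-cognitiveservices-speech` package, where the method is spelled `enable_dictation()` on `SpeechConfig`. The helper name `build_speech_config` is made up for this example, and the SDK module is passed in as a parameter so the helper can be checked without network access. Whether dictation mode helps in other setups is not established by this thread; it is one user's observation.

```python
def build_speech_config(speechsdk, key, region, language, dictation=False):
    """Create a SpeechConfig, optionally with dictation mode switched on.

    `speechsdk` is expected to be the module `azure.cognitiveservices.speech`
    (injected rather than imported here, which is a choice of this sketch).
    """
    config = speechsdk.SpeechConfig(subscription=key, region=region)
    config.speech_recognition_language = language
    if dictation:
        # Python spelling of the .enableDictation() flag mentioned above.
        config.enable_dictation()
    return config


# Real use (requires a valid key):
# import azure.cognitiveservices.speech as speechsdk
# config = build_speech_config(
#     speechsdk, "<your-key>", "westeurope", "de-DE", dictation=True)
# recognizer = speechsdk.SpeechRecognizer(speech_config=config)
```

Keeping the flag behind a parameter makes it easy to compare recognition results with and without dictation mode on the same audio.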

Trunksome commented 11 months ago

> team that owns recognition quality

Hi Ryan! What did the team that owns recognition quality have to say about this issue? Can you share any updates?

pankopon commented 11 months ago

Internal incident report I-437395834 was acknowledged by a service team. Based on the comments, the issue seems to be related to a recent model update. Assigning this to @mahilleb-msft for updates on status.

oscarhinde commented 11 months ago

Another interesting aspect that I suspect is related to this issue: we have found that the transcription model seems to have become less sensitive to background noise (e.g. a loud TV in the room). This leads me to believe that some form of threshold has been either put in place or modified and that this is affecting the performance on shorter or less clear utterances.

Trunksome commented 11 months ago

I just read about significant improvements to the speech to text service (https://learn.microsoft.com/en-us/azure/ai-services/speech-service/releasenotes?tabs=speech-to-text) - are these changes already live? How can we try them out?

I'm still getting consistent reports from users, especially the ones using German, that the voice recognition is working unreliably. Sometimes well, sometimes not picking up anything at all.

github-actions[bot] commented 10 months ago

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.

pankopon commented 8 months ago

Service incident ref. 437395834 was superseded by 439691956, summary of developments:

pankopon commented 5 months ago

Based on the service incident update, the fix has been deployed. If you observe similar service quality issues again, please consider creating an Azure support ticket at https://azure.microsoft.com/support