Closed jelizavetazaharova closed 9 months ago
Thanks for reporting this issue.
I looked at the telemetry for that session and the service didn't segment the 2nd phrase like I'd have expected it to.
I tried a couple of different audio files I created to get a repro but wasn't successful.
Do you have an audio file you can share? It will help our service team investigate the problem.
Actually, looking at #2212, it occurs to me you may be hitting the same problem, but instead of the audio being segmented, you're hitting the default maximum buffer size, which is precariously close to the maximum phrase length.
Looking at the configuration for the request I do see the segmentation timeout is set at 5s, which is definitely a factor here.
Thank you @rhurey for your quick response. Here is a failing part of the file: Audio.zip
After the first sentence, there is a silence fragment and then there are a few more sentences.
Earbuds, come on, go, go, go, go.
I think I got it. Good. Yeah. I'm so sorry about. Yeah. I should not be doing interviews right after getting home from an international trip. I'm just like a complete mess. So, yeah. How are you doing? Not bad. It's actually kind of slow this week. So it worked out. OK, OK, good. I am very glad. So, yeah, I mean, I kind of I messaged you like, in the e-mail, I already kind of explained that this is a short profile for decibel. And so generally what I do with these profiles, because they are so short, is I only ask maximum two or three questions just because I think it's more effective for everyone's time. And the main thing that I wanted to ask you about with the new Stygian Crown record was there was a comment that the band made about the theme of the record. And I thought that it was pretty interesting and I wanted to dig into it further. And it was about how the theme of the album is about the origin of monsters and how they're conceptualized.
As for the segmentation timeout, we set the SegmentationSilenceTimeoutMs property to 5000. The documentation states that a higher value allows longer pauses in speech.
Thanks for the follow-up.
Turning up the segmentation timeout does allow for longer pauses in speech that will be recognized as a single phrase. There are a number of scenarios where it can be beneficial to allow for a longer than typical pause between words when recognizing a single phrase.
There are tradeoffs, though. The biggest is that the Speech Service only returns final recognized results after enough silence has elapsed, which can increase latency.
There are also other interconnected pieces that can increase friction.
One of these is the SDK's resiliency buffer, which can hold just over a minute of audio, and that is what's happening here. The service segmented the phrase at ~55 seconds, but that was just too late to keep the buffer from filling.
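To make the interaction concrete, here is a rough back-of-the-envelope sketch. It is a simplified mental model, not the SDK's actual buffer accounting: it assumes audio for a phrase stays in the client buffer until the final result arrives, and that the final result arrives only once the trailing silence has lasted for the segmentation timeout. The `BufferModel` class, its method names, and the 60-second capacity used in the note below are all hypothetical.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the Speech SDK.
public static class BufferModel
{
    // Assumption: buffered audio = phrase length + the silence the service
    // waits for before it finalizes the phrase.
    public static double BufferedSeconds(double phraseSeconds, int segmentationSilenceTimeoutMs)
        => phraseSeconds + segmentationSilenceTimeoutMs / 1000.0;

    // True when the buffered audio stays strictly under the assumed capacity.
    public static bool FitsInBuffer(double phraseSeconds, int segmentationSilenceTimeoutMs, double bufferCapacitySeconds)
        => BufferedSeconds(phraseSeconds, segmentationSilenceTimeoutMs) < bufferCapacitySeconds;
}
```

Under an assumed capacity of 60 seconds, `BufferModel.FitsInBuffer(55, 5000, 60)` is false, while `BufferModel.FitsInBuffer(55, 500, 60)` is true. The model ignores network latency and any service-side changes in how aggressively phrase ends are detected, so treat it as intuition rather than a guarantee.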
@rhurey Referring to what you said, what would be a possible solution for us? Should we set the segmentation silence timeout to a smaller value? If so, will it work for us if we want to allow longer pauses? Can this issue happen again if a smaller silence timeout value is set and the phrase is segmented at ~59 seconds?
Good questions...
5s is a long time for someone to pause in the middle of what is a single sentence. Larger values are likely more useful if you're doing single-phrase recognition as part of a command system. It looks like your scenario is more transcription-based, where something much closer to the default of 500ms will produce acceptable results.
Having a lower segmentation timeout will reduce the odds of filling the client buffer in a couple of different ways. First, phrases are more likely to be segmented when the value is close to the default. Second, as the phrase gets longer the Speech Service becomes more aggressive at detecting the end of a phrase, and starting from a lower initial number will definitely increase the chances of the phrase end being found before the client buffer overflows.
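For reference, a minimal configuration sketch of lowering the timeout in C# and logging cancellations so a ServiceTimeout is easy to spot. The key, region, and file name are placeholders, and the fragment only shows setup; it does not start recognition.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Placeholder credentials and file name; substitute your own.
var speechConfig = SpeechConfig.FromSubscription("YOUR_KEY", "YOUR_REGION");

// Property values are passed as strings; 500 ms is the value discussed above.
speechConfig.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "500");

using var audioConfig = AudioConfig.FromWavFileInput("audio.wav");
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

// Print the error code and details whenever recognition is canceled with an error.
recognizer.Canceled += (sender, e) =>
{
    if (e.Reason == CancellationReason.Error)
    {
        Console.WriteLine($"Canceled: {e.ErrorCode} - {e.ErrorDetails}");
    }
};
```

Whether 500 ms is the right value depends on how long speakers pause in your audio, so it may be worth trying a few values on representative files.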
SESSION STARTED: SessionEventArgs(session_id=0de3b2a37dbb444ab3b9bfcc25042e03) CANCELED SpeechRecognitionCanceledEventArgs(session_id=0de3b2a37dbb444ab3b9bfcc25042e03, result=SpeechRecognitionResult(result_id=5f7c56eb546448c4a24c09136284ce1f, text="", reason=ResultReason.Canceled))
Please look at the above error and help. The same code works fine on my local machine but gives an error on an Ubuntu server.
@anshika24khathuriya can you open a new issue? It helps keep problems separate and makes tracking progress easier.
The issue is that it gives wrong scores: for every file it gives scores of 0 0 0 0.
@anshika24khathuriya Please file a separate issue so we can better assist you.
Closing this out to keep our issue list current.
@rhurey Sorry for the late reply. We tried the solution you suggested, and decreasing the silence timeout to 500ms helped us avoid the ServiceTimeout issue. Thanks a lot for your help!
Describe the bug
SpeechRecognizer receives a Canceled event with the ErrorCode = ServiceTimeout and ErrorDetails = "Due to service inactivity, the client buffer exceeded maximum size. Resetting the buffer. SessionId: f5ffa7a174444e25a0baf52a639d545b" and stops processing the audio file

To Reproduce
Steps to reproduce the behavior:
Here is a code sample we are using:

Expected behavior
An audio file should be fully processed and the recognition should stop only after processing the file fully

Version of the Cognitive Services Speech SDK
Version 1.33.0

Platform, Operating System, and Programming Language
Linux version 5.10.16.3-microsoft-standard-WSL2 (x86_64-msft-linux-gcc (GCC) 9.3.0, GNU ld (GNU Binutils) 2.34.0.20200220) #1 SMP Fri Apr 2 22:23:49 UTC 2021
x86_64
C#

Additional context
For additional information attaching a log file logfile.txt