Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

Service Timeout: Due to service inactivity, the client buffer exceeded maximum size. Resetting the buffer. #1989

Closed Tryptophan closed 1 year ago

Tryptophan commented 1 year ago

Describe the bug

When using continuous speech translation, the recognizer is intermittently canceled, which stops translation for that client. This seems to happen rarely.

We are passing a real-time stream (using AudioInputStream) to the recognizer, so we're not sure how the buffer can exceed its maximum size: audio arriving at a real-time rate shouldn't outrun any window allocated by the recognizer.

To Reproduce

Unknown. We have not yet found any behavior that reproduces it consistently. It may be related to running continuous speech translation for a long time, but I have seen longer-running instances than the ones where it occurred.

Expected behavior

The client SDK should continue to recognize speech when passed a real-time audio stream, for any length of time.

Version of the Cognitive Services Speech SDK

Java 1.28.0

Platform, Operating System, and Programming Language

OS: Linux (Debian 11 docker container) x64
Client SDK: Java 1.28.0

Additional context

Region: eastus
Session id: 80e2d12dc1c14edf92a87da4fba84707

(Please pm for specific azure project id if needed by azure team)

Unfortunately, we have not yet caught this error with full logging enabled, but we will try to and will update here if we do.

Stacktrace:

java.lang.Exception: Speech translation canceled.
  at com.microsoft.cognitiveservices.speech.util.EventHandlerImpl.fireEvent(Unknown Source)
  at com.microsoft.cognitiveservices.speech.translation.TranslationRecognizer.canceledEventCallback(Unknown Source)
Unexpected error message: [Speech translation canceled.]
Unexpected error details: [Due to service inactivity, the client buffer exceeded maximum size. Resetting the buffer. SessionId: 80e2d12dc1c14edf92a87da4fba84707]

I've seen mentions of ways to manually change the buffer size in the recognizer config. What size would you recommend trying so that this happens less often, even if it doesn't fix it 100%?

rhurey commented 1 year ago

Thanks for providing the session ID, it was helpful in starting an investigation.

From a client perspective, the buffer will hold up to about a minute of audio before ending recognition, and filling the buffer (typically) means the service responses are running about a minute behind real-time.
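To put "about a minute of audio" into bytes, here is a rough back-of-the-envelope sketch. It assumes 16 kHz, 16-bit, mono PCM input, which is an assumption for illustration and not something stated in this thread. The class and method names are hypothetical, and the SDK's actual internal buffer limit may differ.

```java
public final class AudioBufferMath {
    private AudioBufferMath() {}

    /** Bytes of raw PCM needed to hold the given duration of audio. */
    public static long bytesForDuration(int sampleRateHz, int bitsPerSample, int channels, long seconds) {
        long bytesPerSecond = (long) sampleRateHz * (bitsPerSample / 8) * channels;
        return bytesPerSecond * seconds;
    }

    /** Seconds of audio represented by a number of buffered PCM bytes. */
    public static double secondsBuffered(long bufferedBytes, int sampleRateHz, int bitsPerSample, int channels) {
        long bytesPerSecond = (long) sampleRateHz * (bitsPerSample / 8) * channels;
        return (double) bufferedBytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        // Assumed format: 16 kHz, 16-bit, mono PCM (32,000 bytes per second).
        long oneMinute = bytesForDuration(16_000, 16, 1, 60);
        System.out.println("Bytes in one minute of audio: " + oneMinute);

        // How far behind real time a given unacknowledged backlog would be.
        double lagSeconds = secondsBuffered(960_000L, 16_000, 16, 1);
        System.out.println("Seconds represented by 960000 bytes: " + lagSeconds);
    }
}
```

Under that assumed format, one minute of audio is 1,920,000 bytes, so a real-time stream can only fill the buffer if roughly that much audio has been sent without the service acknowledging it.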

I'm talking with our service team about this session, the telemetry looks atypical but I'm not sure why yet.

rhurey commented 1 year ago

We think we've identified a root cause and are working with the service teams to find the right fix.

In the meantime, there may be a workaround we can try. If you email me @microsoft.com I can share details.

bobir01 commented 1 year ago

I am using azure-cognitiveservices-speech==1.29.0 (Python SDK) on Python 3.10. Initially it worked fine with STT plus pronunciation assessment, but now even small files (about 12 MB, 1 minute long) get stuck on this error. Note that I also added a pronunciation assessment config to my speech_config, but that should not be an issue.

I am attaching my log file here: my_logs_speechsdk.txt

I created a support ticket on the Azure portal, but there have been no updates at all. @pankopon, thanks for considering this.

In the logs I noticed that a few chunks were serviced before this error happened.

rhurey commented 1 year ago

@Tryptophan the service fix should be rolling out now (It actually started earlier this week) and should greatly reduce the possibility of hitting this problem.

@bobir01, your timeout is a different root problem where the service isn't segmenting the phrase as expected.

bobir01 commented 1 year ago

@rhurey thank you for your attention. Where should I report my problem? Could you point me to the right place, please?

rhurey commented 1 year ago

@bobir01, we've got the cause identified and are discussing the right long term solution.

If you could send me an email @microsoft.com (same alias) I may have a temporary solution you can try.

The problem you're running into is a convergence of a very long phrase, pronunciation assessment, and single phrase recognition.

bobir01 commented 1 year ago

@rhurey I got it, so did you mean sending the email to <redacting-email> in pm ? Did I understand it right?

7effrey89 commented 1 year ago

> @rhurey I got it, so did you mean sending the email to <redacting-email> in pm ? Did I understand it right?

yes

glecaros commented 1 year ago

Hi @bobir01, you are correct. I removed the email from the message to prevent it from being scraped.

albseb511 commented 1 year ago

I am not sure if I have a similar problem.

buffer filled, pausing addition of binaries until space is made

This is the error that comes up. It occurs in two cases:

  1. When audio is generated using text to speech
  2. When audio is recognized using speech to text

Help is appreciated

rhurey commented 1 year ago

@albseb511 could you open a new issue for that? That error string is coming from the speaker output in the JavaScript SDK and it's going to be a different investigation, and we already have two separate issues comingled here.

albseb511 commented 1 year ago

Hey, give me some time and I'll do that. I assumed it was similar, since it was a buffer issue.

albseb511 commented 1 year ago

> @albseb511 could you open a new issue for that? That error string is coming from the speaker output in the JavaScript SDK and it's going to be a different investigation, and we already have two separate issues comingled here.

Done.

https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/2046

Tryptophan commented 8 months ago

Hi @rhurey we are seeing this issue again on the Java client, see the error log here:

2024-03-27T02:03:19.328178340Z CANCELED: Reason=Error
2024-03-27T02:03:19.328789335Z CANCELED: ErrorCode=ServiceTimeout
2024-03-27T02:03:19.329073137Z CANCELED: ErrorDetails=Due to service inactivity, the client buffer exceeded maximum size. Resetting the buffer. SessionId: cfb5a08c5b0440e98f8e3ca0e1db9688
2024-03-27T02:03:19.329686404Z java.lang.Exception: Speech translation canceled.
2024-03-27T02:03:19.329900580Z  at lambda$buildRecognizer$4(Translator.java:362)
2024-03-27T02:03:19.330070095Z  at com.microsoft.cognitiveservices.speech.util.EventHandlerImpl.fireEvent(Unknown Source)
2024-03-27T02:03:19.330248685Z  at com.microsoft.cognitiveservices.speech.SpeechRecognizer.canceledEventCallback(Unknown Source)
2024-03-27T02:03:19.331123668Z Unexpected error message: [Speech translation canceled.]
2024-03-27T02:03:19.332238660Z Unexpected error details: [Due to service inactivity, the client buffer exceeded maximum size. Resetting the buffer. SessionId: cfb5a08c5b0440e98f8e3ca0e1db9688]
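Since the session ID was what let the earlier investigation start, it may be worth pulling it out of the error details automatically when logging a cancellation. The following is a minimal sketch that assumes the `SessionId: <32 hex characters>` shape seen in the error details quoted in this thread; the class name is hypothetical and the format is not a documented guarantee.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class SessionIdExtractor {
    // Assumed shape, based only on the error details quoted in this thread.
    private static final Pattern SESSION_ID =
            Pattern.compile("SessionId:\\s*([0-9a-fA-F]{32})");

    private SessionIdExtractor() {}

    /** Returns the session ID found in the error details, if any. */
    public static Optional<String> extract(String errorDetails) {
        if (errorDetails == null) {
            return Optional.empty();
        }
        Matcher matcher = SESSION_ID.matcher(errorDetails);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String details = "Due to service inactivity, the client buffer exceeded maximum size. "
                + "Resetting the buffer. SessionId: cfb5a08c5b0440e98f8e3ca0e1db9688";
        // Print the extracted ID, or a placeholder when none is present.
        System.out.println(extract(details).orElse("<no session id>"));
    }
}
```

In a canceled-event handler, the string passed to `extract` would be the error details text that the handler already logs.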

We seem to only be able to reproduce it when setting the following:

speechConfig.setProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "2000");

It seems to reproduce more often on machines with less available memory, if that's helpful.

We are on version 1.36.0 (latest) on the Java client sdk. This issue probably needs to be re-opened or I can open a new issue as well.

Let me know if you need other details, thank you!

Tryptophan commented 7 months ago

Hi @rhurey @glharper any information on my last comment?