Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

Major delays on speech recognition updates as captions get long enough. #800

Closed: benlifson closed this issue 3 years ago

benlifson commented 4 years ago

Describe the bug
In Altspace, you can enable Speech Recognition and Captions to see captions of what you, or another player in the scene, are saying. We have found that once Altspace speech captions get long enough, the delays between updates to the text grow progressively longer. In some cases they reach 5 seconds or more. The delays snowball as the captions struggle to catch up.

Theory of cause
After some debugging, we believe the delays most likely originate in the SpeechThread.cs script. That script has a TranslationRecognizer variable named translator, which exposes two events, Recognizing and Recognized. Their invocations are critical to updating the captions. For some reason, they are invoked less frequently when the sentences being tracked contain a lot of text. Unfortunately, TranslationRecognizer is a sealed class, so we could not investigate further.

To Reproduce
Steps to reproduce the behavior:

  1. Enter Altspace and enable Speech Recognition and Captions.
  2. Have a second user enter Altspace and enable Speech Recognition and Captions.
  3. Ensure both users are unmuted. Have one user go to the location of the other.
  4. Have one or both users talk quickly in long, run-on sentences. Because captions only clear text from a bubble based on how many sentences have been completed, this results in very large amounts of text per bubble.
  5. Updates to the text become progressively less frequent. Eventually, the captions stop updating altogether for several seconds. The bubble may even close, only to reopen.
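The cleanup policy described in step 4 can be modeled as below. The Altspace caption code is not public, so this is a hypothetical sketch: the `CaptionBubble` class, its `max_sentences` parameter, and the rule that a sentence is complete once it ends in `.`, `!` or `?` are all assumptions, not Altspace's actual implementation. It keeps only the last few completed sentences and treats everything after the final terminator as in-progress text, which shows why a run-on sentence is never trimmed.

```python
import re


class CaptionBubble:
    """Hypothetical caption bubble that trims text only by completed sentences."""

    # Assumption for this sketch: a sentence is "complete" once it ends in . ! or ?
    _SENTENCE = re.compile(r"[^.!?]*[.!?]")

    def __init__(self, max_sentences: int = 2) -> None:
        self.max_sentences = max_sentences
        self.text = ""

    def append(self, words: str) -> None:
        self.text = (self.text + " " + words).strip()
        self._trim()

    def _trim(self) -> None:
        completed = self._SENTENCE.findall(self.text)
        # Everything after the last sentence terminator is still "in progress".
        consumed = sum(len(s) for s in completed)
        in_progress = self.text[consumed:]
        # Drop the oldest completed sentences beyond the limit;
        # in-progress text is never trimmed, however long it gets.
        kept = completed[-self.max_sentences:] if self.max_sentences > 0 else []
        self.text = ("".join(kept) + in_progress).strip()


if __name__ == "__main__":
    bubble = CaptionBubble(max_sentences=2)
    for chunk in ["One.", "Two.", "Three."]:
        bubble.append(chunk)
    print(bubble.text)  # Two. Three.

    run_on = CaptionBubble(max_sentences=2)
    for _ in range(200):
        run_on.append("and then")
    print(len(run_on.text))  # 1799: with no terminator, nothing is ever trimmed
```

Under this model, the amount of caption text per bubble grows without bound during a run-on sentence, which matches the reproduction steps above.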

Expected behavior
Player captions, for both sender and receiver, should update several times per second while the speaker continues to speak. This should happen consistently, regardless of the length of the sentence.

Version of the Cognitive Services Speech SDK
1.12.0

Platform, Operating System, and Programming Language

Additional context

BrianMouncer commented 4 years ago

@benlifson

I do not seem to have access to

https://github.com/AltspaceVR/UnityClient/blob/dev/Assets/Altspace/Scripts/Captions/SpeechThread.cs

or even to see the UnityClient project repo so that I can request access.

How many languages are you translating into at a time, and which languages are you using? I've seen final recognition results take some time when translating into many languages at once. Can you provide a standalone repro app for this, or at least grant me access to the code you linked above, so I can see whether I spot anything wrong with the code?

Also, are you subscribing to the "recognizing" event, or only to the final "recognized" event? The recognizing events fire far more often, whereas the recognized event can be delayed depending on how long the user talks and when they pause long enough for the service to detect the end of the utterance (endpoint) and send the recognized result.

In general, it is good to use the recognizing event to show progress, even though its text will not be as accurate. Then, when a recognized event comes in, go back and replace the caption built from the "recognizing" events with the more accurate text from the recognized event.
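A minimal sketch of this pattern, written against plain callbacks rather than the Speech SDK so that it runs standalone. The `CaptionBuilder` class and its method names are hypothetical. It assumes each recognizing event carries the full partial hypothesis for the current utterance so far (not just the newest words), so the interim text is replaced rather than appended. In an app, `on_recognizing` and `on_recognized` would be called from the Recognizing and Recognized event handlers with the result text.

```python
from typing import List


class CaptionBuilder:
    """Hypothetical caption model: interim text from 'recognizing', final text from 'recognized'."""

    def __init__(self) -> None:
        self.final_segments: List[str] = []  # accurate text from recognized events
        self.interim = ""  # latest partial hypothesis; may still be revised

    def on_recognizing(self, text: str) -> None:
        # Assumed: each partial result is the whole hypothesis for the current
        # utterance so far, so it replaces the previous interim text.
        self.interim = text

    def on_recognized(self, text: str) -> None:
        # The final result supersedes the interim text shown for this utterance.
        if text:
            self.final_segments.append(text)
        self.interim = ""

    def caption(self) -> str:
        parts = self.final_segments + ([self.interim] if self.interim else [])
        return " ".join(parts)


if __name__ == "__main__":
    cb = CaptionBuilder()
    cb.on_recognizing("hello")
    cb.on_recognizing("hello word")
    print(cb.caption())  # hello word
    cb.on_recognized("Hello world.")
    cb.on_recognizing("how are")
    print(cb.caption())  # Hello world. how are
```

Because interim text is replaced rather than appended, stale partial results never accumulate in the caption, and the recognized event corrects misheard words (here, "word" becomes "world").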

I hope that helps,

Brian.

benlifson commented 4 years ago

@BrianMouncer Sorry for the delay. I will add you to the Teams chat we are having about this bug.

jhakulin commented 4 years ago

@benlifson Based on the log you provided, there are translations into en, de, es, fr, it, ja, ko, and pt at the same time. Could you check whether using a single translation, e.g. from en to de, affects the delays you are seeing?

benlifson commented 4 years ago

Thank you. Unfortunately, my apartment has lost internet. I am responding on my laptop, but my git work is on my desktop, and my laptop has trouble running Altspace. I will not be able to run any tests until my internet returns.

From: Jarno Hakulinen notifications@github.com Sent: Monday, September 28, 2020 5:39 PM To: Azure-Samples/cognitive-services-speech-sdk cognitive-services-speech-sdk@noreply.github.com Cc: Ben Lifson Ben.Lifson@microsoft.com; Mention mention@noreply.github.com Subject: Re: [Azure-Samples/cognitive-services-speech-sdk] Major delays on speech recognition updates as captions get long enough. (#800)


pankopon commented 3 years ago

@benlifson Is this still a valid issue? If so, can you please test the suggested setup (e.g. translating only from en to de) for comparison?

pankopon commented 3 years ago

@BrianMouncer Please follow up on this with @in2dair and update as necessary.

jhakulin commented 3 years ago

@benlifson We have been able to reproduce the issue and have created an internal ticket to find a resolution. Closing this item.