Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

How to fix SPXERR_START_RECOGNIZING_INVALID_STATE_TRANSITION in SpeechSDK? #2544

Open TheXRMonk opened 3 months ago

TheXRMonk commented 3 months ago

Microsoft.CognitiveServices.Speech.Internal.SpxExceptionThrower.ThrowIfFail (System.IntPtr hr) (at <00000000000000000000000000000000>:0)
Microsoft.CognitiveServices.Speech.Recognizer.StartContinuousRecognition () (at <00000000000000000000000000000000>:0)
Microsoft.CognitiveServices.Speech.Recognizer.DoAsyncRecognitionAction (System.Action recoImplAction) (at <00000000000000000000000000000000>:0)
System.Threading.Tasks.Task.Execute () (at <00000000000000000000000000000000>:0)
System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) (at <00000000000000000000000000000000>:0)
System.Threading.Tasks.Task.ExecuteWithThreadLocal (System.Threading.Tasks.Task& currentTaskSlot) (at <00000000000000000000000000000000>:0)
System.Threading.Tasks.Task.ExecuteEntry (System.Boolean bPreventDoubleExecution) (at <00000000000000000000000000000000>:0)
System.Threading.ThreadPoolWorkQueue.Dispatch () (at <00000000000000000000000000000000>:0)
--- End of stack trace from previous location where exception was thrown ---
Laerdal.XR.ConversationalAI.AzureSpeechToTextService.StartRecognizing () (at <00000000000000000000000000000000>:0)
System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) (at <00000000000000000000000000000000>:0)
System.Runtime.CompilerServices.AsyncMethodBuilderCore+MoveNextRunner.Run () (at <00000000000000000000000000000000>:0)
System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction (System.Action action, System.Boolean allowInlining, System.Threading.Tasks.Task& currentTask) (at <00000000000000000000000000000000>:0)
System.Threading.Tasks.Task.FinishContinuations () (at <00000000000000000000000000000000>:0)
System.Threading.Tasks.Task.Finish (System.Boolean bUserDelegateExecuted) (at <00000000000000000000000000000000>:0)
System.Threading.Tasks.Task.ExecuteWithThreadLocal (System.Threading.Tasks.Task& currentTaskSlot) (at <00000000000000000000000000000000>:0)
System.Threading.Tasks.Task.ExecuteEntry (System.Boolean bPreventDoubleExecution) (at <00000000000000000000000000000000>:0)
System.Threading.ThreadPoolWorkQueue.Dispatch () (at <00000000000000000000000000000000>:0)
--- End of stack trace from previous location where exception was thrown ---
System.Runtime.CompilerServices.AsyncMethodBuilderCore+<>c.b__7_0 (System.Object state) (at <00000000000000000000000000000000>:0)
UnityEngine.UnitySynchronizationContext+WorkRequest.Invoke () (at <00000000000000000000000000000000>:0)
UnityEngine.UnitySynchronizationContext.Exec () (at <00000000000000000000000000000000>:0)



**Describe the bug**

This is a Unity project running on Android (Quest 3). The error occurs after the device and application have been asleep for a while.

**To Reproduce**

Send a few speech requests, which work, then let the device sleep for a while (at least a couple of minutes; I'm unsure how long is needed) and try to transcribe again.

**Expected behavior**

Transcription should work without errors.

**Version of the Cognitive Services Speech SDK**

1.38.0

**Platform, Operating System, and Programming Language**

Android, Meta Quest 3

jychoudh commented 2 months ago

Sorry for the delay. Can you please share the logs from Speech SDK? You can get the logs by following the steps in this article - https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-use-logging#android. Without logs, I am guessing that when your device goes to sleep, it is stopping the microphone and that's why Speech SDK goes into an invalid state after coming back from sleep. I see a question about microphone stopping on Unity's forum - https://discussions.unity.com/t/microphone-bug-on-meta-quest-2-and-3-when-putting-the-headset-to-sleep/1496502.
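
For a Unity (C#) project, the file logging described in the linked article might be enabled roughly as in the sketch below. `PropertyId.Speech_LogFilename` is the SDK's logging property; the helper name `SpeechLogging`, the `key`/`region` parameters, the log file name, and the choice of `Application.persistentDataPath` as a writable location on the Quest are assumptions for illustration.

```csharp
using System.IO;
using Microsoft.CognitiveServices.Speech;
using UnityEngine;

// Hypothetical helper, not part of the SDK or of the code in this thread.
public static class SpeechLogging
{
    public static SpeechConfig CreateConfigWithLogging(string key, string region)
    {
        var config = SpeechConfig.FromSubscription(key, region);

        // Assumption: persistentDataPath is writable in the Android/Quest build.
        string logPath = Path.Combine(Application.persistentDataPath, "speech-sdk.log");

        // Must be set before the recognizer is created from this config.
        config.SetProperty(PropertyId.Speech_LogFilename, logPath);

        Debug.Log($"Speech SDK log file: {logPath}");
        return config;
    }
}
```

The resulting file can then be pulled from the device and attached to the issue.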

TheXRMonk commented 2 months ago

Sorry for the late answer. We're just calling "AudioConfig.FromDefaultMicrophoneInput();", which initializes the microphone fine on the Meta Quest 3 and also in the editor. I was able to reproduce this issue in the Unity editor as well (after adding logic to enable and disable the whole feature based on application focus). So now we completely dispose of the AudioConfig and the Recognizer before setting them up again when the application regains focus.

The error pops up when I unfocus and refocus the Game view and then talk again. Recognition still works correctly, but the errors persist.

ApplicationException: Exception with an error code: 0x13 (SPXERR_START_RECOGNIZING_INVALID_STATE_TRANSITION)
System.ApplicationException: Exception with an error code: 0x13 (SPXERR_START_RECOGNIZING_INVALID_STATE_TRANSITION)
void Microsoft.CognitiveServices.Speech.Internal.SpxExceptionThrower.ThrowIfFail(IntPtr hr)
void Microsoft.CognitiveServices.Speech.Recognizer.StartContinuousRecognition()
void Microsoft.CognitiveServices.Speech.Recognizer.DoAsyncRecognitionAction(Action recoImplAction)
Task Microsoft.CognitiveServices.Speech.SpeechRecognizer.StartContinuousRecognitionAsync()+() => { }
async void Laerdal.XR.ConversationalAI.AzureSpeechToTextService__Patched_.StartRecognizing() (at ../../../Users/DKMUL2/AppData/Local/Temp/905bc593bcff44b19862dee2e9444c88.SourceCodeCombined.cs:278)
void Microsoft.CognitiveServices.Speech.Internal.SpxExceptionThrower.ThrowIfFail(IntPtr hr)
void Microsoft.CognitiveServices.Speech.Recognizer.StartContinuousRecognition()
void Microsoft.CognitiveServices.Speech.Recognizer.DoAsyncRecognitionAction(Action recoImplAction)
Task Microsoft.CognitiveServices.Speech.SpeechRecognizer.StartContinuousRecognitionAsync()+() => { }
async void Laerdal.XR.ConversationalAI.AzureSpeechToTextService__Patched_.StartRecognizing() (at ../../../Users/DKMUL2/AppData/Local/Temp/905bc593bcff44b19862dee2e9444c88.SourceCodeCombined.cs:278)
void System.Runtime.CompilerServices.AsyncMethodBuilderCore.ThrowAsync(Exception exception, SynchronizationContext targetContext)+(object state) => { }
void UnityEngine.UnitySynchronizationContext+WorkRequest.Invoke()
void UnityEngine.UnitySynchronizationContext.Exec()
void UnityEngine.UnitySynchronizationContext.ExecuteTasks()
   at void UnityEngine.DebugLogHandler.Internal_LogException_Injected(Exception, IntPtr)
void UnityEngine.DebugLogHandler.Internal_LogException(Exception ex, Object obj)
void UnityEngine.DebugLogHandler.LogException(Exception exception, Object context)
void Sim3D.PackageHelper.Editor.Utility.LogFilter.LogException(Exception exception, Object context) (at ./Library/PackageCache/com.laerdal.sim3d.dev.package-helper/Editor/Utility/LogFilter.cs:29)
void UnityEngine.Logger.LogException(Exception exception, Object context)
bool UnityEngine.Debug.CallOverridenDebugHandler(Exception exception, Object obj)

Once in a while I get this one a single time before the error above:

ApplicationException: Exception with an error code: 0x21 (SPXERR_INVALID_HANDLE)
System.ApplicationException: Exception with an error code: 0x21 (SPXERR_INVALID_HANDLE)
void Microsoft.CognitiveServices.Speech.Internal.SpxExceptionThrower.ThrowIfFail(IntPtr hr)
void Microsoft.CognitiveServices.Speech.Recognizer.StartContinuousRecognition()
void Microsoft.CognitiveServices.Speech.Recognizer.DoAsyncRecognitionAction(Action recoImplAction)
Task Microsoft.CognitiveServices.Speech.SpeechRecognizer.StartContinuousRecognitionAsync()+() => { }
async void Laerdal.XR.ConversationalAI.AzureSpeechToTextService__Patched_.StartRecognizing() (at ../../../Users/DKMUL2/AppData/Local/Temp/7b197769dfea46558e5acf01116ebcc7.SourceCodeCombined.cs:283)
void Microsoft.CognitiveServices.Speech.Internal.SpxExceptionThrower.ThrowIfFail(IntPtr hr)
void Microsoft.CognitiveServices.Speech.Recognizer.StartContinuousRecognition()
void Microsoft.CognitiveServices.Speech.Recognizer.DoAsyncRecognitionAction(Action recoImplAction)
Task Microsoft.CognitiveServices.Speech.SpeechRecognizer.StartContinuousRecognitionAsync()+() => { }
async void Laerdal.XR.ConversationalAI.AzureSpeechToTextService__Patched_.StartRecognizing() (at ../../../Users/DKMUL2/AppData/Local/Temp/7b197769dfea46558e5acf01116ebcc7.SourceCodeCombined.cs:283)
void System.Runtime.CompilerServices.AsyncMethodBuilderCore.ThrowAsync(Exception exception, SynchronizationContext targetContext)+(object state) => { }
void UnityEngine.UnitySynchronizationContext+WorkRequest.Invoke()
void UnityEngine.UnitySynchronizationContext.Exec()
void UnityEngine.UnitySynchronizationContext.ExecuteTasks()
   at void UnityEngine.DebugLogHandler.Internal_LogException_Injected(Exception, IntPtr)
void UnityEngine.DebugLogHandler.Internal_LogException(Exception ex, Object obj)
void UnityEngine.DebugLogHandler.LogException(Exception exception, Object context)
void Sim3D.PackageHelper.Editor.Utility.LogFilter.LogException(Exception exception, Object context) (at ./Library/PackageCache/com.laerdal.sim3d.dev.package-helper/Editor/Utility/LogFilter.cs:29)
void UnityEngine.Logger.LogException(Exception exception, Object context)
bool UnityEngine.Debug.CallOverridenDebugHandler(Exception exception, Object obj)

and this:

ApplicationException: Exception with an error code: 0x5 (SPXERR_INVALID_ARG)
System.ApplicationException: Exception with an error code: 0x5 (SPXERR_INVALID_ARG)
void Microsoft.CognitiveServices.Speech.Internal.SpxExceptionThrower.ThrowIfFail(IntPtr hr)
void Microsoft.CognitiveServices.Speech.Recognizer.StartContinuousRecognition()
void Microsoft.CognitiveServices.Speech.Recognizer.DoAsyncRecognitionAction(Action recoImplAction)
Task Microsoft.CognitiveServices.Speech.SpeechRecognizer.StartContinuousRecognitionAsync()+() => { }
async void Laerdal.XR.ConversationalAI.AzureSpeechToTextService__Patched_.StartRecognizing() (at ../../../Users/DKMUL2/AppData/Local/Temp/7b197769dfea46558e5acf01116ebcc7.SourceCodeCombined.cs:283)
void Microsoft.CognitiveServices.Speech.Internal.SpxExceptionThrower.ThrowIfFail(IntPtr hr)
void Microsoft.CognitiveServices.Speech.Recognizer.StartContinuousRecognition()
void Microsoft.CognitiveServices.Speech.Recognizer.DoAsyncRecognitionAction(Action recoImplAction)
Task Microsoft.CognitiveServices.Speech.SpeechRecognizer.StartContinuousRecognitionAsync()+() => { }
async void Laerdal.XR.ConversationalAI.AzureSpeechToTextService__Patched_.StartRecognizing() (at ../../../Users/DKMUL2/AppData/Local/Temp/7b197769dfea46558e5acf01116ebcc7.SourceCodeCombined.cs:283)
void System.Runtime.CompilerServices.AsyncMethodBuilderCore.ThrowAsync(Exception exception, SynchronizationContext targetContext)+(object state) => { }
void UnityEngine.UnitySynchronizationContext+WorkRequest.Invoke()
void UnityEngine.UnitySynchronizationContext.Exec()
void UnityEngine.UnitySynchronizationContext.ExecuteTasks()
   at void UnityEngine.DebugLogHandler.Internal_LogException_Injected(Exception, IntPtr)
void UnityEngine.DebugLogHandler.Internal_LogException(Exception ex, Object obj)
void UnityEngine.DebugLogHandler.LogException(Exception exception, Object context)
void Sim3D.PackageHelper.Editor.Utility.LogFilter.LogException(Exception exception, Object context) (at ./Library/PackageCache/com.laerdal.sim3d.dev.package-helper/Editor/Utility/LogFilter.cs:29)
void UnityEngine.Logger.LogException(Exception exception, Object context)
bool UnityEngine.Debug.CallOverridenDebugHandler(Exception exception, Object obj)
lvialle commented 1 month ago

I can confirm that I have the same issue using Unity on iOS and Android. I just tried the 1.40 SDK with the same result.

TheXRMonk commented 1 month ago

It seems that using Unity's OnApplicationFocus to clean up and dispose of "everything" when app focus is lost, and then re-instantiating it when focus is regained, has solved our problem. We still need more thorough manual testing to confirm this completely.
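
The dispose-and-recreate approach described above could be sketched as follows. This is not the code from this thread: the component name `FocusAwareSpeechRecognizer`, its fields, and the task chaining are hypothetical. It also assumes that the error is raised when a start overlaps a running or half-stopped recognizer, which the thread does not confirm. Only the SDK calls (`AudioConfig.FromDefaultMicrophoneInput`, `StartContinuousRecognitionAsync`, `StopContinuousRecognitionAsync`, `Dispose`) and Unity's `OnApplicationFocus` are real APIs.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using UnityEngine;

// Hypothetical component illustrating teardown on focus loss and rebuild on focus gain.
public class FocusAwareSpeechRecognizer : MonoBehaviour
{
    [SerializeField] private string subscriptionKey; // placeholder, set in the Inspector
    [SerializeField] private string region;          // placeholder, set in the Inspector

    private AudioConfig audioConfig;
    private SpeechRecognizer recognizer;
    private bool isRunning;
    private Task pending = Task.CompletedTask; // tail of the queued transitions

    // StartAsync is idempotent, so it is safe if a focus event also arrives at startup.
    private void Start() => pending = RunAfter(pending, StartAsync);

    private void OnApplicationFocus(bool hasFocus)
    {
        // Queue transitions so quick focus changes never overlap a start and a stop.
        pending = hasFocus ? RunAfter(pending, StartAsync) : RunAfter(pending, StopAsync);
    }

    private void OnDestroy() => pending = RunAfter(pending, StopAsync);

    // Runs 'next' only after 'previous' has finished; errors are logged, not rethrown,
    // so the returned task never faults and the chain keeps working.
    private static async Task RunAfter(Task previous, Func<Task> next)
    {
        await previous;
        try { await next(); }
        catch (Exception e) { Debug.LogException(e); }
    }

    private async Task StartAsync()
    {
        if (recognizer != null) return; // already set up; avoid a second start

        var speechConfig = SpeechConfig.FromSubscription(subscriptionKey, region);
        audioConfig = AudioConfig.FromDefaultMicrophoneInput();
        recognizer = new SpeechRecognizer(speechConfig, audioConfig);
        recognizer.Recognized += (_, e) => Debug.Log(e.Result.Text);

        await recognizer.StartContinuousRecognitionAsync();
        isRunning = true;
    }

    private async Task StopAsync()
    {
        if (recognizer == null) return; // nothing to tear down

        try
        {
            if (isRunning) await recognizer.StopContinuousRecognitionAsync();
        }
        finally
        {
            // Dispose everything so the next focus gain starts from fresh objects.
            isRunning = false;
            recognizer.Dispose();
            audioConfig.Dispose();
            recognizer = null;
            audioConfig = null;
        }
    }
}
```

Chaining each transition onto the previous one is a design choice of this sketch: a stop always completes before the next start begins, even if focus changes arrive faster than the SDK calls return.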