csdcorp / speech_to_text

A Flutter plugin that exposes device-specific speech recognition (speech-to-text) capability.
BSD 3-Clause "New" or "Revised" License
373 stars, 232 forks

[iOS] STT plugin seems to hold onto audio when complete and stopped #373

Closed. ChrisMICDUP closed this issue 1 year ago.

ChrisMICDUP commented 1 year ago

Describe the bug

My app listens to an audio-visual stream from a cloud AV service (Agora). While listening, it lets the user record a message on the phone (using flutter_sound), simultaneously runs STT (using speech_to_text), and sends the recognized text and the audio recording to the cloud.

Before the recording starts, we mute all remote audio on the AV stream. After the recording is complete, we post the results and unmute the remote audio so we can continue listening to the live stream. When we unmute, there is an unusual 10-second delay before remote audio can be heard again. This appears to be Agora retrying some audio filter method, failing and backing off exponentially, before eventually discarding its audio component and starting again, at which point it succeeds.

If I remove all calls to the STT plugin, the delay is only tenths of a second, as expected.

I notice that the STT plugin logs the error "[plugin] Error deactivation: Session deactivation failed" to the Flutter console when we call SpeechToText.stop(). It seems to come from the call self.audioSession.setActive(false, options: .notifyOthersOnDeactivation) in SwiftSpeechToTextPlugin.swift. After reviewing the iOS AVAudioSession documentation, I can't determine what would cause this.

This worked before (with minimal delay) with older versions of speech_to_text, Agora and flutter_sound.

Do you have any insight into why we would get the Session deactivation failed message?


Additional context

Initialize params:

```
debugPrint('${DateTime.now()} XXXXX STT _speechToText.initialize');
return initialized = await _speechToText.initialize(
  onError: _errorListener,
  debugLogging: true, // TODO: turn off at some point
);
```

Listen params:

```
await _speechToText.listen(
  onResult: (SpeechRecognitionResult result) {
    debugPrint('XXXXX STT onResult lastError = ${_speechToText.lastError.toString()} lastStatus = ${_speechToText.lastStatus}');
    if (result.finalResult) {
      _completer?.complete(result.recognizedWords);
    }
  },
  listenFor: const Duration(seconds: 90),
  pauseFor: const Duration(seconds: 5),
  partialResults: false,
  cancelOnError: true,
  listenMode: ListenMode.confirmation,
);
```

**Flutter console**

References to RTC and iris are Agora calls; STT references are speech_to_text calls.

```
flutter: 2023-03-08 13:45:54.302986 XXXXX RTCEngineProviderState _remoteAudioStateChanged reason=RemoteAudioStateReason.remoteAudioReasonRemoteUnmuted uid=2596996162 state=RemoteAudioState.remoteAudioStateStarting
flutter: RTCEngineProviderState audioRouteChanged 3
flutter: 2023-03-08 13:45:54.702418 XXXXX RTCEngineProviderState _remoteAudioStateChanged reason=RemoteAudioStateReason.remoteAudioReasonRemoteUnmuted uid=2596996162 state=RemoteAudioState.remoteAudioStateDecoding
flutter: 2023-03-08 13:45:57.000534 XXXXX startRecording muteAllRemoteAudioStreams true
[debug] [iris_rtc_engine_impl.cc:114] api name RtcEngine_muteAllRemoteAudioStreams params {"mute":true}
flutter: 2023-03-08 13:45:57.003061 XXXXX start
flutter: 2023-03-08 13:45:57.003587 XXXXX STT _speechToText.initialize
[plugin] Has permissions continuing with setup
[debug] [iris_rtc_engine_impl.cc:132] ret 0 result {"result":0}
Required assets are not available for Locale:en_NZ
flutter: 2023-03-08 13:45:57.065825 XXXXX STT listen
flutter: 2023-03-08 13:45:57.103111 XXXXX RTCEngineProviderState _remoteAudioStateChanged reason=RemoteAudioStateReason.remoteAudioReasonLocalMuted uid=2596996162 state=RemoteAudioState.remoteAudioStateStopped
[plugin] invokeFlutter notifyStatus
flutter: RecordButton _speechToText.start state=true
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
[plugin] HypothesizeTranscription
[plugin] Encoded JSON result: {"alternates":[{"recognizedWords":"Want","confidence":1}],"finalResult":false}
[plugin] invokeFlutter textRecognition
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
[plugin] HypothesizeTranscription
[plugin] Encoded JSON result: {"alternates":[{"recognizedWords":"Want to","confidence":1}],"finalResult":false}
[plugin] invokeFlutter textRecognition
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
[plugin] HypothesizeTranscription
[plugin] Encoded JSON result: {"alternates":[{"recognizedWords":"Want two","confidence":1}],"finalResult":false}
[plugin] invokeFlutter textRecognition
[plugin] invokeFlutter soundLevelChange
[plugin] HypothesizeTranscription
[plugin] Encoded JSON result: {"alternates":[{"recognizedWords":"12","confidence":1}],"finalResult":false}
[plugin] invokeFlutter textRecognition
[plugin] invokeFlutter soundLevelChange
[plugin] HypothesizeTranscription
[plugin] Encoded JSON result: {"alternates":[{"recognizedWords":"123","confidence":1}],"finalResult":false}
[plugin] invokeFlutter textRecognition
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
[plugin] HypothesizeTranscription
[plugin] Encoded JSON result: {"alternates":[{"recognizedWords":"1234","confidence":1}],"finalResult":false}
[plugin] invokeFlutter textRecognition
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
[plugin] HypothesizeTranscription
[plugin] Encoded JSON result: {"alternates":[{"recognizedWords":"12345","confidence":1}],"finalResult":false}
[plugin] invokeFlutter textRecognition
[plugin] invokeFlutter soundLevelChange
[plugin] invokeFlutter soundLevelChange
flutter: RecordButton stop _speechToText
flutter: 2023-03-08 13:45:59.962943 XXXXX STT stop
[plugin] Error deactivation: Session deactivation failed
[plugin] invokeFlutter notifyStatus
[plugin] Finished reading audio
[plugin] invokeFlutter notifyStatus
flutter: stop and close recorder
[plugin] HypothesizeTranscription
[plugin] Encoded JSON result: {"alternates":[{"recognizedWords":"123456","confidence":1}],"finalResult":false}
[plugin] invokeFlutter textRecognition
[plugin] FinishRecognition true
[plugin] Encoded JSON result: {"alternates":[{"recognizedWords":"123456","confidence":0.964}],"finalResult":true}
[plugin] invokeFlutter textRecognition
[plugin] FinishSuccessfully
flutter: XXXXX STT onResult lastError = null lastStatus = notListening
[plugin] Error deactivation: Session deactivation failed
[plugin] invokeFlutter notifyStatus
flutter: 2023-03-08 13:46:02.366201 XXXXX _stopRecording: before closeAudioSession
[debug] [iris_rtc_engine_impl.cc:114] api name RtcEngine_muteAllRemoteAudioStreams params {"mute":false}
[debug] [iris_rtc_engine_impl.cc:132] ret 0 result {"result":0}
flutter: 2023-03-08 13:46:02.444048 XXXXX _stopRecording after muteAllRemoteAudioStreams false
flutter: 2023-03-08 13:46:02.644264 XXXXX RTCEngineProviderState _remoteAudioStateChanged reason=RemoteAudioStateReason.remoteAudioReasonLocalUnmuted uid=2596996162 state=RemoteAudioState.remoteAudioStateStarting
flutter: Making a new message...
```

(10-second delay)

```
flutter: 2023-03-08 13:46:13.844329 XXXXX RTCEngineProviderState _remoteAudioStateChanged reason=RemoteAudioStateReason.remoteAudioReasonLocalUnmuted uid=2596996162 state=RemoteAudioState.remoteAudioStateDecoding
```
sowens-csd commented 1 year ago

I think you definitely qualify as a sound 'power user' with this post. That may be the most sound- and speech-intensive set of interactions I've seen yet; I'm impressed. I'm trying to get past my surprise and wonder that it's working at all and focus on the 10-second delay.

It sounds like something, possibly STT, is taking its time releasing the audio connection. Can you give me a description or pseudocode of what happens in STT and Agora right at that 10-second pause? I'd like to understand how flutter_sound, Agora and STT interact at that handover stage: when do you stop recording with flutter_sound, when do you stop listening with STT, and when do you start the Agora stream again?

If you start a second STT session immediately after stopping one, is there also a 10-second delay before listening starts, or does it start quickly?

I'll have a look at possible causes for that session deactivation failure.

ChrisMICDUP commented 1 year ago

Thanks a lot for your swift response. Our last release required a bunch of package upgrades, mainly to deal with Bluetooth headset issues (too many, really), which seems to have led to this issue.

I'll send the information you've asked for later today (I'm on NZ time). I'm working on the assumption that we're calling STT stop too early in the process and that this is affecting the session deactivation. I've added a status listener to STT and might do the stop and unmute based on that.

Thanks again (Regarding the power user comment, not only are these interactions working, but the app has been in production for a year and a half...)

ChrisMICDUP commented 1 year ago

Additional Information:

startRecording:

stopRecording:

And this is how starting STT is implemented:

```
Future<bool> start() async {
  if (!(initialized || await _initialize())) {
    return false;
  }
  _completer = Completer();
  debugPrint('${DateTime.now()} XXXXX STT listen');
  await _speechToText.listen(
    onResult: (SpeechRecognitionResult result) {
      debugPrint('XXXXX STT onResult lastError = ${_speechToText.lastError.toString()} lastStatus = ${_speechToText.lastStatus}');
      if (result.finalResult) {
        _completer?.complete(result.recognizedWords);
      }
    },
    listenFor: const Duration(seconds: 90),
    pauseFor: const Duration(seconds: 10),
    partialResults: false,
    cancelOnError: true,
    listenMode: ListenMode.confirmation, // TODO: change to ListenMode.dictation??
  );
  return true;
}
```

Incidentally, I changed that pauseFor from 10 to 3 with no effect...

I also ran two STT sessions. After one completed, I waited 10 seconds until I could hear the broadcast, then created another message: still the same lengthy delay.

Lastly, I moved the Agora unmute (muteAllRemoteAudioStreams false) into the status listener so it runs when the status is 'done', but the delay remains. See the log below:

```
[plugin] invokeFlutter textRecognition
flutter: 2023-03-09 14:29:47.655826 XXXXX _statusListener speechToText status: notListening
[plugin] FinishRecognition true
[plugin] Encoded JSON result: {"alternates":[{"recognizedWords":"54321","confidence":0.971}],"finalResult":true}
[plugin] invokeFlutter textRecognition
[plugin] FinishSuccessfully
flutter: XXXXX STT onResult lastError = null lastStatus = notListening
flutter: 2023-03-09 14:29:47.660501 XXXXX _statusListener speechToText status: done
flutter: 2023-03-09 14:29:47.660618 XXXXX _statusListener status = done muteAllRemoteAudioStreams false
[debug] [iris_rtc_engine_impl.cc:114] api name RtcEngine_muteAllRemoteAudioStreams params {"mute":false}
flutter: 2023-03-09 14:29:47.661525 XXXXX _completed, send message: 54321
[plugin] invokeFlutter notifyStatus
flutter: 2023-03-09 14:29:47.736883 XXXXX _statusListener speechToText status: done
flutter: 2023-03-09 14:29:47.737020 XXXXX _statusListener status = done muteAllRemoteAudioStreams false
[debug] [iris_rtc_engine_impl.cc:132] ret 0 result {"result":0}
[debug] [iris_rtc_engine_impl.cc:114] api name RtcEngine_muteAllRemoteAudioStreams params {"mute":false}
[debug] [iris_rtc_engine_impl.cc:132] ret 0 result {"result":0}
flutter: 2023-03-09 14:29:47.831687 XXXXX RTCEngineProviderState _remoteAudioStateChanged reason=RemoteAudioStateReason.remoteAudioReasonLocalUnmuted uid=2377840510 state=RemoteAudioState.remoteAudioStateStarting
flutter: Message Add a new message...
>>>>> 12 second delay
flutter: 2023-03-09 14:29:59.431119 XXXXX RTCEngineProviderState _remoteAudioStateChanged reason=RemoteAudioStateReason.remoteAudioReasonLocalUnmuted uid=2377840510 state=RemoteAudioState.remoteAudioStateDecoding
```
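In the log above, the 'done' status is reported twice (14:29:47.660 and 14:29:47.737), so muteAllRemoteAudioStreams(false) is sent to Agora twice. If you want the unmute tied to exactly one 'done' per listening session, a minimal Dart sketch could look like this. DoneGate, arm and onSessionDone are hypothetical names, not part of speech_to_text or the Agora SDK, and the status strings are assumed to be the ones visible in the log ('listening', 'notListening', 'done'). It only removes the duplicate call; it is not expected to fix the delay itself.

```dart
/// Hypothetical helper (not part of speech_to_text or Agora): runs
/// [onSessionDone] once per listening session, on the first 'done' status
/// reported after [arm] is called.
class DoneGate {
  DoneGate(this.onSessionDone);

  /// Work to run once STT has finished, e.g. a closure that calls
  /// muteAllRemoteAudioStreams(false) on the Agora engine.
  final void Function() onSessionDone;

  // Starts as "already fired" so nothing runs until a session is armed.
  bool _fired = true;

  /// Call just before each listen() so the next 'done' is acted on.
  void arm() {
    _fired = false;
  }

  /// Matches the shape of an onStatus callback (void Function(String)).
  /// Ignores every status except the first 'done' after arm(), so a
  /// repeated 'done' like the one in the log is a no-op.
  void onStatus(String status) {
    if (status != 'done' || _fired) return;
    _fired = true;
    onSessionDone();
  }
}
```

To wire it up, you would pass gate.onStatus as the onStatus argument of _speechToText.initialize(...) and call gate.arm() right before each _speechToText.listen(...). Arming per session matters because initialize is only called once, so its status callback is shared by every session.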

I also added a short sleep before calling the unmute, with no effect (apart from making the delay longer...).

Is there a kill switch or config option that might help?

I've raised an issue with the Agora folks, who are usually quite responsive, so we'll see what happens there. Otherwise I'll remove STT and use the GCP service I'm using for Android.

ChrisMICDUP commented 1 year ago

I have found a workaround: disabling and re-enabling the entire Agora audio module. It's not exactly recommended, but I think it will suffice for my use case. Thanks again for your help.
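The workaround of disabling and re-enabling the Agora audio module can be sketched as below. The reporter didn't post their code, so this is an assumption: it presumes the Agora Flutter SDK's RtcEngine.disableAudio() and RtcEngine.enableAudio() are the calls behind "disable and enable the entire Agora audio module". AudioModuleControl and restartAudioModule are hypothetical names; the interface just keeps the sketch runnable without the SDK.

```dart
/// Hypothetical abstraction over the two engine calls the workaround needs.
/// In the app, an adapter would forward these to the Agora RtcEngine
/// (assumed: RtcEngine.disableAudio() / RtcEngine.enableAudio()).
abstract class AudioModuleControl {
  Future<void> disableAudio();
  Future<void> enableAudio();
}

/// Turns the engine's audio module off and back on.
/// enableAudio() runs in a finally block, so audio is not left switched
/// off if disableAudio() throws; the original error is still rethrown.
Future<void> restartAudioModule(AudioModuleControl engine) async {
  try {
    await engine.disableAudio();
  } finally {
    await engine.enableAudio();
  }
}
```

Using try/finally is a design choice for this sketch: a failed disable should not leave the live stream permanently silent. Where exactly to call it (for example, from the same status handler that currently unmutes) is left open, since the thread doesn't say.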

sowens-csd commented 1 year ago

Thanks for the updates, that's really helpful. The session deactivation failure looks interesting as a possible cause. After looking at your sequence I'm hung up on the word 'mute'. I'm wondering if Agora still owns the audio session while muted and that's causing contention. Your finding that disabling / enabling the Agora audio module mitigates the problem also suggests that might be true. Could you ask Agora about that?

The session deactivation error you're seeing happens when STT tries to set the shared audio session to inactive. This failure usually means the audio session is still in use while it is being deactivated. I've had conflicts with other audio tools before and have tried to make STT a good citizen in its use of shared iOS audio resources, but I'm not confident I've found all the issues. I'm slowly discovering that confidence and iOS sound are mutually exclusive.

Are you using sound playback at the beginning/end of the speech recognition session? I assume not, given all the other sound that's happening, but if you are, you should try disabling it. You can do that by removing the sounds from the assets section of the pubspec file.

ChrisMICDUP commented 1 year ago

Your assumption that I don't play a start and stop beep flies in the face of my newly anointed "power user" status. Of course I do! I actually use flutter_sound for that (for reasons in the past that escape me, probably didn't RTFM).

I removed them and put the Agora mute back in, but unfortunately the behaviour is the same.

sowens-csd commented 1 year ago

lol, so glad you're taking advantage of every single way of playing and listening for sound in your application. I apologize for doubting you.

Any feedback from Agora on whether they're releasing the audio session on mute? Or have you decided that your current work around is good enough and you're going to leave it there?

ChrisMICDUP commented 1 year ago

Nothing from Agora as yet, but we've completed testing with my current fix and it will be good enough. If I hear anything I'll let you know.

sowens-csd commented 1 year ago

I'm going to close this then. Please let me know if you find anything else relevant.