csdcorp / speech_to_text

A Flutter plugin that exposes device specific text to speech recognition capability.
BSD 3-Clause "New" or "Revised" License

Potential interference with other plugin #245

Closed: tamoyal closed 2 years ago

tamoyal commented 2 years ago

I started using this to play audio - https://github.com/Canardoux/tau

After using the speech recognition once, the volume goes super low and there is no bringing it back up. It's probably the other plugin but I wanted to ask if you could see any way this could interfere with another audio plugin. The other reason I'm asking is for some reason the speech recognition became less reliable since I started using the plugin. I get this a lot:

FinishSuccessfully with error: Optional(Error Domain=EARErrorDomain Code=3 "Recognition was unsucessful" UserInfo={NSLocalizedDescription=Recognition was unsucessful})

All good if you say it's most likely the other plugin but let me know if anything comes to mind as potentially conflicting.

Thanks!

sowens-csd commented 2 years ago

Are you seeing the issues on iOS, Android or both? If on iOS are you using start/stop sounds or not? There's definitely room for the plugins to conflict as they both access the audio subsystem. Nothing comes to mind immediately on the volume issue. In one of my apps I'm using this plugin https://pub.dev/packages/audioplayers and haven't seen those kinds of issues.

My first general thought on conflicts is always that one of the two plugins isn't properly closing out its use of the platform audio components. When you get that log message do you also get no results from the speech recognition listen session? If you stop using the audio plugin, do those log messages go away? Do they only happen when you've played audio before using speech recognition or just by having the audio plugin installed?

tamoyal commented 2 years ago

Great questions @sowens-csd

I have only tested this on iOS and yes, I am using start/stop sounds. The maintainer of the plugin did mention there is an issue around volume control that he will fix in the next few days so probably best to let him address that before we dig into this more.

However, it's funny you mention audioplayers because the reason I wanted to try out Tau is that we are getting tons of very hard to reproduce crashes in production using that plugin, along with unpredictable state issues - see this and this but most importantly this. The first one seems like an obvious logical bug on their end and I may just dig into it myself. The second one is maybe failing a stress test but also maybe indicates some bad cleanup that could conflict with other plugins. The third one happens when I play a sound right after the speech recognizer does its thing because we use the speech recognizer for a quiz (so the user gets a "right" or "wrong" sound after they answer).

Is it possible this plugin doesn't do some sort of cleanup well (or I need to wait for some callback) and there could be an issue there? I see you asked some other questions above but I'm not sure if they are relevant for this specific issue. Also not sure how proprietary your code is but we could share some code and compare usage with audioplayers and the speech recognizer if you are using those two plugins together.

tamoyal commented 2 years ago

One thing I do see happening is that if you start the speech recognizer while other audio is playing, you will almost always get this error: AVAudioSession_iOS.mm:1206 Deactivating an audio session that has running I/O. All I/O should be stopped or paused prior to deactivating the audio session. The speech recognizer will then sometimes work, sometimes not. I'm not sure what the "right" thing to do here is... maybe if the audio engine is "busy", returning an error and denying the starting of the speech recognizer would be ideal. Or maybe the interference is between the start sound and the playing audio, in which case another option would be to just skip the start sound. And then there is always the option of telling the user they have to manage this, in which case it just goes in the docs somewhere.

tamoyal commented 2 years ago

What surprises me is that these can conflict. The audio plugins I have seen, including audioplayers, allow you to set a player ID, which implies that having multiple players and playing multiple sounds simultaneously should be no problem. And I certainly have been able to play 2 sounds at the same time. So I would expect these plugins to hold their own player instances and not interfere. Let me know if you have any thoughts on that point... thanks!

tamoyal commented 2 years ago

@sowens-csd One idea ... use mixWithOthers - https://developer.apple.com/documentation/avfaudio/avaudiosession/categoryoptions/1616611-mixwithothers

sowens-csd commented 2 years ago

I think that the issue you mentioned above about the log when it tries to deactivate the audio session is the same as #241 so I'll track the work on that there.

My impression after reading the mixWithOthers doc is that it only affects cross-application mixing, not mixing within a single app. I don't think that will help in this case. I think the point of conflict is the AVAudioSession class, which is an application-level singleton. In particular I do a lot of cleanup on this class because of some subtle issues with ensuring that speech recognition can successfully get a tap on the audio input.

My use of sound playback and audio recognition isn't intensive enough to see these kinds of issues come up. There is usually minutes of real world time separating one from the other so there's plenty of time to stabilize the state of the audio before another plugin needs it.

My current thought is that the issue is related to the use of the start / stop sounds. On iOS when you stop the listen it plays the stop sound and only after that sound is played does it then clean up the audio session. That cleanup is done on a callback, which means that it is asynchronous with regard to the other audio player. So the audio session could be mutated by two separate threads, resulting in an unknown state. If you could test without those sounds I'd be interested to know if you're still seeing issues. As part of #241 I'm looking at how I can make the cleanup process faster and more deterministic so hopefully that will help these issues as well.
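
For illustration, the ordering described above can be sketched in plain Dart. This is only a simulation, not code from either plugin: `simulateStopThenPlay`, `stopListening`, the event strings, and the 50 ms delay are all hypothetical, and it models the asynchronous ordering on a single isolate rather than real native threads.

```dart
import 'dart:async';

/// Simulates the ordering only; none of these names come from either plugin.
Future<List<String>> simulateStopThenPlay() async {
  final events = <String>[];

  // Hypothetical stand-in for the recognizer's stop path: play the stop
  // sound, then clean up the shared audio session in a later callback.
  Future<void> stopListening() async {
    events.add('recognizer: stop requested');
    // Stand-in for the time the stop sound takes to play.
    await Future<void>.delayed(const Duration(milliseconds: 50));
    events.add('recognizer: audio session cleaned up');
  }

  final stopping = stopListening();
  // The app starts its own sound immediately, without waiting for cleanup.
  events.add('player: playback started');
  await stopping;
  return events;
}

Future<void> main() async {
  for (final event in await simulateStopThenPlay()) {
    print(event);
  }
}
```

In this simulation the playback event is recorded before the cleanup event, which is the window in which both sides could be touching the shared audio session.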

tamoyal commented 2 years ago

@sowens-csd Yep that all sounds great. For now I'm going to wait a second after stopping the speech recognizer before I let other audio play. I guess I have a weird use case!

sowens-csd commented 2 years ago

I just committed a change to the repo that may help. I've added a new status, done, for the onStatus callback. This status is only sent once the speech recognizer has shut down all of its use of the audio session. This new status always comes after the existing notListening status. I'm thinking that if you wait for that done status before starting to play other sounds, it should work better. If you have a chance to try it please let me know.
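
A minimal sketch of how an app might wait for that status before playing other audio, assuming the statuses arrive as the strings 'notListening' and 'done'. The `RecognizerDoneGate` class and its members are hypothetical helpers, not part of the plugin, and the status sequence in `main` is simulated rather than coming from a real listen session.

```dart
import 'dart:async';

/// Hypothetical helper: completes once the recognizer reports 'done'.
class RecognizerDoneGate {
  Completer<void> _done = Completer<void>();

  /// Call before starting a new listen session so the gate can be reused.
  void reset() {
    if (_done.isCompleted) _done = Completer<void>();
  }

  /// Forward every status string from the onStatus callback to this method.
  void onStatus(String status) {
    // Assumption: the final status is delivered as the string 'done'.
    if (status == 'done' && !_done.isCompleted) {
      _done.complete();
    }
  }

  /// True once 'done' has been seen for the current session.
  bool get isDone => _done.isCompleted;

  /// Await this before starting playback in another audio plugin.
  Future<void> get done => _done.future;
}

Future<void> main() async {
  final gate = RecognizerDoneGate();

  // Simulated status sequence; a real app would receive these via onStatus.
  gate.onStatus('notListening');
  print(gate.isDone); // false: the audio session may still be in use

  scheduleMicrotask(() => gate.onStatus('done'));
  await gate.done;
  print(gate.isDone); // true: safe point to start other audio
}
```

In an app, `gate.onStatus` would be passed the values from the plugin's onStatus callback, and the playback call would come after `await gate.done`.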

tamoyal commented 2 years ago

@sowens-csd Not sure if you saw my message before but something was flawed with my test (so I deleted the message). It looks like this is working. I'll keep an eye on it and let you know if I see issues with the cleanup as I continue to test. Thank you!

sowens-csd commented 2 years ago

Thanks @tamoyal that's really helpful. I did see your previous message and was just preparing myself to be sad about that but then immediately read your follow up. I hope that you continue to see good results. I'm just doing some updates to the web support to add this behaviour.

edwinyoo44 commented 2 years ago

Have you checked whether the sound comes from the speaker or the earpiece?

When I use the TTS plugin, the sound is played from the earpiece, which makes it very quiet. I solved the problem with this setting on the TTS plugin:

```dart
// Route TTS audio to the loudspeaker instead of the earpiece on iOS.
if (Platform.isIOS) {
  await flutterTts
      .setIosAudioCategory(IosTextToSpeechAudioCategory.playback, [
    IosTextToSpeechAudioCategoryOptions.allowBluetooth,
    IosTextToSpeechAudioCategoryOptions.allowBluetoothA2DP,
    IosTextToSpeechAudioCategoryOptions.mixWithOthers,
    IosTextToSpeechAudioCategoryOptions.defaultToSpeaker
  ]);
}
```

sowens-csd commented 2 years ago

These changes are available now in 5.0.0.