Async Functionality in Cordova

esgraham commented 4 years ago

Description

As an architect, I want to understand how Cordova implements async functionality, so that I can determine whether the plugin should implement the Speech SDK async functions.

Acceptance Criteria

Find documentation that details how Cordova works with async functionality, so that we can review it and make a decision.
Find documentation that details how the Cognitive Services Speech SDK implements async services, so that we can review it and decide whether to use those functions in the plugin.
Update README.md with the final decision on whether and how to implement async functions from the Speech SDK.

rozele commented 4 years ago

Android

Recognize from microphone

Recommendation: recognizeOnceAsync

Cordova plugins are invoked on the WebCore thread on Android, not the main UI thread, see the threading section of this Android overview.
The Java Speech SDK only has a recognizeOnceAsync method that returns a Future<T>.
Calling the get() method on the future will block the current thread.
We don't want to block the WebCore thread, so we should create an ExecutorService to get the results from the Future on a background thread. The ExecutorService used in the Cognitive Services samples is the cached thread pool (Executors.newCachedThreadPool()).
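The pattern above can be sketched with java.util.concurrent alone. `FutureResolver` and its `Consumer` callbacks are hypothetical stand-ins for the plugin's own code and Cordova's `CallbackContext`; in the plugin, the `Future` passed in would be the one returned by `recognizeOnceAsync()`. The helper returns immediately, and a pooled background thread does the blocking `get()` call.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/**
 * Hypothetical helper: waits on Speech SDK futures on a background thread so
 * the Cordova WebCore thread is never blocked by Future.get().
 */
public class FutureResolver {
    // Cached thread pool, as in the Cognitive Services samples.
    private final ExecutorService executor = Executors.newCachedThreadPool();

    /**
     * Returns immediately; a pooled thread blocks on future.get() and then
     * reports the outcome through onSuccess or onError.
     */
    public <T> Future<?> resolve(Future<T> future,
                                 Consumer<? super T> onSuccess,
                                 Consumer<Throwable> onError) {
        return executor.submit(() -> {
            try {
                onSuccess.accept(future.get());
            } catch (ExecutionException e) {
                // Unwrap so the caller sees the original exception.
                onError.accept(e.getCause());
            } catch (CancellationException e) {
                onError.accept(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                onError.accept(e);
            }
        });
    }

    /** Call when the plugin is destroyed to release pooled threads. */
    public void shutdown() {
        executor.shutdown();
    }
}
```

Because `resolve` accepts any `Future<T>`, a single instance could in principle also serve `stopContinuousRecognitionAsync` and the text-to-speech calls, which is the executor reuse suggested below.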

Stop recognizing from microphone

Recommendation: stopContinuousRecognitionAsync

Similar to the other methods, leverage the ExecutorService set up for speech recognition to "await" the get() call on the Future returned by stopContinuousRecognitionAsync.

Play text to speech audio

Recommendation: SpeakTextAsync and SpeakSsmlAsync

The Java Speech SDK has 8 methods for speaking text:
- Return before audio played vs. return after audio completed
- Async vs. sync
- Text vs. SSML
We should use the async variants that return after the audio has completed and expose one each for text and SSML. We can revisit later whether to expose the variants that return before the audio is played.
We should leverage the same ExecutorService that we set up for speech recognition for resolving the Future.

Stop playing text to speech audio

Recommendation: AudioTrack

TODO: We need to investigate whether calling cancel(true) on the Future<T> returned by the SpeakTextAsync method stops any audio that is playing.
~~If it does not, we'll need to set up a custom playback mechanism similar to how it was done for iOS.~~
We need to set up audio playback to support cancellation on Android.
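One possible way to make playback cancellable, sketched in plain Java: write the audio to the output in small chunks on a background thread and check the thread's interrupt flag between chunks, so that `Future.cancel(true)` stops playback. `CancellablePlayback` and `AudioSink` are hypothetical names. On Android the sink might wrap an `AudioTrack` (which has `write`, `pause`, `flush`, and `stop` methods), but that wiring is an assumption and is not shown here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch of cancellable playback: audio is written to a sink in
 * small chunks on a background thread, and Future.cancel(true) stops it.
 */
public class CancellablePlayback {
    /** Hypothetical playback target; on Android an AudioTrack could back it. */
    public interface AudioSink {
        void write(byte[] data, int offset, int length);
        void stop();
    }

    private final ExecutorService executor = Executors.newCachedThreadPool();

    /** Starts playback and returns a Future whose cancel(true) stops it. */
    public Future<?> play(byte[] audio, int chunkSize, AudioSink sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return executor.submit(() -> {
            for (int offset = 0; offset < audio.length; offset += chunkSize) {
                // cancel(true) interrupts this thread; checking between chunks
                // means no further chunk is written once the flag is seen.
                if (Thread.currentThread().isInterrupted()) {
                    sink.stop();
                    return;
                }
                sink.write(audio, offset, Math.min(chunkSize, audio.length - offset));
            }
        });
    }

    /** Call when the plugin is destroyed to release pooled threads. */
    public void shutdown() {
        executor.shutdown();
    }
}
```

The chunk size trades responsiveness against overhead: smaller chunks let playback stop sooner after cancellation.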

iOS

Recognize from microphone

Recommendation: recognizeOnce

Cordova plugins run on the main UI thread, so long running tasks should be invoked on a background thread (see this iOS overview).
Using either recognizeOnceAsync or recognizeOnce seems to block the calling thread until speech recognition is complete, so the simplest option is to just use recognizeOnce and run it on a background thread.

Stop recognizing from microphone

Recommendation: stopContinuousRecognition

stopContinuousRecognition blocks the calling thread until it completes. I believe this is the desired behavior, so we may not need to move this call onto a background thread, but it may lead to a poor UX if the stop / cancel operation takes too long.

Play text to speech audio

Recommendation: speakText and speakSsml

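The iOS Speech SDK does not have async variants of these methods, so, as with recognizeOnce, they should be run on a background thread. Otherwise, the recommendation is the same as for Android.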

Stop playing text to speech audio

Recommendation: AVAudioPlayer.stop

There currently is no cancellation option that can be invoked from the Speech SDK.
We have experimented with AVAudioPlayer, on which we can call the stop method to cancel any active audio.
It may be fine to do this from the main thread.

rozele commented 4 years ago

As a minor modification to the above, I recommend that we not use the Speech SDK for text-to-speech playback, because it does not support cancellation. Using the Speech SDK to get the audio data to play on something like AVAudioPlayer (iOS) or AudioTrack (Android) is not as efficient as piping the bytes directly from a REST call to Cognitive Services.

I believe we can use the Speech SDK to stream audio efficiently if we leverage the PushAudioOutputStream and forward the bytes written to the output stream straight to the AVAudioPlayer / AudioTrack.

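import com.microsoft.cognitiveservices.speech.SpeechConfig;
import com.microsoft.cognitiveservices.speech.SpeechSynthesizer;
import com.microsoft.cognitiveservices.speech.audio.AudioConfig;
import com.microsoft.cognitiveservices.speech.audio.PushAudioOutputStream;

SpeechConfig speechConfig = SpeechConfig.fromSubscription(...);
CustomPushAudioOutputStreamCallback callback = new CustomPushAudioOutputStreamCallback();
PushAudioOutputStream outputStream = PushAudioOutputStream.create(callback);
AudioConfig audioConfig = AudioConfig.fromStreamOutput(outputStream);
SpeechSynthesizer synthesizer = new SpeechSynthesizer(speechConfig, audioConfig);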
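The snippet above leaves CustomPushAudioOutputStreamCallback undefined; a minimal sketch of what it might look like follows. To keep the sketch self-contained, it declares a local stand-in for the SDK's PushAudioOutputStreamCallback, assuming that class exposes `int write(byte[] dataBuffer)` and `void close()`; in the plugin you would extend the SDK class instead and confirm its shape in the SDK reference. `AudioSink` is hypothetical (it would wrap AudioTrack on Android), and the `cancel()` hook is an assumption about how the plugin's stop action could drop the remaining audio. This version also takes the sink as a constructor argument, so the no-argument constructor call above would need to pass one in.

```java
/**
 * Sketch of forwarding synthesized audio straight to a player. The abstract
 * class below is a local stand-in for the Speech SDK's
 * PushAudioOutputStreamCallback so this compiles without the SDK.
 */
public class PushStreamForwarding {
    /** Stand-in for the SDK class (assumed shape: write(byte[]) and close()). */
    public abstract static class PushAudioOutputStreamCallback {
        public abstract int write(byte[] dataBuffer);
        public abstract void close();
    }

    /** Hypothetical playback target, e.g. a wrapper around AudioTrack. */
    public interface AudioSink {
        void write(byte[] data, int offset, int length);
        void close();
    }

    /** Forwards each chunk to the sink as soon as the SDK writes it. */
    public static class CustomPushAudioOutputStreamCallback extends PushAudioOutputStreamCallback {
        private final AudioSink sink;
        // Written by the plugin's stop action, read on the SDK's writer thread.
        private volatile boolean cancelled;

        public CustomPushAudioOutputStreamCallback(AudioSink sink) {
            this.sink = sink;
        }

        /** Hypothetical hook for the plugin's stop action; later chunks are dropped. */
        public void cancel() {
            cancelled = true;
        }

        @Override
        public int write(byte[] dataBuffer) {
            if (!cancelled) {
                sink.write(dataBuffer, 0, dataBuffer.length);
            }
            // Report the whole chunk as consumed either way.
            return dataBuffer.length;
        }

        @Override
        public void close() {
            sink.close();
        }
    }
}
```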

CatalystCode / cordova-plugin-cogsvcsspeech