feat: Add support for word-level progress tracking in TextToSpeech

Dhruv-1105 commented 2 months ago

Is your feature request related to a problem? Please describe: In many applications that use text-to-speech (TTS), it is essential to track the progress of spoken words to provide features such as synchronized text highlighting. Currently, the @capacitor-community/text-to-speech package does not offer a way to get real-time updates on the specific words being spoken, which limits its utility in such scenarios.

Describe the solution you'd like: I propose adding support for an onRangeStart event that emits the start and end indices of the currently spoken word, along with the spoken word itself. This feature would allow developers to track which word is being spoken in real-time and implement functionalities such as synchronized text highlighting. The implementation involves the following changes:

TextToSpeech.java: Added an UtteranceProgressListener that listens for onRangeStart events and emits the start and end indices of the spoken word.

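// Called by the TTS engine just before it speaks the range [start, end) of the utterance.
// `text` is the string that was passed to the engine for this utterance.
@Override
public void onRangeStart(String utteranceId, int start, int end, int frame) {
    String spokenWord = text.substring(start, end);
    Log.d("TTS", "Spoken word: " + spokenWord);
    resultCallback.onRangeStart(start, end);
}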

TextToSpeechPlugin.java: Added a method to handle the onRangeStart callback and emit the event.

@PluginMethod
public void speak(PluginCall call) {
    // existing code...
    SpeakResultCallback resultCallback = new SpeakResultCallback() {
        @Override
        public void onRangeStart(int start, int end) {
            JSObject ret = new JSObject();
            ret.put("start", start);
            ret.put("end", end);
            // Emit an event for each word instead of resolving the call:
            // a PluginCall can only be resolved once, but this fires repeatedly.
            notifyListeners("onRangeStart", ret);
        }
    };
    // existing code...
}

definitions.ts: Added an addListener method to listen for onRangeStart events.

addListener(
  eventName: 'onRangeStart',
  listenerFunc: (info: { start: number; end: number; spokenWord: string }) => void
): Promise<PluginListenerHandle>;
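To illustrate how an app might consume this event for synchronized highlighting, here is a minimal sketch. It assumes the payload shape from the proposed signature above. `splitForHighlight` and `render` are hypothetical names, not part of the plugin, and the wiring in the trailing comments is an assumption about how the listener would be registered.

```typescript
// Payload shape taken from the proposed definitions.ts signature.
interface RangeStartInfo {
  start: number;
  end: number;
  spokenWord: string;
}

// Hypothetical helper: split the utterance text around the spoken range
// so a UI can render the middle part with a highlight style.
function splitForHighlight(
  text: string,
  info: RangeStartInfo
): { before: string; current: string; after: string } {
  // Clamp defensively in case the reported offsets fall outside the text.
  const start = Math.max(0, Math.min(info.start, text.length));
  const end = Math.max(start, Math.min(info.end, text.length));
  return {
    before: text.slice(0, start),
    current: text.slice(start, end),
    after: text.slice(end),
  };
}

// Assumed wiring (not run here; needs the plugin and a device or emulator):
// const handle = await TextToSpeech.addListener('onRangeStart', (info) => {
//   render(splitForHighlight(text, info)); // render() is a hypothetical UI update
// });
// await TextToSpeech.speak({ text });
// await handle.remove();
```

Keeping the split logic in a pure function means it can be unit-tested without a TTS engine, and the listener only has to pass the result to the view layer.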

Describe alternatives you've considered: An alternative approach could be to periodically poll the TTS engine for its current progress, but this would be less efficient and more complex to implement. Integrating directly with the UtteranceProgressListener provides a more reliable and accurate solution.

Additional context: This feature is critical for applications that need to provide synchronized text highlighting, karaoke-style text displays, or any other feature that requires real-time tracking of spoken words. Adding this capability to the @capacitor-community/text-to-speech package will significantly enhance its usability for a broader range of applications.

Dhruv-1105 commented 2 months ago

Please check the following PR for this issue: https://github.com/capacitor-community/text-to-speech/pull/132

bridgecode commented 1 day ago

Hello, I'm pretty new to Capacitor and not sure if this is the correct place to put this, but I tried to implement this feature in my Vue/Vite/Ionic/Capacitor app and I'm having trouble getting it to work. I'm not sure if this will only work on a device, but I was trying to use it in Chrome so I could debug and get it working across all applications with this version. I would prefer not to use the Web SpeechSynthesisUtterance API in parallel, to avoid different behavior between web and mobile. Is it possible to get it working in my local web environment, or will this only work on a device? I can use an emulator, but I'd prefer to have it working on the web as well.

Thanks in advance for any info, or for a CodePen example I can look at with a console log of the start/end/frame. Also, thanks for the hard work getting this feature out, I'm really excited for this! (Please lmk if I should put this in a separate issue.)
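For context on the web question: browsers report word boundaries through the Web Speech API's `boundary` event on `SpeechSynthesisUtterance`, which carries a `charIndex` and, in some browsers, a `charLength`. The thread does not say whether the plugin's web implementation forwards these as onRangeStart, so the following is only a sketch of how such boundary data could be mapped to the same `{ start, end, spokenWord }` shape. `rangeFromBoundary` is a hypothetical helper, and the whitespace fallback is an assumption for browsers that omit `charLength`.

```typescript
// Same payload shape as the proposed onRangeStart event.
interface RangeStartInfo {
  start: number;
  end: number;
  spokenWord: string;
}

// Hypothetical helper: derive a word range from a boundary offset.
// charLength is optional because some browsers do not report it.
function rangeFromBoundary(
  text: string,
  charIndex: number,
  charLength?: number
): RangeStartInfo {
  const start = Math.max(0, Math.min(charIndex, text.length));
  let end: number;
  if (charLength !== undefined && charLength > 0) {
    end = Math.min(start + charLength, text.length);
  } else {
    // Fall back to scanning forward to the next whitespace character.
    const offset = text.slice(start).search(/\s/);
    end = offset === -1 ? text.length : start + offset;
  }
  return { start, end, spokenWord: text.slice(start, end) };
}

// Assumed browser wiring (not run here; needs window.speechSynthesis):
// const utterance = new SpeechSynthesisUtterance(text);
// utterance.onboundary = (e) => {
//   if (e.name === 'word') {
//     console.log(rangeFromBoundary(text, e.charIndex, e.charLength));
//   }
// };
// speechSynthesis.speak(utterance);
```

Boundary events are not fired by every voice or browser, so a log like this is best treated as a debugging aid rather than a guarantee of parity with the device behavior.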

feat: Add support for word-level progress tracking in TextToSpeech #131