MaryTTS does not wait to end of speech because RemoteTTS does not set "isSpeaking"

Alfiva commented 2 years ago

Describe the bug

When using MaryTTS, if a skill calls speak_dialog with wait=True, or any other method that relies on it, such as get_response, execution does not wait for the speech to finish before moving on to the next instruction. The consequences depend on how the skill is built:

- The dialog may be suddenly interrupted by the audio cue to start recording, and consequently the execution order of the skill is messed up.
- Because RemoteTTS (the superclass of MaryTTS) splits long sentences, different parts of the speech may play on top of each other, resulting in garbled audio.

I have traced the issue to RemoteTTS, the superclass of MaryTTS: its execute method does not call create_signal("isSpeaking"), unlike the execute method of the base TTS class that most other TTS modules use. Because "isSpeaking" is never set, the wait loop that should hold until the end of the speech exits immediately.
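The effect can be reproduced outside Mycroft with a small simulation. This is only a sketch: the in-memory set, clear_signal, speak and waited_for are hypothetical stand-ins written for this example, not mycroft-core code, and the real signal mechanism may differ in detail.

```python
import threading
import time

# Hypothetical in-memory stand-in for Mycroft's signal store.
_signals = set()
_lock = threading.Lock()


def create_signal(name):
    with _lock:
        _signals.add(name)


def clear_signal(name):
    with _lock:
        _signals.discard(name)


def is_speaking():
    with _lock:
        return "isSpeaking" in _signals


def wait_while_speaking(poll=0.01):
    # Holds only while the flag is set; if it was never set, returns at once.
    while is_speaking():
        time.sleep(poll)


def speak(duration, set_signal):
    """Start simulated playback in a background thread."""
    if set_signal:
        create_signal("isSpeaking")

    def playback():
        time.sleep(duration)            # pretend audio is playing
        clear_signal("isSpeaking")      # playback finished

    thread = threading.Thread(target=playback)
    thread.start()
    return thread


def waited_for(set_signal, duration=0.2):
    """Return how long the caller was held while 'speech' was playing."""
    thread = speak(duration, set_signal)
    start = time.monotonic()
    wait_while_speaking()
    elapsed = time.monotonic() - start
    thread.join()
    return elapsed
```

Calling waited_for(False) returns almost immediately even though the simulated playback is still running, which matches the behaviour described here; waited_for(True) holds the caller for roughly the playback duration.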

Adding create_signal("isSpeaking") to the beginning of the execute method in RemoteTTS fixes this. Also, I assume other TTS modules that extend RemoteTTS instead of TTS will be affected.
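As a rough illustration of that change, the sketch below uses simplified stand-in classes. The class bodies, the PatchedRemoteTTS name and the signals set are assumptions made for this example; the real mycroft-core classes do considerably more.

```python
# Hypothetical stand-in for Mycroft's signal store.
signals = set()


def create_signal(name):
    signals.add(name)


class TTS:
    def execute(self, sentence):
        create_signal("isSpeaking")     # base class marks speech as started
        return f"queued: {sentence}"


class RemoteTTS(TTS):
    def execute(self, sentence):
        # Overrides the base method and never marks speech as started.
        return f"queued remotely: {sentence}"


class PatchedRemoteTTS(RemoteTTS):
    def execute(self, sentence):
        create_signal("isSpeaking")     # the one-line change proposed in this issue
        return super().execute(sentence)
```

With the unpatched class, "isSpeaking" stays unset after execute; with the patched one it is set before any audio is requested.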

To Reproduce

Steps to reproduce the behavior:

1. Configure TTS to use MaryTTS
2. Use a skill (or create one) that asks for a user response
3. You should hear the cue to speak before Mycroft finishes its question

Expected behavior

When speaking dialogs with wait=True or calling methods like get_response, ask_selection, etc., the speech should complete before the skill moves to the next instruction.

Log files

If necessary I could try to extract some logs, although I had to manually add some extra logging to the Mycroft /skills scripts to figure out what happened. I could also try to provide some audio recordings.

Environment (please complete the following information):

Device type: Raspberry Pi 4 Model B 4GB
OS: Picroft release candidate v21.02.0_20210604
Mycroft-core version: v21.02.0

Additional context

I have Mycroft configured to use "marytts" as the TTS module, but the actual server is not running MaryTTS but OpenTTS, which can act as a stand-in for MaryTTS clients. The server is running on a laptop while testing, and the voice I use is set to high quality. All of this results in a noticeable delay when synthesizing speech, which makes the lack of waiting more obvious. With very short sentences and a fast server response, the effect of this bug may be easy to miss.

JarbasAl commented 2 years ago

you can use https://github.com/OpenVoiceOS/ovos-tts-plugin-marytts

Alfiva commented 2 years ago

> you can use https://github.com/OpenVoiceOS/ovos-tts-plugin-marytts

Haven't tried that, but I will take a look. However, if this plugin also extends the original RemoteTTS, the problem would still be present, since the issue is not with MaryTTS itself but with the RemoteTTS class. By the way, I still have issues if the skill says a very long sentence. I had to modify RemoteTTS to avoid splitting it into smaller sentences, and then it works OK.

JarbasAl commented 2 years ago

that plugin uses ovos-plugin-manager as a dependency, which is backwards compatible but has lots of improvements. its RemoteTTS class is not the same as the one in mycroft-core, and it properly uses get_tts instead of overriding the internal execute method, which in a sense should be private
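The design point can be sketched as follows. This is not the ovos-plugin-manager code: BaseTTS, RemotePlugin, the speaking attribute and the "sketch.wav" file name are hypothetical, and only the idea of implementing get_tts while leaving execute to the base class comes from the comment above.

```python
class BaseTTS:
    def __init__(self):
        self.speaking = False

    def get_tts(self, sentence, wav_file):
        """Hook for subclasses: synthesize and return (wav_file, phonemes)."""
        raise NotImplementedError

    def execute(self, sentence):
        # The base class owns the speaking state, so a subclass cannot
        # forget to set it as long as it leaves execute alone.
        self.speaking = True
        try:
            wav_file, _phonemes = self.get_tts(sentence, "sketch.wav")
            return wav_file
        finally:
            self.speaking = False


class RemotePlugin(BaseTTS):
    def __init__(self):
        super().__init__()
        self.flag_seen_during_synthesis = None

    def get_tts(self, sentence, wav_file):
        # A real plugin would call the remote server here; this sketch only
        # records whether the speaking flag was already set by execute.
        self.flag_seen_during_synthesis = self.speaking
        return wav_file, None
```

Because the subclass only fills in get_tts, the speaking flag is always set around synthesis, which avoids the kind of omission reported in this issue.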

forslund commented 1 month ago

Closing Issue since we're archiving the repo

MycroftAI / mycroft-core

MaryTTS does not wait to end of speech because RemoteTTS does not set "isSpeaking" #3116