daanzu / kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
GNU Affero General Public License v3.0

Is there a future for Text to Speech integration in the Kaldi engine? A way around? #57

Open Wil31 opened 3 years ago

Wil31 commented 3 years ago

Hello, I'm interested in having some text-to-speech feedback while using voice commands. I assume it isn't planned for integration into this engine yet, so is there a workaround we could use? (I would like to keep using kaldi-active-grammar as the engine, and with no internet connection.)

def speak(self, text):
    """ Speak the given *text* using text-to-speech. """
    # FIXME
    self._log.warning("Text-to-speech is not implemented for this engine; printing text instead.")
    print_(text)
daanzu commented 3 years ago

That would be a good feature to add, but I haven't gotten around to it yet. There are many text-to-speech programs and libraries available nowadays, but the problem is packaging them so that installation stays easy, and the quality of the speech varies. If you want to get going quickly, I would suggest trying eSpeak, either through a Python library or via the CLI binary and subprocess (see the sketch below). There has been discussion with @Danesprite about adding something like this to dragonfly.
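A minimal sketch of the subprocess approach mentioned above, assuming the espeak (or espeak-ng) binary is installed and on PATH; the function name speak_espeak and the fallback-to-print behaviour are illustrative, not part of kaldi-active-grammar:

import shutil
import subprocess

def speak_espeak(text):
    """Speak *text* offline by shelling out to eSpeak, if available."""
    # Prefer the newer espeak-ng binary, fall back to classic espeak.
    binary = shutil.which("espeak-ng") or shutil.which("espeak")
    if binary is None:
        # No TTS binary installed; mirror the engine stub and just print.
        print(text)
        return
    subprocess.run([binary, text], check=False)

speak_espeak("Command recognized.")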

drmfinlay commented 2 years ago

Hello @Wil31 and @daanzu,

Dragonfly's text-to-speech integration will be much improved in the next version. On Windows, Dragonfly will default to using the system's built-in text-to-speech. On other platforms, it will look for a usable eSpeak binary, and then CMU Flite. These will need to be installed from the system package manager (or compiled from source).

The above engine.speak() method will use the appropriate text-to-speech back-end. I'll post a link here to the relevant documentation page when the next release is out.
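A hedged usage sketch of that method, assuming a dragonfly2 release with this behaviour is installed and the Kaldi back-end is set up; which voice you hear depends on the platform back-end (SAPI 5, eSpeak, or CMU Flite) that Dragonfly finds:

from dragonfly import get_engine

# Initialise the Kaldi back-end (kaldi-active-grammar), then let speak()
# dispatch to whichever text-to-speech back-end Dragonfly selects.
engine = get_engine("kaldi")
engine.speak("Grammar loaded and ready.")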

drmfinlay commented 2 years ago

The text-to-speech functionality, now somewhat detached from Dragonfly's speech recognition engines, is documented on the following two pages:

https://dragonfly2.readthedocs.io/en/latest/engines.html
https://dragonfly2.readthedocs.io/en/latest/speakers.html

It works as stated in my previous post.