Elleo / gst-deepspeech

NOTE: This plugin is now deprecated in favour of the coqui-stt branch in gst-plugins-bad: https://gitlab.freedesktop.org/philn/gstreamer/-/tree/coqui-stt/subprojects/gst-plugins-bad/ext/coqui

Compatibility with Kaldi GStreamer interface #2

Open LuccoJ opened 6 years ago

LuccoJ commented 6 years ago

This is a bit of a "did you know about...?" thing.

The open-source speech recognition engine Kaldi has a GStreamer interface which, unlike yours as I understand it, can work over the internet in real time: it uses websockets to continuously output partial recognition hypotheses instead of just writing what it recognizes to stdout.
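To make the "partial hypotheses over websockets" idea concrete, here is a rough sketch of how a client might handle the JSON text frames such a server sends back. The field names (`status`, `result`, `hypotheses`, `transcript`, `final`) and the convention that status 0 means success are my assumptions, modelled on my recollection of the Kaldi GStreamer server's messages, so check them against the real server before relying on this. `handle_message` is a hypothetical helper, not part of any library, and the websocket transport itself is left out.

```python
import json


def handle_message(raw, finals):
    """Process one JSON text frame from the server.

    Returns the current partial transcript, or None if the frame carried
    no partial text. Final transcripts are appended to `finals`.
    Field names are assumptions, not a confirmed protocol definition.
    """
    msg = json.loads(raw)

    # Assumption: a non-zero status signals an error or an aborted session.
    if msg.get("status", 0) != 0:
        raise RuntimeError("server reported status %r: %s"
                           % (msg.get("status"), msg.get("message", "")))

    result = msg.get("result")
    if not result:
        return None  # status-only frame, nothing recognized yet

    hypotheses = result.get("hypotheses") or []
    if not hypotheses:
        return None

    # Assumption: the first hypothesis is the best one.
    text = hypotheses[0].get("transcript", "")
    if result.get("final"):
        finals.append(text)  # the segment is finished
        return None
    return text  # still a partial hypothesis, may be revised later


if __name__ == "__main__":
    # Made-up frames for illustration only.
    frames = [
        '{"status": 0, "result": {"hypotheses": [{"transcript": "hello"}], "final": false}}',
        '{"status": 0, "result": {"hypotheses": [{"transcript": "hello world"}], "final": true}}',
    ]
    finals = []
    for frame in frames:
        partial = handle_message(frame, finals)
        if partial is not None:
            print("partial:", partial)
    print("final segments:", finals)
```

A client like Kõnele or an IBus engine would show the partial text as it arrives and commit only the final segments, which is what makes the streaming approach feel responsive.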

I'm thinking that making gst-deepspeech compatible with its protocol would be useful in two ways. First, it would help bootstrap an informal standard for how GStreamer speech recognition interfaces talk. Second, on Android, Kõnele can effectively replace Google's proprietary recognition (client-side and server-side) by communicating with a Kaldi-style GStreamer module over the network. I have a video here to demonstrate this; it was easy for me to set up thanks to Eduardo Silva's Dockerfile of Kaldi-GStreamer.

And from the other end, if your ibus-deepspeech supported the same protocol, it could use either Kaldi or DeepSpeech as a backend. That would also let it run on weaker hardware, since my understanding is that DeepSpeech has better recognition than Kaldi but at the cost of being slower.

This way, someone could have a home speech recognition server (running either DeepSpeech or Kaldi depending on hardware abilities) and both IBus and Android clients would be able to transparently take advantage of it.