alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License
1.07k stars 341 forks

Output timing information #5

Closed alumae closed 9 years ago

alumae commented 10 years ago

The recognition results from the server should include start and end times for each word and/or utterance.

rikrd commented 9 years ago

Hi, I have added this in my fork. It currently outputs only phoneme-level alignment; I also plan to add word-level alignment.

This requires changes to gst-kaldi-nnet2-online, which I made in my fork.

I could make a merge request. However, this required many large changes throughout, such as loading extra files in the gst filter and adding a new signal (final-phone-alignment). Are you still interested?

alumae commented 9 years ago

This is now implemented. To enable it, use the online DNN-based models and configure gst-kaldi-nnet2-online to do word or phone alignment.
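
For readers who want to consume the timing output on the client side, here is a minimal sketch. It is not taken from this thread: it assumes each server response is a JSON message whose first hypothesis carries a `word-alignment` array with `word`, `start`, and `length` fields (in seconds, relative to the start of the segment). Those field names and the helper `word_timings` are assumptions for illustration, so check them against the actual server output before relying on them.

```python
import json


def word_timings(message):
    """Return (word, start, end) tuples from one JSON response.

    Assumed (hypothetical) layout:
    {"result": {"hypotheses": [{"transcript": "...",
        "word-alignment": [{"word": "...", "start": 0.0, "length": 0.5}]}]}}
    """
    response = json.loads(message)
    hypotheses = response.get("result", {}).get("hypotheses", [])
    if not hypotheses:
        return []
    timings = []
    # Use only the best (first) hypothesis; alignment may be absent.
    for item in hypotheses[0].get("word-alignment", []):
        start = item["start"]
        timings.append((item["word"], start, start + item["length"]))
    return timings
```

If alignment is not enabled, the sketch returns an empty list instead of failing, since the `word-alignment` key would simply be missing.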