alumae closed this 9 years ago
Hi, I have added this in my fork. Currently it only outputs phoneme-level alignment; I also plan to add word-level alignment.
This requires changes to gst-kaldi-nnet2-online, which I have also made in my fork.
I could make a merge request, but this has required many large changes throughout, including loading extra files in the gst filter and adding a new signal (final-phone-alignment). Are you still interested?
This is now implemented. To enable it, use the online DNN-based models and configure gst-kaldi-nnet2-online to do word or phone alignment.
The recognition results from the server should include start and end times for each word and/or utterance.
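
For anyone consuming these timings on the client side, here is a minimal sketch of pulling per-word start and end times out of a result message. The message layout is an assumption made for illustration, not something confirmed in this thread: the field names `result`, `hypotheses`, `word-alignment`, `word`, `start` and `length` are hypothetical, as is the `word_timings` helper, so check them against what your server actually sends.

```python
import json


def word_timings(message: str):
    """Return (word, start, end) tuples from one JSON result message.

    Assumed (hypothetical) layout: the first hypothesis carries a
    "word-alignment" list whose items have "word", "start" and "length"
    fields, with times in seconds.
    """
    result = json.loads(message).get("result", {})
    hypotheses = result.get("hypotheses", [])
    if not hypotheses:
        return []
    best = hypotheses[0]
    # End time is derived as start + length; a missing alignment yields [].
    return [
        (item["word"], item["start"], item["start"] + item["length"])
        for item in best.get("word-alignment", [])
    ]


# Hand-written sample message, not real server output.
sample = json.dumps({
    "status": 0,
    "result": {
        "final": True,
        "hypotheses": [{
            "transcript": "hello world",
            "word-alignment": [
                {"word": "hello", "start": 0.5, "length": 0.25},
                {"word": "world", "start": 0.75, "length": 0.5},
            ],
        }],
    },
})

for word, start, end in word_timings(sample):
    print(f"{word}: {start:.2f}s to {end:.2f}s")
```

The helper returns an empty list when a message has no hypotheses or no alignment, so it can be called on every message, including partial results that carry no timing information.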