Start/end times for utterances

alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.

BSD 2-Clause "Simplified" License

Start/end times for utterances #49

Open dimidd opened 8 years ago

dimidd commented 8 years ago

Thanks for this great project. When sending an audio file to ".../client/dynamic/recognize", the response JSON contains an id field. Is it possible to get the start and end times of the utterance using the id, or in any other way?

I've noticed that the server logs contain segment-start and segment-length. Can they be sent to the client somehow?

e.g. INFO 2016-08-14 13:53:28,938 30a81637-769f-4fe0-9be2-2bb7cfb25062: Receiving event {u'status': 0, u'segment-start': 42.68000030517578, u'segment-length': 15.819999694824219, u'tota... from worker
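For illustration, a minimal sketch of how start and end times could be derived from such an event. It assumes the event is available as a Python dict, that both values are in seconds, and that the end time is simply start plus length; `segment_times` is a hypothetical helper, not part of the server.

```python
def segment_times(event):
    """Return (start, end) for one worker event.

    Assumption: 'segment-start' and 'segment-length' are in seconds.
    """
    start = event["segment-start"]
    end = start + event["segment-length"]
    return start, end


# Values copied from the log line above; the truncated fields are omitted.
event = {
    "status": 0,
    "segment-start": 42.68000030517578,
    "segment-length": 15.819999694824219,
}

start, end = segment_times(event)
print("start=%.2f end=%.2f" % (start, end))
```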

alumae commented 8 years ago

The HTTP POST/PUT interface doesn't provide utterance start/end times. But it should be easy to modify it so that it does give the time information. Bear in mind that the recognition result that this interface returns can correspond to multiple input segments -- the corresponding hypotheses are simply concatenated. You have to think about whether you want the start and end times of individual segments or not.
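One way to picture the choice is a sketch that concatenates the hypotheses as before but also keeps a per-segment list of times. The input shape here (a list of dicts with `transcript`, `segment-start` and `segment-length`) and the `combine_segments` helper are assumptions made for illustration, not the server's actual data structures, and the sample values are made up.

```python
def combine_segments(segments):
    """Join per-segment transcripts and keep each segment's times.

    Assumption: each item carries 'transcript', 'segment-start' and
    'segment-length' (seconds). This layout is hypothetical.
    """
    transcript = " ".join(s["transcript"] for s in segments)
    times = [
        {
            "start": s["segment-start"],
            "end": s["segment-start"] + s["segment-length"],
        }
        for s in segments
    ]
    return {"transcript": transcript, "segments": times}


# Made-up sample data, only to show the shape of the result.
segments = [
    {"transcript": "hello world", "segment-start": 0.0, "segment-length": 2.5},
    {"transcript": "second part", "segment-start": 3.0, "segment-length": 1.5},
]

combined = combine_segments(segments)
print(combined["transcript"])
for seg in combined["segments"]:
    print(seg["start"], seg["end"])
```

Returning only the first start and the last end would give the times of the whole response, while the list keeps the individual segments.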

dimidd commented 8 years ago

Thanks, I think it'd be best to add an optional parameter, e.g. recognize?times=true, to get the start times and lengths of all segments. Could you provide some guidance on how to add this feature? I'd love to make a PR.
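A rough sketch of how such an optional flag could be read from the request URL, using only the Python 3 standard library. The `times` parameter is the one proposed above and does not exist in the server; `wants_times` and the set of accepted values are likewise assumptions.

```python
from urllib.parse import urlparse, parse_qs


def wants_times(url):
    """Return True if the (proposed, hypothetical) 'times' flag is set."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; use the last one given.
    value = query.get("times", ["false"])[-1]
    return value.lower() in ("true", "1", "yes")


print(wants_times("/client/dynamic/recognize?times=true"))
print(wants_times("/client/dynamic/recognize"))
```

Defaulting to false would keep the existing response unchanged for clients that do not send the parameter.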

mike-a-ellis commented 7 years ago

Hello alumae,

Any intent to implement utterance times?