alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License
1.07k stars 341 forks

Support client side VAD #23

Open ngoel17 opened 9 years ago

ngoel17 commented 9 years ago

To support client-side VAD, the server should send the FINAL hypothesis after a specified timeout, even if it has not been receiving any audio (and so has not detected much silence itself).

alumae commented 9 years ago

I don't really like this idea. I would like the server to be as independent of actual time as possible.

Currently, I would recommend just sending an EOS when client-side silence is detected, and starting a new session when speech restarts.
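As a rough illustration of that recommendation, here is a minimal Python sketch. Everything except the "EOS" string is an assumption made for the example: `open_session` stands in for whatever opens a new connection to the server, `is_speech` is whatever VAD the client uses, and `hangover` is a made-up parameter for how many consecutive silent frames count as the end of speech.

```python
def stream_with_client_vad(frames, is_speech, open_session, hangover=3):
    """Send each speech segment in its own session, closing it with "EOS".

    frames       -- iterable of audio chunks (bytes)
    is_speech    -- hypothetical client-side VAD: chunk -> bool
    open_session -- hypothetical factory returning an object with send(data)
    hangover     -- assumed number of silent frames that end a segment
    Returns the number of sessions that were opened.
    """
    session = None
    silent = 0
    opened = 0
    for frame in frames:
        if is_speech(frame):
            if session is None:
                # Speech (re)started: start a new session.
                session = open_session()
                opened += 1
            silent = 0
            session.send(frame)
        elif session is not None:
            # Silent frame inside an open segment; it is not sent.
            silent += 1
            if silent >= hangover:
                # Client-side silence detected: end the stream.
                session.send("EOS")
                # A real client would keep reading results here until
                # the server finishes, then close the connection.
                session = None
    if session is not None:
        # Audio ran out while a segment was still open.
        session.send("EOS")
    return opened
```

The cost of this approach is one connection per speech segment, which is what the alternative of a separate end-of-segment message tries to avoid.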

Another possibility would be to come up with a new message (similar to "EOS") that marks the client-side end of a speech segment. The client would send it to the server, and the server would somehow forward it to the Kaldi plugins. This would need some supporting code in the plugins.
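To make the second option a bit more concrete, here is a hedged Python sketch of how a server-side message handler could tell the three cases apart. This is not the server's actual code: the marker name "EOSEG", the `RecordingDecoder` class, and its `process_audio`, `end_segment` and `end_stream` methods are all hypothetical placeholders for the supporting code the plugins would need.

```python
class RecordingDecoder:
    """Hypothetical stand-in for the Kaldi plugin wrapper; it only records calls."""

    def __init__(self):
        self.calls = []

    def process_audio(self, chunk):
        self.calls.append(("audio", len(chunk)))

    def end_segment(self):
        # Would ask the plugin to finalize the current segment only.
        self.calls.append(("end_segment",))

    def end_stream(self):
        # Would finalize everything, as "EOS" does today.
        self.calls.append(("end_stream",))


def handle_message(message, decoder, segment_marker="EOSEG"):
    """Dispatch one incoming message; the marker name is an assumption."""
    if isinstance(message, (bytes, bytearray)):
        decoder.process_audio(bytes(message))
        return "audio"
    if message == "EOS":
        decoder.end_stream()
        return "end-of-stream"
    if message == segment_marker:
        decoder.end_segment()
        return "end-of-segment"
    raise ValueError("unexpected message: %r" % (message,))
```

Keeping the marker as a parameter leaves the name open, since the thread has not settled on one.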

rikrd commented 9 years ago

I would also like the alternative solution Tanel proposes (an End Of Segment signal). This would be useful for switching models or grammars between speech segments without having to reload the workers.


ngoel17 commented 9 years ago

I like the idea too. Maybe call it PAU, which is not much effort to send, and then there would be no need to close the connection. There could be other signals for other purposes.
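On the client side, that could look roughly like the Python sketch below. It assumes the server understood a "PAU" message, which it does not today; `send`, `is_speech` and `hangover` are placeholder names for the connection's send function, the client's VAD, and the number of silent frames that count as a pause.

```python
def stream_with_pause_marker(frames, is_speech, send, hangover=3, pause_marker="PAU"):
    """Stream over one connection, marking client-detected pauses.

    send         -- hypothetical callable that writes one message to the server
    pause_marker -- proposed segment-end message; not supported by the server yet
    """
    in_speech = False
    silent = 0
    for frame in frames:
        if is_speech(frame):
            in_speech = True
            silent = 0
            send(frame)
        elif in_speech:
            # Silent frames are dropped; only the pause marker is sent.
            silent += 1
            if silent >= hangover:
                send(pause_marker)
                in_speech = False
    # The connection is only ended once, after all audio has been handled.
    send("EOS")
```

Unlike sending "EOS" at every pause, this keeps a single session open for the whole interaction.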
