alumae / kaldi-gstreamer-server

Real-time full-duplex speech recognition server, based on the Kaldi toolkit and the GStreamer framework.
BSD 2-Clause "Simplified" License

How to tune Kaldi gstreamer for using NNET3 decoder #233

Open purijs opened 4 years ago

purijs commented 4 years ago

The available GStreamer plugin versions only use the nnet2 decoder. Is there a way to use the nnet3 decoder with an nnet3 model? I know about the nnet3 mode, but it's not accurate.

alumae commented 4 years ago

Our nnet3-based decoder is based on the nnet3 decoder in Kaldi. I have seen claims that our decoder is not as accurate as Kaldi's, and I believe there could be a bug somewhere in our decoder. If you want to help, you should either try to find the bug yourself or prepare a small test set (acoustic model + graph + an utterance) on which you can consistently show that Kaldi and our decoder produce different results.
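
As a rough illustration of how such a comparison could be reported, here is a minimal Python sketch. It assumes both decoders' outputs have already been saved as Kaldi-style transcript text (one `<utt-id> <word> ...` line per utterance); the function names and the sample data are hypothetical and not part of kaldi-gstreamer-server or Kaldi.

```python
from typing import Dict, List, Tuple


def parse_transcripts(text: str) -> Dict[str, List[str]]:
    """Parse lines of the form '<utt-id> <word> <word> ...' into a dict."""
    result: Dict[str, List[str]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        result[parts[0]] = parts[1:]
    return result


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def compare(kaldi: Dict[str, List[str]],
            server: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Return (utt_id, distance) for utterances present in both outputs that differ."""
    diffs = []
    for utt_id in sorted(kaldi.keys() & server.keys()):
        distance = word_edit_distance(kaldi[utt_id], server[utt_id])
        if distance > 0:
            diffs.append((utt_id, distance))
    return diffs


if __name__ == "__main__":
    # Made-up transcripts, for illustration only.
    kaldi_out = parse_transcripts("utt1 hello world\nutt2 this is a test\n")
    server_out = parse_transcripts("utt1 hello world\nutt2 this is test\n")
    for utt_id, distance in compare(kaldi_out, server_out):
        print(f"{utt_id}: outputs differ by {distance} word edit(s)")
```

Running both decoders several times on the same utterance and checking that the reported difference is stable would help show that the mismatch is consistent rather than incidental.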

purijs commented 4 years ago

Are you talking about using "nnet3 mode"? Unfortunately, I won't be able to share the data/transcripts, but I will try to reproduce this with some open-source audio.

I just wanted to confirm whether "nnet3 mode" is the only option for using the nnet3 decoder, as nnet2 is also set to True.

alumae commented 4 years ago

Yes, "nnet3 mode" is the only way to use nnet3 models. This is for historical reasons. Originally, kaldi-gstreamer-server supported only GMM models (DNN models were not in use at the time; it was long ago). Then I implemented the nnet2 GStreamer plugin. Then came nnet3 in Kaldi. Instead of writing an nnet3 GStreamer plugin from scratch, we implemented nnet3 support in the nnet2 plugin. AFAIK, it does pretty much what Kaldi's nnet3 decoder does, but apparently there might be a small bug somewhere.