alumae / gst-kaldi-nnet2-online

GStreamer plugin around Kaldi's online neural network decoder
Apache License 2.0

Use ASpIRE Chain Model (By Dan Povey) #50

Open · adx349 opened this issue 7 years ago

adx349 commented 7 years ago

Thank you for your work on Kaldi; it has been very helpful to me. What changes do I need to make to use the latest ASpIRE chain model? I tried setting nnet-mode=3 and replacing the FST, .mdl and .conf files with those from the new model, but I get no output at all. What do you think the issue is?

arawind commented 7 years ago

@adx349 Try setting frame-subsampling-factor to 3 and acoustic-scale to 1.

Take a look at this thread for more details.
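
A minimal sketch that collects the decoder settings from this thread in one place: nnet-mode=3 (from the original question) plus the frame-subsampling-factor and acoustic-scale values suggested above. `aspire_decoder_props` is a hypothetical helper name, and the values are the thread's suggestions rather than verified defaults; the factor is a parameter because a later comment in this thread reports needing 1 instead of 3 with newer commits.

```shell
# Hypothetical helper: print the kaldinnet2onlinedecoder properties suggested
# in this thread for the ASpIRE chain model, one "name=value" pair per line.
# frame-subsampling-factor defaults to 3; pass another value as $1 if your
# Kaldi / gst-kaldi-nnet2-online commits need a different one.
aspire_decoder_props() {
  local fsf="${1:-3}"
  printf '%s\n' \
    "nnet-mode=3" \
    "acoustic-scale=1.0" \
    "frame-subsampling-factor=${fsf}"
}
```

Because the output is whitespace-separated, an unquoted `$(aspire_decoder_props 3)` could be appended to the `kaldinnet2onlinedecoder` element of a `gst-launch-1.0` pipeline, next to its `model`, `fst` and feature-config properties.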

fanskyer commented 7 years ago

I tried that too, and it still doesn't work. My guess is that the ASpIRE model uses a BLSTM, which this online decoder does not support.

tshastry commented 7 years ago

@fanskyer @adx349 I actually think the problem is that Kaldi's new looped decoding does not work properly here. If you roll back Kaldi to commit bcc71b67d489a1766922c9caf2a54306755f1861 and gst-kaldi-nnet2-online to commit 63b2cfdf9422047b72c5308bd933d82113717da7, the ASpIRE model works. You still need to set nnet-mode to 3, acoustic-scale to 1, and frame-subsampling-factor to 3.
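
A minimal sketch of the rollback, assuming both repositories are already cloned into local directories named `kaldi` and `gst-kaldi-nnet2-online` (hypothetical paths). `pin_commit` is likewise a hypothetical helper: it checks out a revision and confirms that HEAD really resolves to it, which guards against a typo in a long hash.

```shell
# Hypothetical helper: check out a revision (full or abbreviated hash) in an
# existing clone, then verify that HEAD resolves to the same commit.
pin_commit() {
  local repo_dir="$1" rev="$2"
  git -C "$repo_dir" checkout --quiet "$rev" || return 1
  [ "$(git -C "$repo_dir" rev-parse HEAD)" = \
    "$(git -C "$repo_dir" rev-parse "${rev}^{commit}")" ]
}

# Commits quoted in this thread; assumes clones in ./kaldi and ./gst-kaldi-nnet2-online.
# pin_commit kaldi bcc71b67d489a1766922c9caf2a54306755f1861
# pin_commit gst-kaldi-nnet2-online 63b2cfdf9422047b72c5308bd933d82113717da7
```

After pinning, rebuild Kaldi first and then the plugin against that Kaldi tree, since the plugin links against Kaldi's libraries.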

maxhawkins commented 7 years ago

Were you able to get this working? I tried rolling back to 63b2cfd and setting those options in my config. No luck: it just returns "yeah yeah yeah" over and over again.

Here's my config: https://gist.github.com/maxhawkins/24edbd87be0aa1601da5034acc27d7ee

I'm using the ASpIRE chain model from kaldi-asr.org with an HCLG.fst created using the documentation.

maxhawkins commented 7 years ago

Never mind, I was using the client incorrectly. Once I converted my WAV file to raw PCM, it worked fine.

For anyone who encounters this in the future, here are the steps I took:

  1. Compile kaldi-asr/kaldi@bcc71b6 and alumae/gst-kaldi-nnet2-online@63b2cfd
  2. Build the ASpIRE HCLG.fst and point worker.yaml to it.
  3. Start the server and pass it raw audio using client.py:
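```shell
# Start the master server, then a worker (GST_PLUGIN_PATH must point at the built plugin)
python kaldigstserver/master_server.py --port=8888 &
env GST_PLUGIN_PATH=.. python kaldigstserver/worker.py -u ws://localhost:8888/worker/ws/speech -c worker.yaml &
# Convert the WAV file to headerless 8 kHz, 16-bit signed, mono PCM
sox audio.wav -r 8000 -e signed -b 16 -c 1 -t raw audio.raw remix 1
# Stream the raw audio to the server
python kaldigstserver/client.py -r 16000 audio.raw
```
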
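
The `-r 16000` in the client command pairs with the 8 kHz conversion: as I understand kaldi-gstreamer-server's `client.py`, `-r` is the rate in bytes per second at which audio is sent, which for raw 16-bit audio should be twice the sample rate. A small sketch of that arithmetic (`pcm_byte_rate` is a hypothetical helper name):

```shell
# Hypothetical helper: bytes per second of headerless PCM audio,
# i.e. sample_rate * (bits_per_sample / 8) * channels (channels defaults to 1).
pcm_byte_rate() {
  local sample_rate="$1" bits="$2" channels="${3:-1}"
  echo $(( sample_rate * (bits / 8) * channels ))
}

# Same format as the sox command above: 8 kHz, 16-bit, mono.
pcm_byte_rate 8000 16 1
```

By the same arithmetic, 16-bit mono audio at 16 kHz would need `-r 32000` instead.
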
tshastry commented 7 years ago

Just an update on this: after some testing on my side, the ASpIRE model does work with the latest commits if frame-subsampling-factor is set to 1 instead of 3. This seems to be necessary for Kaldi's most recent "looped decoding" implementation. However, accuracy appears to be worse than with both repositories rolled back to the older commits.

maxhawkins commented 7 years ago

Thanks, I'll give that a shot.

I'm also seeing word-level alignment errors with the ASpIRE model at 63b2cfd (a subtle drift that becomes noticeable on long recordings), but I think that's a separate issue. I'll keep troubleshooting and file another bug if I can't resolve it.

suhel-jaber commented 6 years ago

It works for me, but it keeps outputting "mhm" every few seconds, which didn't happen with the TEDLIUM model. Has anyone else seen this?

maxhawkins commented 6 years ago

I've had that issue before. Usually it means your settings are wrong. Check the acoustic-scale and frame-subsampling-factor.
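
A quick way to see which of these values a worker is actually configured with is to print them from its config file. This sketch assumes worker.yaml spells the decoder settings with the same names as the plugin properties (for example `frame-subsampling-factor: 3`); `show_decoder_settings` is a hypothetical helper name.

```shell
# Hypothetical helper: print the nnet-mode, acoustic-scale and
# frame-subsampling-factor lines from a worker config. Commented-out lines
# are skipped because the pattern requires the key right after indentation.
show_decoder_settings() {
  local config="${1:-worker.yaml}"
  grep -E '^[[:space:]]*(nnet-mode|acoustic-scale|frame-subsampling-factor)[[:space:]]*:' "$config"
}
```

If one of the three lines is missing, the worker presumably leaves that property at the plugin's default, which is one way to end up with the wrong acoustic-scale.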