Open adx349 opened 7 years ago
@adx349 Try setting the frame-subsampling-factor to 3, and the acoustic-scale to 1.
Take a look at this thread for more details
I tried that too, and it did not work. My guess is that the ASpIRE model uses a BLSTM, which is not supported in this online decoding.
@fanskyer @adx349 I actually think it is an issue with the new Kaldi looped decoding not working properly. If you roll back Kaldi to commit bcc71b67d489a1766922c9caf2a54306755f1861 and gst-kaldi-nnet2-online to commit 63b2cfdf9422047b72c5308bd933d82113717da7, then the ASpIRE model works. You will still need to set nnet-mode to 3, acoustic-scale to 1, and frame-subsampling-factor to 3.
Were you able to get this working? I tried rolling back to 63b2cfd and setting those options in my config. No luck: it just returns "yeah yeah yeah" over and over again.
Here's my config: https://gist.github.com/maxhawkins/24edbd87be0aa1601da5034acc27d7ee
I'm using the ASpIRE chain model from kaldi-asr.org with an HCLG.fst created using the documentation.
Never mind. I was using the client incorrectly. When I converted my wav file to raw PCM it started working fine.
For anyone who encounters this in the future, here are the steps I took:
python kaldigstserver/master_server.py --port=8888 &
env GST_PLUGIN_PATH=.. python kaldigstserver/worker.py -u ws://localhost:8888/worker/ws/speech -c worker.yaml &
sox audio.wav -r 8000 -e signed -b 16 -c 1 -t raw audio.raw remix 1
python kaldigstserver/client.py -r 16000 audio.raw
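The container stripping and the `-r` value in the steps above can be cross-checked with a short Python sketch. It assumes that `client.py -r` is the send rate in bytes per second, which would explain why 16000 is passed for 8 kHz, 16-bit mono audio. The helper names `wav_to_raw` and `byte_rate` are hypothetical, and unlike the sox command, this sketch does not resample or remix channels.

```python
import wave


def wav_to_raw(wav_path, raw_path):
    """Write only the PCM samples of a WAV file to raw_path (no resampling).

    Returns the byte rate of the audio in bytes per second.
    """
    with wave.open(wav_path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    with open(raw_path, "wb") as raw:
        raw.write(frames)  # header is dropped, samples are kept unchanged
    return sample_rate * sample_width * channels


def byte_rate(sample_rate, bits_per_sample=16, channels=1):
    """Bytes per second for uncompressed PCM audio."""
    return sample_rate * (bits_per_sample // 8) * channels
```

For the 8 kHz, 16-bit mono file produced by the sox command, `byte_rate(8000)` gives 16000, which matches the `-r 16000` passed to the client.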
Just an update to this: I did some testing on my side, and the ASpIRE model will work with the latest commits if frame-subsampling-factor is set to 1 instead of 3. This seems to be necessary for the most recent "looped decoding" implementation in Kaldi. However, the accuracy appears to be worse than with both projects rolled back to the commits mentioned above.
Thanks, I'll give that a shot.
I'm also seeing some errors with word-level alignment (subtle drift noticeable on long recordings) with the ASPIRE model at 63b2cfd, but I think that's a separate issue. I'll keep troubleshooting and file another bug if I can't resolve it.
It works for me, but it keeps outputting "mhm" every few seconds, while TEDLIUM didn't. Has anyone experienced the same issue?
I've had that issue before. Usually it means your settings are wrong. Check the acoustic-scale and frame-subsampling-factor.
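One way to double-check these settings is a small Python sketch that compares a decoder configuration against the values reported in this thread. The function name and the `setup` labels are hypothetical, the values are only what posters here found to work (not official defaults), and the "latest" entry assumes that only frame-subsampling-factor changes with the newer commits.

```python
# Values reported in this thread for the ASpIRE chain model (not official defaults).
THREAD_SETTINGS = {
    # Kaldi and gst-kaldi-nnet2-online rolled back to the older commits.
    "rolled_back": {"nnet-mode": 3, "acoustic-scale": 1.0, "frame-subsampling-factor": 3},
    # Latest commits; assumes only frame-subsampling-factor differs.
    "latest": {"nnet-mode": 3, "acoustic-scale": 1.0, "frame-subsampling-factor": 1},
}


def check_decoder_settings(decoder, setup="rolled_back"):
    """Return a list of messages for settings that differ from the thread's values."""
    problems = []
    for key, want in THREAD_SETTINGS[setup].items():
        got = decoder.get(key)
        if got is None:
            problems.append(f"{key} is not set (thread suggests {want})")
        elif float(got) != float(want):
            problems.append(f"{key} is {got} (thread suggests {want})")
    return problems
```

Passing the decoder section of a worker config as a dict returns an empty list when all three values match, and one message per mismatched or missing setting otherwise.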
Thank you for your work on Kaldi; it is very helpful for me. I was wondering what changes I have to make to use the latest ASpIRE chain model. I tried setting nnet-mode=3 and replacing the fst, mdl, and conf files with those from the new model, but it is not giving me any output. What do you think the issue is?