nizmagu closed this issue 7 years ago.
Hmm, what is the difference in WER (roughly)? The configuration looks fine.
Our nnet3 decoding implementation is based on src/online2bin/online2-wav-nnet3-latgen-faster.cc. Can you test a few wavs with both the server and src/online2bin/online2-wav-nnet3-latgen-faster, and show the differences?
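A side-by-side test with the standalone binary could look roughly like this (a sketch only: the model directory, graph, config, and wav list are placeholder paths you would substitute with your own setup):

```shell
# Hypothetical paths -- substitute your own chain model dir, graph, and test wavs.
# Decode a few wavs with the binary the server's decoder is based on;
# with --word-symbol-table set, the transcripts are printed to stderr.
online2-wav-nnet3-latgen-faster \
  --online=true \
  --frame-subsampling-factor=3 \
  --acoustic-scale=1.0 \
  --config=exp/chain/tdnn/conf/online.conf \
  --word-symbol-table=exp/chain/tdnn/graph/words.txt \
  exp/chain/tdnn/final.mdl \
  exp/chain/tdnn/graph/HCLG.fst \
  ark:spk2utt \
  scp:test_wavs.scp \
  ark:/dev/null
```

Running the same wavs through the server and comparing the two sets of transcripts should show whether the degradation comes from the server or the model.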
The WER difference between the server and online2-wav-nnet3-latgen-faster is really huge: online2-wav-nnet3-latgen-faster decodes some words, albeit worse than offline decoding, at a WER of around 30-35%, while the server decodes "no no no" for pretty much any sentence (its WER is essentially 100%).
At first, online2-wav-nnet3-latgen-faster also decoded sentences this badly; it was fixed when I set "frame-subsampling-factor=3" (as I did in the server) and "online=false".
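For reference, in a kaldi-gstreamer-server worker these options live in the decoder section of the YAML. A minimal sketch (the model paths are illustrative of a typical worker config, not copied from the setup in this thread):

```yaml
decoder:
  use-threaded-decoder: true
  model: models/final.mdl
  fst: models/HCLG.fst
  word-syms: models/words.txt
  frame-subsampling-factor: 3   # required for chain models
  acoustic-scale: 1.0           # chain models are typically decoded at acoustic scale 1.0
```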
Is there any way to set "online=false" in the server or something similar?
As far as I know, "online=false" in online2-wav-nnet3-latgen-faster means that i-vectors are not estimated on the fly but offline (that is, over all of the speaker's data). But I doubt it would make such a big difference.
Do you know if the i-vectors of your chain models are trained in online mode? If you used the default chain model recipes (say, SWBD or AMI), then they should be.
I think they are, but I don't know for sure.
Are you by any chance trying to use BLSTM models? They are not compatible with the server.
@nizmagu Did you try to look at the audio saved on the server side? I had in the past seen corrupted audio in some cases, especially when I am running more than one decoder simultaneously.
I just retrained our models and everything works as expected.
We previously didn't train ivector extractors, so now I used SWBD scripts as you suggested and the problem was solved.
Thank you very much for your help and work!
Hi,
I would like to raise the question again: is there any way to set "online=false" in the server, or something similar? The WER in offline mode is around 5% better than on the server.
According to online2-wav-nnet3-latgen-faster.cc, line 118:

```cpp
po.Register("online", &online,
            "You can set this to false to disable online iVector estimation "
            "and have all the data for each utterance used, even at "
            "utterance start. This is useful where you just want the best "
            "results and don't care about online operation. Setting this to "
            "false has the same effect as setting "
            "--use-most-recent-ivector=true and --greedy-ivector-extractor=true "
            "in the file given to --ivector-extraction-config, and "
            "--chunk-length=-1.");
```
I set "--use-most-recent-ivector=true" and "--greedy-ivector-extractor=true" in the file given to --ivector-extraction-config, and I set --chunk-length-in=-1 in the .yaml file. However, the result is still the same as in online mode.
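For comparison, the relevant lines in the file passed as ivector-extraction-config would look something like this (the paths and most values are illustrative placeholders; only the last two options are the ones under discussion):

```
# ivector_extractor.conf -- paths below are placeholders for your own models
--splice-config=conf/splice.conf
--cmvn-config=conf/online_cmvn.conf
--lda-matrix=ivector_extractor/final.mat
--global-cmvn-stats=ivector_extractor/global_cmvn.stats
--diag-ubm=ivector_extractor/final.dubm
--ivector-extractor=ivector_extractor/final.ie
--num-gselect=5
--min-post=0.025
--posterior-scale=0.1
# the two options that the --online=false help text says it implies:
--use-most-recent-ivector=true
--greedy-ivector-extractor=true
```

Note that even with these two options, the feature pipeline still processes audio chunk by chunk as it arrives, so this does not fully reproduce offline decoding.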
No, there is no way to set "online=false".
When trying to use the plugin on kaldi-gstreamer-server with chain (nnet3) models, the server gives a 7-word sentence as a result, similar to the problem exhibited in issue #45.
However, even after setting acoustic-scale to 1.0 and frame-subsampling-factor to 3, the server's transcriptions still have a WER nowhere near as good as the same models achieved offline with decode.sh.
Here is the YAML I used for the worker:
I tried playing around with beam, lattice-beam, max-active, etc., both matching and deviating from the settings used by the offline decode.sh (which delivered the desired results). What could be the problem?