alumae / gst-kaldi-nnet2-online

GStreamer plugin around Kaldi's online neural network decoder
Apache License 2.0

Chain models #47

Closed nizmagu closed 7 years ago

nizmagu commented 7 years ago

When trying to use the plugin on kaldi-gstreamer-server with chain (nnet3) models, the server returns a 7-word sentence as a result, similar to the problem described in issue #45.

However, even after setting acoustic-scale to 1.0 and frame-subsampling-factor to 3, the server still doesn't transcribe with a WER even remotely as good as what the same models achieved offline with decode.sh.

Here is the YAML I used for the worker:

use-nnet2: True
decoder:
    nnet-mode: 3
    use-threaded-decoder: false
    model: model_dir/final.mdl
    word-syms: model_dir/words.txt
    phone-syms: model_dir/phones.txt
    word-boundary-file: model_dir/word_boundary.int
    fst: model_dir/HCLG.fst
    lmwt-scale: 1.0
    lm-fst: model_dir/G.fst
    big-lm-const-arpa: model_dir/G.carpa
    mfcc-config: model_dir/mfcc.conf
    max-active: 7000
    min-active: 200
    max-mem: 50000000
    do-endpointing: true
    beam: 15.0
    lattice-beam: 8.0
    acoustic-scale: 1.0
    endpoint-silence-phones: "1:2:3:4:5"
    traceback-period-in-secs: 1
    chunk-length-in-secs: 0.25
    num-nbest: 10
    frame-subsampling-factor: 3

I tried playing around with beam, lattice-beam, max-active, etc., both matching and deviating from the values used by the offline decode.sh (which delivered the desired results). What could be the problem?

alumae commented 7 years ago

Hmm, what is the difference in WER (roughly)? The configuration looks fine.

alumae commented 7 years ago

Our nnet3 decoding implementation is based on src/online2bin/online2-wav-nnet3-latgen-faster.cc. Can you test a few wavs with both the server and src/online2bin/online2-wav-nnet3-latgen-faster, and show the differences?
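
For example, something along these lines (the paths, online.conf, and the test wav below are just placeholders for your setup):

    # decode a single wav with the same binary our plugin is based on;
    # online.conf is the config produced by steps/online/nnet3/prepare_online_decoding.sh
    online2-wav-nnet3-latgen-faster --online=true \
        --frame-subsampling-factor=3 --acoustic-scale=1.0 \
        --config=model_dir/conf/online.conf \
        --word-symbol-table=model_dir/words.txt \
        model_dir/final.mdl model_dir/HCLG.fst \
        "ark:echo utt1 utt1|" "scp:echo utt1 test.wav|" ark:/dev/null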

nizmagu commented 7 years ago

The WER difference between the server and online2-wav-nnet3-latgen-faster is really huge: online2-wav-nnet3-latgen-faster decodes some words, albeit worse than the offline decoding, at a WER of around 30-35%, while the server decodes "no no no" for pretty much any sentence (WER is essentially 100%).

At first, online2-wav-nnet3-latgen-faster also decoded the sentences this badly; it was fixed when I set "frame-subsampling-factor=3" (like I did in the server) and "online=false".
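
Concretely, the invocation that fixed it was roughly this (same placeholder paths as in the previous comment):

    # identical to the invocation above except for --online=false
    online2-wav-nnet3-latgen-faster --online=false \
        --frame-subsampling-factor=3 --acoustic-scale=1.0 \
        --config=model_dir/conf/online.conf \
        --word-symbol-table=model_dir/words.txt \
        model_dir/final.mdl model_dir/HCLG.fst \
        "ark:echo utt1 utt1|" "scp:echo utt1 test.wav|" ark:/dev/null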

Is there any way to set "online=false" in the server or something similar?

alumae commented 7 years ago

As far as I know, "online=false" in online2-wav-nnet3-latgen-faster means that i-vectors are not calculated on the fly but in an offline manner (meaning over all the data for the speaker). But I doubt it would make such a big difference. Do you know if the i-vectors of your chain models are trained in online mode? If you used the default chain model recipes (say, SWBD or AMI), then they should be.
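
In those recipes the online i-vector setup is prepared with something like this (the directory names here are assumptions based on the standard recipe layout):

    # builds the online decoding config (including online i-vector extraction)
    # from a trained chain model and its i-vector extractor
    steps/online/nnet3/prepare_online_decoding.sh \
        --mfcc-config conf/mfcc_hires.conf \
        data/lang exp/nnet3/extractor exp/chain/tdnn exp/chain/tdnn_online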

nizmagu commented 7 years ago

I think they are, but I don't know for sure.

alumae commented 7 years ago

Are you by any chance trying to use BLSTM models? They are not compatible with the server.

yifan commented 7 years ago

@nizmagu Did you try to look at the audio saved on the server side? In the past I have seen corrupted audio in some cases, especially when running more than one decoder simultaneously.

nizmagu commented 7 years ago

I just retrained our models and everything works as expected.

We previously hadn't trained i-vector extractors; this time I used the SWBD scripts as you suggested and the problem was solved.

Thank you very much for your help and work!

avelom commented 6 years ago

Hi,

I would like to raise this question again: is there any way to set "online=false" in the server, or something similar? The WER is around 5% better in offline mode than with the server.

According to online2-wav-nnet3-latgen-faster.cc, line 118:

    po.Register("online", &online,
                "You can set this to false to disable online iVector estimation "
                "and have all the data for each utterance used, even at "
                "utterance start. This is useful where you just want the best "
                "results and don't care about online operation. Setting this to "
                "false has the same effect as setting "
                "--use-most-recent-ivector=true and --greedy-ivector-extractor=true "
                "in the file given to --ivector-extraction-config, and "
                "--chunk-length=-1.");

I set --use-most-recent-ivector=true and --greedy-ivector-extractor=true in the file given to --ivector-extraction-config, and I set --chunk-length-in=-1 in the .yaml file. However, the result is still the same as in online mode.
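
For reference, the two lines I added to the i-vector extraction config (the file that --ivector-extraction-config points at; the filename below is just an example) look like this:

    # conf/ivector_extractor.conf -- appended per the --online help text above
    --use-most-recent-ivector=true
    --greedy-ivector-extractor=true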

alumae commented 6 years ago

No, there is no way to set "online=false".