alumae / gst-kaldi-nnet2-online

GStreamer plugin around Kaldi's online neural network decoder
Apache License 2.0

nnet3 problem #61

Closed: guliashvili closed this issue 6 years ago

guliashvili commented 6 years ago

Hi, I'm trying to use this project with a Kaldi nnet3 model. I've uploaded most of my setup (excluding the model) here: https://github.com/guliashvili/gst-kaldi

However, I get the following error:

ubuntu@ip-172-31-46-255:/home/a/gst/demo$ ./transcribe-audio.sh  dr_strangelove.mp3 
LOG ([5.2.128~1400-2553]:CompileLooped():nnet-compile-looped.cc:336) Spent 1.29222 seconds in looped compilation.
LOG ([5.2.128~1400-2553]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.2.128~1400-2553]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
ERROR ([5.2.128~1400-2553]:DecodableNnetLoopedOnlineBase():decodable-online-looped.cc:45) Input feature dimension mismatch: got 40 but network expects 43

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::nnet3::DecodableNnetLoopedOnlineBase::DecodableNnetLoopedOnlineBase(kaldi::nnet3::DecodableNnetSimpleLoopedInfo const&, kaldi::OnlineFeatureInterface*, kaldi::OnlineFeatureInterface*)
kaldi::nnet3::DecodableAmNnetLoopedOnline::DecodableAmNnetLoopedOnline(kaldi::TransitionModel const&, kaldi::nnet3::DecodableNnetSimpleLoopedInfo const&, kaldi::OnlineFeatureInterface*, kaldi::OnlineFeatureInterface*)
kaldi::SingleUtteranceNnet3Decoder::SingleUtteranceNnet3Decoder(kaldi::LatticeFasterDecoderConfig const&, kaldi::TransitionModel const&, kaldi::nnet3::DecodableNnetSimpleLoopedInfo const&, fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > > const&, kaldi::OnlineNnet2FeaturePipeline*)

clone

terminate called after throwing an instance of 'std::runtime_error'
  what():  

./transcribe-audio.sh: line 38: 10047 Aborted                 (core dumped) GST_PLUGIN_PATH=../src gst-launch-1.0 --gst-debug="" -q filesrc location=$audio ! decodebin ! audioconvert ! audioresample ! kaldinnet2onlinedecoder use-threaded-decoder=true model=final.mdl fst=HCLG.fst word-syms=words.txt phone-syms=phones.txt word-boundary-file=word_boundary.int num-nbest=3 num-phone-alignment=3 do-phone-alignment=true feature-type=mfcc mfcc-config=conf/mfcc.conf ivector-extraction-config=conf/ivector_extractor.fixed.conf max-active=7000 beam=11.0 lattice-beam=5.0 do-endpointing=true endpoint-silence-phones="1:2:3:4:5:6:7:8:9:10" chunk-length-in-secs=0.2 ! filesink location=/dev/stdout buffer-mode=2

Can anyone help me?

This is the whole project tree:

.
├── COPYING
├── demo
│   ├── conf
│   │   ├── ivector_extractor.conf
│   │   ├── ivector_extractor.fixed.conf
│   │   ├── mfcc.conf
│   │   ├── online_cmvn.conf
│   │   ├── online.conf
│   │   ├── online_pitch.conf
│   │   └── splice.conf
│   ├── dr_strangelove.mp3
│   ├── final.mdl
│   ├── gui-demo.py
│   ├── HCLG.fst
│   ├── ivector_extractor
│   │   ├── final.dubm
│   │   ├── final.ie
│   │   ├── final.mat
│   │   ├── global_cmvn.stats
│   │   ├── online_cmvn.conf
│   │   └── splice_opts
│   ├── phones.txt
│   ├── prepare-models.sh
│   ├── README.md
│   ├── transcribe-audio-gio.sh
│   ├── transcribe-audio.sh
│   ├── word_boundary.int
│   └── words.txt
├── README.md
└── src
    ├── gst-audio-source.cc
    ├── gst-audio-source.h
    ├── gst-audio-source.o
    ├── gstkaldinnet2onlinedecoder.cc
    ├── gstkaldinnet2onlinedecoder.h
    ├── gstkaldinnet2onlinedecoder.o
    ├── kaldimarshal.cc
    ├── kaldimarshal.h
    ├── kaldimarshal.list
    ├── kaldimarshal.o
    ├── libgstkaldionline2.so
    ├── Makefile
    ├── simple-options-gst.cc
    ├── simple-options-gst.h
    └── simple-options-gst.o
alumae commented 6 years ago

I think your model uses pitch features. Try adding add-pitch=true online-pitch-config=demo/conf/online_pitch.conf to the kaldinnet2onlinedecoder parameters in transcribe-audio.sh.
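The arithmetic behind this suggestion can be sketched as follows. This is only an illustration under stated assumptions: the 40 MFCC coefficients are inferred from "got 40" in the error message, the 3 pitch dimensions are Kaldi's default processed pitch output (probability of voicing, normalized log-pitch, delta-pitch), and the function names are hypothetical helpers, not part of Kaldi or this plugin.

```python
# Assumed values, not read from any real config file:
MFCC_DIM = 40   # inferred from "got 40" in the error message
PITCH_DIM = 3   # Kaldi's default processed pitch features:
                # POV, normalized log-pitch, delta-pitch


def input_feature_dim(mfcc_dim: int, add_pitch: bool, pitch_dim: int = PITCH_DIM) -> int:
    """Per-frame feature dimension when pitch is optionally appended to MFCCs."""
    return mfcc_dim + (pitch_dim if add_pitch else 0)


def check_against_network(feature_dim: int, network_dim: int) -> str:
    """Mimic the shape of the decoder's check (hypothetical helper)."""
    if feature_dim == network_dim:
        return "ok"
    return f"mismatch: got {feature_dim} but network expects {network_dim}"


# Without pitch the pipeline produces 40-dim frames; with pitch it produces 43.
print(check_against_network(input_feature_dim(MFCC_DIM, add_pitch=False), 43))
print(check_against_network(input_feature_dim(MFCC_DIM, add_pitch=True), 43))
```

If the model was trained with a different pitch post-processing setup, the pitch dimension would differ, so the 3 above should be treated as a default rather than a guarantee.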

guliashvili commented 6 years ago

Thanks, @alumae. That solved my problem.

xiaoch2004 commented 5 years ago

@alumae Hi, I ran into the same problem in kaldi-gstreamer-server. I trained my model with MFCC + pitch features. However, decoding fails with this error: ERROR ([5.2.128~1400-2553]:DecodableNnetLoopedOnlineBase():decodable-online-looped.cc:45) Input feature dimension mismatch: got 40 but network expects 43

This is my online.conf:

--feature-type=mfcc
--mfcc-config=/home/x/xiaochan/kaldi2/egs/aidatatang_200zh/s5/chain_online_17nov/conf/mfcc.conf
--ivector-extraction-config=/home/xiaochan/kaldi/egs/aidatatang_200zh/s5/chain_online_17nov/conf/ivector_extractor.conf
--add-pitch=true
--online-pitch-config=/home/xiaochan/kaldi/egs/aidatatang_200zh/s5/chain_online_17nov/conf/online_pitch.conf
--endpoint.silence-phones=1:2:3:4

Can anybody help me?

xiaoch2004 commented 5 years ago

I solved this by adding

add-pitch: true
online-pitch-config: path/to/online_pitch.conf

to my YAML file.
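A quick pre-flight check for these two options might look like the sketch below. It assumes the decoder settings from the YAML file have already been loaded into a plain dict; `missing_pitch_options` is a hypothetical helper, and the key names are simply the ones quoted in this thread.

```python
def missing_pitch_options(decoder_props: dict) -> list:
    """Return the pitch-related keys that are absent or disabled.

    decoder_props stands in for the decoder settings parsed from the
    YAML file (an assumption about how the settings are held in memory).
    """
    missing = []
    if decoder_props.get("add-pitch") is not True:
        missing.append("add-pitch")
    if not decoder_props.get("online-pitch-config"):
        missing.append("online-pitch-config")
    return missing


# Settings without pitch options, as in the failing setup described above.
before = {"feature-type": "mfcc"}
# Settings after adding the two lines from this comment.
after = {
    "feature-type": "mfcc",
    "add-pitch": True,
    "online-pitch-config": "path/to/online_pitch.conf",
}

print(missing_pitch_options(before))
print(missing_pitch_options(after))
```

An empty list means both options are present; it does not verify that the pitch config file exists or matches the one used in training.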