alumae / gst-kaldi-nnet2-online

GStreamer plugin around Kaldi's online neural network decoder
Apache License 2.0

Problem with NNet3 model #62

Closed FredPraca closed 6 years ago

FredPraca commented 6 years ago

Hello guys, I'm trying to use the plugin with the following chain:

gst-launch-1.0 pulsesrc device=alsa_input.pci-0000_00_05.0.analog-stereo ! queue ! \
               audioconvert ! \
               audioresample ! tee name=t ! queue ! \
           kaldinnet2onlinedecoder \
           use-threaded-decoder=0 \
           nnet-mode=3 \
           model=/opt/models/fr/final.mdl \
           word-syms=/opt/models/fr/words.txt \
           fst=/opt/models/fr/HCLG.fst \
           mfcc-config=/opt/models/fr/mfcc_hires.conf \
           ivector-extraction-config=/opt/models/fr/ivector-extraction/ivector_extractor.conf \
           phone-syms=/opt/models/fr/phones.txt \
           frame-subsampling-factor=3 \
           max-active=7000 \
           beam=13.0 \
           lattice-beam=8.0 \
           acoustic-scale=1 \
           do-endpointing=1 \
           endpoint-silence-phones=\"1:2:3:4:5:16:17:18:19:20\" \
           traceback-period-in-secs=0.25 \
           num-nbest=2 \
           chunk-length-in-secs=0.25 \
           ! filesink async=0 location=/dev/stdout t. ! queue ! autoaudiosink async=0

The problem is that I get the following assertion failure:

ASSERTION_FAILED ([5.2]:AdvanceChunk():decodable-online-looped.cc:223) : 'current_log_post_.NumRows() == info_.frames_per_chunk / info_.opts.frame_subsampling_factor && current_log_post_.NumCols() == info_.output_dim'

[ Stack-Trace: ]

kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
kaldi::nnet3::DecodableNnetLoopedOnlineBase::AdvanceChunk()
kaldi::nnet3::DecodableNnetLoopedOnlineBase::EnsureFrameIsComputed(int)
kaldi::nnet3::DecodableAmNnetLoopedOnline::LogLikelihood(int, int)
kaldi::LatticeFasterOnlineDecoder::ProcessEmitting(kaldi::DecodableInterface*)
kaldi::LatticeFasterOnlineDecoder::AdvanceDecoding(kaldi::DecodableInterface*, int)
kaldi::SingleUtteranceNnet3Decoder::AdvanceDecoding()

The curious thing is that the same model works with the Kaldi GStreamer Server. I can avoid the assertion by removing frame-subsampling-factor, but then decoding becomes very slow and I get a warning about the lattice beam.

Any ideas?
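The condition in the assertion quoted above can be read as a shape check on each chunk of network output. The following is a minimal Python sketch of that comparison only; the function name is hypothetical, the numbers in the usage note are made up rather than taken from Kaldi or from this model, and it does not explain why the plugin trips the check.

```python
def chunk_shape_ok(num_rows, num_cols, frames_per_chunk,
                   frame_subsampling_factor, output_dim):
    """Mirror the comparison made by the assertion in AdvanceChunk().

    The C++ expression divides two integers, so integer division is used here.
    """
    expected_rows = frames_per_chunk // frame_subsampling_factor
    return num_rows == expected_rows and num_cols == output_dim
```

For example, with a hypothetical 21 frames per chunk and frame-subsampling-factor=3, the check expects 7 output rows per chunk; a chunk with 21 rows (as if no subsampling had been applied) would fail it.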

FredPraca commented 6 years ago

For the sake of completeness, I should also mention that it fails the same way with the gui-demo code.

FredPraca commented 6 years ago

Good news: I got it to work, although the fix is not very satisfying. The problem lies in the order in which the properties are given. When fst and model are placed last, it works:

gst-launch-1.0 pulsesrc device=alsa_input.pci-0000_00_05.0.analog-stereo ! queue ! \
               audioconvert ! \
               audioresample ! tee name=t ! queue ! \
           kaldinnet2onlinedecoder \
           use-threaded-decoder=0 \
           nnet-mode=3 \
           word-syms=/opt/models/fr/words.txt \
           mfcc-config=/opt/models/fr/mfcc_hires.conf \
           ivector-extraction-config=/opt/models/fr/ivector-extraction/ivector_extractor.conf \
           phone-syms=/opt/models/fr/phones.txt \
           frame-subsampling-factor=3 \
           max-active=7000 \
           beam=13.0 \
           lattice-beam=8.0 \
           acoustic-scale=1 \
           do-endpointing=1 \
           endpoint-silence-phones=1:2:3:4:5:16:17:18:19:20 \
           traceback-period-in-secs=0.25 \
           num-nbest=2 \
           chunk-length-in-secs=0.25 \
           fst=/opt/models/fr/HCLG.fst \
           model=/opt/models/fr/final.mdl \
           ! filesink async=0 location=/dev/stdout t. ! queue ! autoaudiosink async=0

I think this is still a problem, though.
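The reordering workaround described in this comment can be sketched as a small helper that builds the element description with fst and model placed last. This is only an illustration in Python: the helper names and the LATE_PROPERTIES constant are hypothetical and are not part of the plugin or of GStreamer, and the sketch assumes the other properties can stay in whatever order they were given.

```python
import shlex

# Properties that the workaround moves to the end (assumption based on this thread).
LATE_PROPERTIES = ("fst", "model")


def order_decoder_properties(props):
    """Return (name, value) pairs with fst and model last.

    All other properties keep the order in which they appear in `props`.
    """
    early = [(k, v) for k, v in props.items() if k not in LATE_PROPERTIES]
    late = [(k, props[k]) for k in LATE_PROPERTIES if k in props]
    return early + late


def decoder_element(props):
    """Build the kaldinnet2onlinedecoder part of a gst-launch-1.0 pipeline."""
    parts = ["kaldinnet2onlinedecoder"]
    for name, value in order_decoder_properties(props):
        # shlex.quote leaves plain paths and numbers untouched and quotes the rest.
        parts.append(f"{name}={shlex.quote(str(value))}")
    return " ".join(parts)
```

Calling decoder_element with a dict that lists model and fst first still yields a description ending in fst=... model=..., which matches the order used in the working command above.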

mgoldey commented 6 years ago

Thank you for posting your workaround.

FredPraca commented 6 years ago

The problem is that the fix only addresses the server version, whereas we use the code directly as a GStreamer plugin. So I would say that this issue is not resolved for the plugin.