alumae / gst-kaldi-nnet2-online

GStreamer plugin around Kaldi's online neural network decoder
Apache License 2.0

Word-alignment problem #56

Open · arielvsp opened this issue 7 years ago

arielvsp commented 7 years ago

Hi,

Running the demo script (`./transcribe-audio.sh dr_strangelove.mp3`) produces the following output and then hangs:

```
LOG ([5.2.64~1-2fbf2]:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG ([5.2.64~1-2fbf2]:ComputeDerivedVars():ivector-extractor.cc:204) Done.
WARNING ([5.2.64~1-2fbf2]:LatticeWordAligner():word-align-lattice.cc:263) [Lattice has input epsilons and/or is not input-deterministic (in Mohri sense)]-- i.e. lattice is not deterministic.  Word-alignment may be slow and-or blow up in memory.
WARNING ([5.2.64~1-2fbf2]:LatticeWordAligner():word-align-lattice.cc:263) [Lattice has input epsilons and/or is not input-deterministic (in Mohri sense)]-- i.e. lattice is not deterministic.  Word-alignment may be slow and-or blow up in memory.
WARNING ([5.2.64~1-2fbf2]:LatticeWordAligner():word-align-lattice.cc:263) [Lattice has input epsilons and/or is not input-deterministic (in Mohri sense)]-- i.e. lattice is not deterministic.  Word-alignment may be slow and-or blow up in memory.
huh i hello this is hello dimitri listen i i can't hear too well do you support you could turn the music down just a little
Caught SIGSEGV
```

Is this normal, given the Kaldi warnings? I see the same behavior with the streaming service (the worker hangs) when `do-phone-alignment` is set to `true`. Is there anything I can do in Kaldi to prevent or mitigate this?

nshmyrev commented 3 years ago

I investigated this problem. The cause is that you can't call `WordAlignLattice` twice on the same lattice: the first run replaces silences with epsilons, so the second run emits the warning. A fix for a similar problem is here:

https://github.com/alphacep/vosk-api/commit/558b4dd69e75e7f5d0644c5221302b6035cbfe99
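To illustrate the failure mode described above, here is a toy model. It assumes only what the comment states: word alignment replaces silences with epsilons, and the aligner warns when its input already contains epsilons. Everything in it (`ToyLattice`, `toy_word_align`, `AlignedLatticeCache`) is hypothetical. It is not Kaldi's `WordAlignLattice` API and not the linked vosk-api change. It is only a sketch of one way to avoid the second call: align the raw lattice once and let every consumer (word-level and phone-level output) reuse that result.

```python
from dataclasses import dataclass
from typing import List, Optional

EPSILON = 0  # toy label standing in for epsilon
SILENCE = 1  # toy label standing in for a silence phone


@dataclass
class ToyLattice:
    """Hypothetical stand-in for a lattice: just a list of arc labels."""
    labels: List[int]


@dataclass
class AlignStats:
    warnings: int = 0


def toy_word_align(lat: ToyLattice, stats: AlignStats) -> ToyLattice:
    """Hypothetical aligner mimicking the behaviour described in the thread."""
    # Warn if the input already contains epsilons, i.e. it was aligned before.
    if EPSILON in lat.labels:
        stats.warnings += 1
    # The aligned output has every silence label replaced by epsilon.
    return ToyLattice([EPSILON if l == SILENCE else l for l in lat.labels])


class AlignedLatticeCache:
    """Hypothetical helper: align the raw lattice once, then reuse the result."""

    def __init__(self, raw: ToyLattice, stats: AlignStats):
        self._raw = raw
        self._stats = stats
        self._aligned: Optional[ToyLattice] = None

    def get(self) -> ToyLattice:
        # Only the first call runs the aligner; later calls return the cache.
        if self._aligned is None:
            self._aligned = toy_word_align(self._raw, self._stats)
        return self._aligned


raw = ToyLattice([5, SILENCE, 7, SILENCE, 9])

# Problematic pattern: the second call re-aligns an already-aligned lattice.
buggy = AlignStats()
twice = toy_word_align(toy_word_align(raw, buggy), buggy)

# Align-once pattern: word-level and phone-level consumers share one alignment.
fixed = AlignStats()
cache = AlignedLatticeCache(raw, fixed)
words_view = cache.get()
phones_view = cache.get()

print("repeated alignment warnings:", buggy.warnings)
print("align-once warnings:", fixed.warnings)
```

In this toy, the repeated call produces one warning and the cached version produces none. In the real decoder, the equivalent change would be to keep the first word-aligned lattice around, or to align a fresh copy of the raw lattice, rather than feeding aligned output back into the aligner.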