daanzu / kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
GNU Affero General Public License v3.0
336 stars 50 forks

Compile fails with errors related to disambig symbol and nonterm symbols #60

Closed gopik closed 3 years ago

gopik commented 3 years ago

Error 1: FATAL: FstCompiler: Symbol "#14" is not mapped to any integer arc ilabel, symbol table = agf_model/phones.txt, source = , line = 4

This appears to be a hardcoded disambiguation symbol, but my model's phones.txt only goes up to #10.

Error 2:

  File "/Users/gopik/opt/anaconda3/envs/kaldi/lib/python3.9/site-packages/kaldi_active_grammar/compiler.py", line 505, in compile_top_fst
    return self._build_top_fst(nonterms=['#nonterm:rule'+str(i) for i in range(self._max_rule_id + 1)], noise_words=self._noise_words).compile()
  File "/Users/gopik/opt/anaconda3/envs/kaldi/lib/python3.9/site-packages/kaldi_active_grammar/compiler.py", line 519, in _build_top_fst
    fst.add_arc(state_return, state_final, None, '#nonterm:end')
  File "/Users/gopik/opt/anaconda3/envs/kaldi/lib/python3.9/site-packages/kaldi_active_grammar/wfst.py", line 275, in add_arc
    olabel_id = self.word_to_olabel_map[olabel]
KeyError: '#nonterm:end'

I see that #nonterm_begin and #nonterm_end are available (note the underscore), but not the ":" variants. Should I be adding these to the nonterminals file?

Since I was unable to convert the model using the script (it's incomplete), I prepared a lang directory using all the nonterminals required by this toolkit (dictation, cloud_dictation, the 1000 rule placeholders, etc.).
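For reference, Kaldi's grammar-decoding support can generate the nonterminal machinery during lang preparation: per Kaldi's grammar documentation, user-defined nonterminals listed in data/local/dict/nonterminals.txt are added by prepare_lang.sh (along with the built-in #nonterm_begin / #nonterm_end / #nonterm_reenter symbols) to phones.txt and L_disambig.fst. The entries below are only illustrative of the kind of symbols this toolkit uses, not an exact required list:

```
# data/local/dict/nonterminals.txt (illustrative)
#nonterm:dictation
#nonterm:end
#nonterm:rule0
```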

Any idea what I might be doing wrong?

gopik commented 3 years ago

Once I ran prepare_lang.sh with nonterm:end as an additional nonterminal, I got past the second issue. But now the dictation.fst generation is crashing as follows. (Note: the model was trained without nonterminals, in case that's the cause. I'm taking an existing model and trying to convert it to use the active grammar framework.)

VLOG[1] (compile-graph-agf[5.5.0~1-1f5a4]:main():compile-graph-agf.cc:237) Composing CLG fst...
ERROR (compile-graph-agf[5.5.0~1-1f5a4]:main():compile-graph-agf.cc:245) Grammar-fst graph creation only supports models with left-biphone context. (--nonterm-phones-offset option was supplied).

[ Stack-Trace: ]
0 libkaldi-base.dylib 0x00000001038275bd kaldi::MessageLogger::LogMessage() const + 813
1 compile-graph-agf 0x0000000102fc9658 kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&) + 24
2 compile-graph-agf 0x0000000102fc851a main + 10698
3 libdyld.dylib 0x00007fff2036cf5d start + 1
4 ??? 0x000000000000000b 0x0 + 11

ERROR (compile-graph-agf[5.5.0~1-1f5a4]:main():compile-graph-agf.cc:310) Exception in compile-graph-agf

[ Stack-Trace: ]
0 libkaldi-base.dylib 0x00000001038275bd kaldi::MessageLogger::LogMessage() const + 813
1 compile-graph-agf 0x0000000102fc9658 kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&) + 24
2 compile-graph-agf 0x0000000102fc95a1 main + 14929
3 libdyld.dylib 0x00007fff2036cf5d start + 1
4 ??? 0x000000000000000b 0x0 + 11

libc++abi: terminating with uncaught exception of type kaldi::KaldiFatalError: kaldi::KaldiFatalError
[1] 85686 abort --arcsort-grammar --nonterm-phones-offset=187 --simplify-lg=true

(kaldi) ➜ kaldi /Users/gopik/opt/anaconda3/envs/kaldi/lib/python3.9/site-packages/kaldi_active_grammar/exec/macos/compile-graph-agf --arcsort-grammar --nonterm-phones-offset=187 --read-disambig-syms=agf_model/disambig.int --simplify-lg=true --verbose=20 agf_model/tree agf_model/final.mdl agf_model/L_disambig.fst agf_model/G.fst tmp/9a08772eb06c22a0d1f4aefe420b4883.fst

daanzu commented 3 years ago

Ah, yes, the max disambiguation symbol is not currently adjustable by the library user, but there are only a few places in the code where this needs to be changed. @gopik Regarding the compilation error, do you know what the architecture of your model is? Unfortunately, we can only handle left-biphone context models currently. This includes the popular tdnn_f "chain" models.
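To check what disambiguation symbols a given model actually defines (and so whether a symbol like #14 can exist in it), you can scan the model's phones.txt. This is a minimal sketch, assuming the usual Kaldi two-column "symbol integer-id" format; the helper name and sample data are made up for illustration:

```python
import re

def max_disambig(phones_txt_lines):
    """Return the highest #N disambiguation symbol found in phones.txt
    lines (Kaldi "symbol integer-id" format), or -1 if none exist.
    Nonterminal symbols like #nonterm:rule0 are deliberately ignored."""
    best = -1
    for line in phones_txt_lines:
        fields = line.split()
        if not fields:
            continue
        m = re.fullmatch(r"#(\d+)", fields[0])
        if m:
            best = max(best, int(m.group(1)))
    return best

# Example with an inline, made-up phones.txt fragment:
sample = ["<eps> 0", "a 1", "b 2", "#0 3", "#1 4", "#10 5"]
print(max_disambig(sample))  # -> 10
```

If this reports 10 for a model, then a hardcoded reference to #14 in the compiler would fail exactly as in Error 1 above.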

gopik commented 3 years ago

I used the wsj/s5/local/chain/e2e/run_tdnn_flatstart.sh recipe to train the model, but when it was trained the nonterminals were not in the lang directory. I was trying to reuse that model with AGF.

Also, when I copied the prepared lang with nonterminals into the appropriate model_dir (L_disambig.fst), the disambig #14 issue went away. I was able to call compile successfully (with a few warnings), but it crashed with the above error during init_decoder.

daanzu commented 3 years ago

I am not familiar with e2e Kaldi training, but I think a chain model should work. You definitely don't need to train with the nonterms. Training with the same normal lexicon as I use does make things easier, but it should not be necessary.

If the #14 disambiguation symbol issue disappeared, I suspect something got mixed up, though that doesn't explain that particular error appearing.

gopik commented 3 years ago

Yes, it looks like it disappeared because it's no longer attempting to create those lexicon disambiguation FSTs (I created them using prepare_lang).

I realized that the crash message is a red herring: decoder initialization actually failed due to other missing config options. e2e training doesn't need ivector training/adaptation, but AGF expects ivector configs, so I just copied the configs from another model.
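For anyone hitting the same gap: Kaldi's online ivector front-end is driven by a top-level config plus a handful of extractor files, so "copying configs from another model" means bringing those over. The names below are the common ones from Kaldi's online-decoding setup; the exact set depends on how the donor model was exported, so treat this as an assumption rather than AGF's required layout:

```
ivector_extractor.conf   # top-level config referencing the files below
final.ie                 # ivector extractor
final.dubm               # diagonal UBM
final.mat                # LDA/splice transform
global_cmvn.stats        # global CMVN statistics
online_cmvn.conf, splice.conf, mfcc.conf
```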

After supplying all the ivector-related configs, I'm now getting a failure with the following error:

KALDI severity=-2] AgfNNet3OnlineModelWrapper requires exactly one of top_fst and top_fst_filename

Full error trace -

kaldi.model (WARNING): model_dir has no version information; errors below may indicate an incompatible model
kaldi.compiler (ERROR): cannot find dictation fst: agf_model/Dictation.fst
ERROR ([5.5.0~1-1f5a4]:CompileGrammar():./compile-graph-agf.hh:254) Grammar-fst graph creation only supports models with left-biphone context. (--nonterm-phones-offset option was supplied).

[ Stack-Trace: ]
0 libkaldi-base.dylib 0x000000011c7e55bd kaldi::MessageLogger::LogMessage() const + 813
1 libkaldi-dragonfly.dylib 0x00000001110de988 bool dragonfly::BaseNNet3OnlineModelWrapper::Decode<kaldi::SingleUtteranceNnet3DecoderTpl<fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl > > > >(kaldi::SingleUtteranceNnet3DecoderTpl<fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl > > >, float, kaldi::Vector const&, bool, bool) + 1976
2 libkaldi-dragonfly.dylib 0x000000011113aeda dragonfly::AgfCompiler::CompileGrammar(fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl > > const, dragonfly::AgfCompilerConfig const*) + 8298
3 libkaldi-dragonfly.dylib 0x000000011114906f nnet3_agf__compile_graph + 607
4 libffi.dylib 0x00007fff2d92c8f5 ffi_call_unix64 + 85
5 ??? 0x00007f8ba82fe640 0x0 + 140237798893120

WARNING ([5.5.0~1-1f5a4]:nnet3_agf__compile_graph():agf-sub-nnet3.cc:425) Trying to survive fatal exception: kaldi::KaldiFatalError
[KALDI severity=-2] AgfNNet3OnlineModelWrapper requires exactly one of top_fst and top_fst_filename
[KALDI severity=-1] Trying to survive fatal exception: kaldi::KaldiFatalError
Traceback (most recent call last):
  File "", line 1, in
  File "/Users/gopik/opt/anaconda3/envs/kaldi/lib/python3.9/site-packages/dragonfly/engines/backend_kaldi/engine.py", line 179, in connect
    self._decoder = self._compiler.init_decoder(config=self._options['decoder_init_config'])
  File "/Users/gopik/opt/anaconda3/envs/kaldi/lib/python3.9/site-packages/kaldi_active_grammar/compiler.py", line 295, in init_decoder
    self.decoder = KaldiAgfNNet3Decoder(**decoder_kwargs)
  File "/Users/gopik/opt/anaconda3/envs/kaldi/lib/python3.9/site-packages/kaldi_active_grammar/wrapper.py", line 438, in __init__
    if not self._model: raise KaldiError("failed nnet3_agf__construct")
kaldi_active_grammar.KaldiError: failed nnet3_agf__construct

gopik commented 3 years ago

Looks like it's failing to compile G.fst into the dictation FST, hence top_fst/top_fst_filename is not available.

daanzu commented 3 years ago

Ah, yes, I forgot. You need to run python -m kaldi_active_grammar compile_agf_dictation_graph -v -m {{model_dir}} {{model_dir}}/G.fst to generate the Dictation.fst. The G.fst can come from one of my models, or you can use any suitable language model.

gopik commented 3 years ago

Thanks @daanzu. The e2e recipe doesn't use biphones by default; that's why the grammar-FST code was complaining. Ref: https://github.com/kaldi-asr/kaldi/blob/cafb8b315ae588cad0210655be539c6742e2e829/egs/wsj/s5/steps/nnet3/chain/e2e/prepare_e2e.sh#L19
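A quick way to check this up front is Kaldi's tree-info tool: running tree-info on the model's tree prints, among other things, context-width and central-position, and left-biphone context (what the grammar-FST code requires) means context-width 2 with central-position 1. A minimal sketch of interpreting that output; the helper name and sample text are illustrative:

```python
def is_left_biphone(tree_info_output):
    """Parse `tree-info` stdout and report whether the tree is left-biphone
    (context-width 2, central-position 1). Regular triphone trees have
    context-width 3; e2e flat-start trees often have context-width 1."""
    props = {}
    for line in tree_info_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            props[parts[0]] = int(parts[1])
    return props.get("context-width") == 2 and props.get("central-position") == 1

# Example with made-up tree-info output for a left-biphone model:
sample = "num-pdfs 3448\ncontext-width 2\ncentral-position 1\n"
print(is_left_biphone(sample))  # -> True
```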

Sorry, I didn't realize this earlier.