daanzu / kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
GNU Affero General Public License v3.0

Missing documentation: Import of a custom kaldi model #39

Open JohnDoe02 opened 3 years ago

JohnDoe02 commented 3 years ago

What steps are necessary to import a custom kaldi model (trained from scratch, not transfer-learned as in #33) into KAG?

In the readme it is currently stated that:

Or use your own model. Standard Kaldi models must be converted to be usable. Conversion can be performed automatically, but this hasn't been fully implemented yet.

What steps are necessary to kick off the mentioned partial implementation for automatic conversion? What steps remain to be carried out by the user?

daanzu commented 3 years ago

How much work it takes depends on the model's configuration (perhaps unsurprising, since Kaldi is so configurable). If you are training the model with the intent to use it with KaldiAG, it can be made quite easy. It's been a while since I converted the Zamia model, so I may be forgetting something, but as I recall...

FWIW, the unfinished and untested converter is in model.py: see convert_generic_model_to_agf().

JohnDoe02 commented 3 years ago

Got it to work! Turns out I had only forgotten to rename splice_opts to splice.conf. Furthermore, one also has to add a trailing line break to that file, as otherwise parsing it makes KAG crash. But that was it.
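The rename and the trailing line break can be handled in one step. The following is a minimal sketch, assuming a POSIX shell; `copy_with_newline` is a hypothetical helper name and is not part of Kaldi or KAG.

```shell
# Copy a file and make sure the copy ends with a newline character.
# copy_with_newline is a hypothetical helper, not part of Kaldi or KAG.
copy_with_newline() {
    src="$1"
    dst="$2"
    cp "$src" "$dst" || return 1
    # tail -c 1 prints the last byte; command substitution strips a trailing
    # newline, so a non-empty result means the file does not end with one.
    if [ -n "$(tail -c 1 "$dst")" ]; then
        printf '\n' >> "$dst"
    fi
}

# Example (paths taken from the commands in this thread):
# copy_with_newline exp/nnet3_cleaned/extractor/splice_opts final_model/conf/splice.conf
```

The helper leaves a file that already ends with a newline unchanged, so running it twice is harmless.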

Just for the record: I used the phone set of your daanzu_20200905 model for my training, but added a number of words to the lexicon. However, I believe this alone has no impact on integration with KAG as long as I don't need those extra words for dictation. They simply live in my user_lexicon.txt

JohnDoe02 commented 3 years ago

So I spoke too soon. While everything that uses non-dictation commands works like a charm, dictation is broken. I only get garbage, nothing that is in any way related to what I said. It does indeed look as if some IDs don't match.

However, I don't really know what the root cause is. This is what I am using to create the model dir:

cp -r kaldi_model final_model
cp conf/mfcc.conf final_model/conf
cp conf/mfcc_hires.conf final_model/conf
cp conf/online_cmvn.conf final_model/conf
cp exp/nnet3_cleaned/extractor/splice_opts final_model/conf/splice.conf
cp exp/nnet3_cleaned/ivectors_jd_ls_100_clean_sp_hires/conf/ivector_extractor.conf final_model/conf

cp exp/nnet3_cleaned/extractor/final.* final_model/ivector_extractor
cp exp/nnet3_cleaned/extractor/global_cmvn.stats final_model/ivector_extractor

cp exp/chain_cleaned/tdnn_1d_sp/final.mdl final_model/
cp exp/chain_cleaned/tdnn_1d_sp/tree final_model/

daanzu commented 3 years ago

@JohnDoe02 Ah, I forgot about the dictation FST! You will need to re-compile it using your new .mdl file. Try:

python3 -m kaldi_active_grammar compile_agf_dictation_graph -m kaldi_model_dir/G.fst -v

JohnDoe02 commented 3 years ago

Just for reference:

python3 -m kaldi_active_grammar compile_agf_dictation_graph -m final_model/ -v

did the trick.

widdiot commented 3 years ago

Could you please provide the general steps to adapt Kaldi models trained for languages other than English?

SwimmingTiger commented 3 years ago

I want to convert any of the following Chinese Mandarin models to be compatible with KAG. Thanks for any help or documentation.

I have no experience with Kaldi. Currently, the only environment I can run is the one from kaldi-dragonfly-winpython37.zip. Once I have a working model, I will develop my application with dragonfly.

And I know something about CMUSphinx. I tried Sphinx4 and found that it lacked some features I needed. So I switched to dragonfly/KAG. The English model in kaldi-dragonfly-winpython37.zip perfectly meets my needs, but my program needs to support more languages, especially Chinese.

daanzu commented 3 years ago

Similar discussion in #21.

lormaechea commented 2 years ago

In case it is still of help or interest to anyone: I have recently been working on importing my own custom French models into KAG. After testing them, I have found them to be functional and to perform well, although I still need to check some configurations to improve the WER.

To do this, I first performed an acoustic training (HMM-DNN nnet3 chain models) with Kaldi based on 1000h of French speech. Once it was done, I created a folder to dump my KAG custom model in:

KAG_DIR="kag_model"
mkdir -p ${KAG_DIR}/conf ${KAG_DIR}/ivector_extractor # Subfolders used as copy targets below

And I subsequently copied the files coming from my training (as pointed out by @JohnDoe02). In my case:

cp conf/mfcc.conf ${KAG_DIR}/conf
cp conf/mfcc_hires.conf ${KAG_DIR}/conf
cp conf/online_cmvn.conf ${KAG_DIR}/conf

cp exp/nnet3/extractor/splice_opts ${KAG_DIR}/conf/splice.conf
cp exp/nnet3/ivectors_train_nodup_sp/conf/ivector_extractor.conf ${KAG_DIR}/conf

cp -r exp/nnet3/extractor/final.* ${KAG_DIR}/ivector_extractor/
cp exp/nnet3/extractor/global_cmvn.stats ${KAG_DIR}/ivector_extractor/

cp exp/chain/tdnn_ceos_sp_online/final.mdl ${KAG_DIR}/
cp exp/chain/tdnn_ceos_sp_online/tree ${KAG_DIR}/

Once this was done, I proceeded to compile my language model. To make it work with KAG, I had to deal with KAG's hard-coded constants for words and phones. To resolve this, it is necessary to add the nonterminals.txt file used in KAG (it can be found in any of the available models) to the folder where my pronunciation models are located:

cp nonterminals.txt ${LEXICON_DIR}/dict

I then ran the data preparation with Kaldi:

./utils/prepare_lang.sh <dict-src-dir> <oov-dict-entry> <tmp-dir> <lang-dir>
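A quick sanity check after this step is to confirm that the nonterminal symbols made it into the generated symbol tables. The sketch below assumes that prepare_lang.sh writes them into phones.txt and words.txt with a `#nonterm` prefix; `list_nonterminals` and `require_nonterminals` are hypothetical helper names.

```shell
# List nonterminal symbols (entries starting with "#nonterm") in a Kaldi
# symbol table such as phones.txt or words.txt.
# list_nonterminals is a hypothetical helper name.
list_nonterminals() {
    # Symbol tables hold "<symbol> <integer-id>" per line; print the symbol only.
    awk '$1 ~ /^#nonterm/ { print $1 }' "$1"
}

# Fail with a message if a symbol table contains no nonterminal symbols.
require_nonterminals() {
    if [ -z "$(list_nonterminals "$1")" ]; then
        echo "no #nonterm symbols found in $1" >&2
        return 1
    fi
}

# Example:
# require_nonterminals "${LANG_DIR}/phones.txt" && require_nonterminals "${LANG_DIR}/words.txt"
```

If either check fails, nonterminals.txt was probably not present in the dict folder when prepare_lang.sh ran.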

Once this process is finished, we can copy the following files to the folder that will contain our KAG model:

cp ${LANG_DIR}/G.fst ${KAG_DIR}

cp ${LANG_DIR}/words.txt ${KAG_DIR}/words.txt
cp ${LANG_DIR}/words.txt ${KAG_DIR}/words.base.txt # Same as previous file
cp ${LANG_DIR}/words.txt ${KAG_DIR}/words.nonterm.txt # Just including nonterminals

cp ${LANG_DIR}/phones/align_lexicon.int ${KAG_DIR}
cp ${LANG_DIR}/phones/align_lexicon.int ${KAG_DIR}/align_lexicon.base.int # Same as previous file
cp ${LANG_DIR}/phones/align_lexicon.int ${KAG_DIR}/align_lexicon.nonterm.int # Just including nonterminals

cp ${LANG_DIR}/phones/disambig.int ${KAG_DIR}
cp ${LANG_DIR}/phones/left_context_phones.txt ${KAG_DIR}
cp -r ${LANG_DIR}/phones/wdisambig_* ${KAG_DIR}

cp ${LANG_DIR}/phones.txt ${KAG_DIR}
cp ${LANG_DIR}/phones.txt ${KAG_DIR}/phones.nonterm.txt # Just including nonterminals

cp ${LEXICON_DIR}/L_disambig.fst ${KAG_DIR}
cp ${LEXICON_DIR}/dict/lexicon.txt ${KAG_DIR}
cp ${LEXICON_DIR}/dict/lexiconp.txt ${KAG_DIR}

cp ${LEXICON_DIR}/tmp/lexiconp_disambig.txt ${KAG_DIR}
cp ${LEXICON_DIR}/tmp/lexiconp_disambig.txt ${KAG_DIR}/lexiconp_disambig.base.txt # Same as previous file

touch ${KAG_DIR}/user_lexicon.txt # Initially empty
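With this many copy steps, it is easy to miss a file. Before compiling, a check along the following lines can report what is absent from the model folder. This is only a sketch: `report_missing_files` is a hypothetical helper, and the list of names you pass to it should mirror your own copy steps.

```shell
# Print every expected file that is missing from a model directory.
# report_missing_files is a hypothetical helper; the file names passed to it
# should come from your own copy steps.
report_missing_files() {
    dir="$1"
    shift
    missing=0
    for f in "$@"; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only if nothing was missing.
    [ "$missing" -eq 0 ]
}

# Example, using names from the steps above:
# report_missing_files kag_model final.mdl tree G.fst words.txt phones.txt \
#     conf/splice.conf ivector_extractor/global_cmvn.stats
```

The function returns a non-zero status when anything is missing, so it can gate the compile command with `&&`.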

Finally, the dictation graph is compiled with the following command:

python3 -m kaldi_active_grammar compile_agf_dictation_graph -m kag_model/ -v

In this way, I managed to create a custom KAG model for French. I hope this is of some help.

In any case, once convert_generic_model_to_agf() is finished, I am sure the procedure will be much easier.

Lucía