Training GigaSpeech problem in Kaldi

YangangCao commented 3 months ago

Hi dear author,

I only want to train a small acoustic model using GigaSpeech, but I ran into some problems when running the GigaSpeech recipe in Kaldi.

```shell
if [ $stage -le 2 ]; then
  echo "======Train lm START | current time : `date +%Y-%m-%d-%T`=============="
  mkdir -p $lm_dir || exit 1;
  sed 's|\t| |' data/$train_combined/text |\
    cut -d " " -f 2- > $lm_dir/corpus.txt || exit 1;
  echo "break point1"
  local/lm/train_lm.sh \
    --cmd "$train_cmd" --lm-order $lm_order \
    $lm_dir/corpus.txt $lm_dir || exit 1;
  echo "break point2"
  echo "======Train lm END | current time : `date +%Y-%m-%d-%T`================"
fi
```
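For reference, the `sed | cut` pipeline in this stage turns the Kaldi `text` file (one `<utt-id> <transcript>` line per utterance) into a plain LM training corpus with the utterance IDs stripped. A minimal sketch of just that step, assuming GNU sed (where `\t` in the pattern matches a tab); `kaldi_text_to_corpus` is a hypothetical helper name, not part of Kaldi, and the sample utterances are made up:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Kaldi script): reproduces the stage 2 corpus extraction.
# Assumes GNU sed, where \t in the regex matches a tab character.
# Input on stdin:  "<utt-id><TAB or space><transcript>" per line (Kaldi "text" format).
# Output on stdout: the transcript only, one line per utterance (like corpus.txt).
kaldi_text_to_corpus() {
  # Replace the first tab on each line with a space, then keep fields 2.. (drop utt-id).
  sed 's|\t| |' | cut -d " " -f 2-
}

# Usage on two made-up utterances (one tab-separated, one space-separated):
printf 'utt-0001\tHELLO WORLD\nutt-0002 GOOD MORNING\n' | kaldi_text_to_corpus
# prints:
# HELLO WORLD
# GOOD MORNING
```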

This step requires me to install SRILM and train a language model (when I trained on LibriSpeech, I didn't have to do either of these). Is it necessary? I only want to train an acoustic model and don't need to compute WER. Either way, I skipped this step.

Thanks very much!

nshmyrev commented 3 months ago

You can skip this step.
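In Kaldi-style recipes each block is guarded by `if [ $stage -le N ]`, so starting with a higher `stage` value (many recipes read `--stage` from the command line via `utils/parse_options.sh`; whether this one does is an assumption) skips every earlier block, not just one. The toy sketch below uses hypothetical stage names and a hypothetical `skip_lm` switch, neither of which is an option of the GigaSpeech recipe, to show one way to skip only the LM stage:

```shell
#!/usr/bin/env bash
# Toy illustration of Kaldi-style stage gating; NOT the real GigaSpeech run.sh.
# Stage numbers/names and the skip_lm switch are hypothetical.
# Usage: run_recipe <start-stage> <skip_lm: true|false>
run_recipe() {
  local stage=$1 skip_lm=$2
  if [ "$stage" -le 1 ]; then echo "stage 1: prepare data"; fi
  # The LM stage runs only if we start at or before it AND skip_lm is not "true".
  if [ "$stage" -le 2 ] && [ "$skip_lm" != true ]; then echo "stage 2: train LM"; fi
  if [ "$stage" -le 3 ]; then echo "stage 3: train acoustic model"; fi
}

run_recipe 0 true    # prints stage 1 and stage 3 lines; the LM stage is skipped
run_recipe 3 false   # prints only the stage 3 line; starting at 3 skips stages 1-2
```

Acoustic model training itself should not need the LM, but in recipes of this shape any later stage that builds a decoding graph or scores WER will typically expect the LM output, so those stages would need to be skipped as well.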

Still, it is recommended to install SRILM and evaluate the model; evaluation is an important part of accuracy testing.
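Evaluation here usually means decoding a held-out set and scoring word error rate (WER): the minimum number of word substitutions, deletions, and insertions that turn the hypothesis into the reference, divided by the number of reference words. In the usual Kaldi setup decoding needs a graph built with a language model, which is where SRILM comes in, and scoring is normally done with Kaldi's `compute-wer` tool. The sketch below is only a self-contained toy illustration of the metric in awk; the `wer` function name is hypothetical:

```shell
#!/usr/bin/env bash
# Toy WER for a single reference/hypothesis pair via word-level Levenshtein distance.
# Hypothetical helper for illustration; real Kaldi recipes score with compute-wer.
# Usage: wer "<reference words>" "<hypothesis words>"
wer() {
  awk -v ref="$1" -v hyp="$2" 'BEGIN {
    n = split(ref, r, " "); m = split(hyp, h, " ")   # split on whitespace
    if (n == 0) { print "empty reference"; exit 1 }
    for (i = 0; i <= n; i++) d[i, 0] = i             # i deletions
    for (j = 0; j <= m; j++) d[0, j] = j             # j insertions
    for (i = 1; i <= n; i++)
      for (j = 1; j <= m; j++) {
        c = d[i-1, j-1] + (r[i] != h[j])             # match (0) or substitution (1)
        if (d[i-1, j] + 1 < c) c = d[i-1, j] + 1     # deletion
        if (d[i, j-1] + 1 < c) c = d[i, j-1] + 1     # insertion
        d[i, j] = c
      }
    printf "WER %.2f%% (%d/%d)\n", 100 * d[n, m] / n, d[n, m], n
  }'
}

# One substitution (sat -> sit) and one deletion (the) over 6 reference words:
wer "the cat sat on the mat" "the cat sit on mat"   # prints: WER 33.33% (2/6)
```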

Next, you probably want to use a modern model instead of a Kaldi GigaSpeech model. There are many of them, and the right one depends on your requirements. They are going to be much more accurate.

YangangCao commented 3 months ago

Hi dear author, thanks for your reply. My goal is to train a text-limited ASR model (one restricted to a limited vocabulary or set of phrases), and the only approach I know of that supports this is a chain model. Is there a more accurate method?

nshmyrev commented 3 months ago

Modern RNN-T or Conformer CTC models should be more accurate.

alphacep/vosk-api issue #1620