daanzu / kaldi-active-grammar

Python Kaldi speech recognition with grammars that can be set active/inactive dynamically at decode-time
GNU Affero General Public License v3.0

Exception: Command exited with status 1: steps/nnet3/get_egs.sh #56

Open Asma-droid opened 3 years ago

Asma-droid commented 3 years ago

Hello,

I am a beginner with Kaldi, and I am trying to fine-tune the daanzu model on the mini-librispeech data (just a simple trial) to understand the process.

I first prepared the data and computed the MFCCs, and then used this script for fine-tuning: https://github.com/kaldi-asr/kaldi/blob/master/egs/aishell2/s5/local/nnet3/tuning/finetune_tdnn_1a.sh

I used https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.8.0/tree_sp.zip, as it is the tree directory for the most recent models (I mainly used the ali.x.gz files).
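One rough way to sanity-check an `ali.x.gz` file before feeding it to the training scripts — a hypothetical diagnostic helper, not a Kaldi tool — is to peek at its first entry and guess whether it is a binary or text Kaldi table (in Kaldi's archive format, each entry is a key, a space, and then either the binary marker `\x00B` or text integers):

```python
import gzip

def classify_ali_archive(path, nbytes=4096):
    """Peek at a gzipped Kaldi archive (e.g. ali.1.gz) and guess its
    format. Each table entry is '<key> ' followed by either the binary
    marker b'\\x00B' or whitespace-separated text integers. This is an
    illustrative heuristic, not part of Kaldi."""
    with gzip.open(path, "rb") as f:
        data = f.read(nbytes)
    space = data.find(b" ")
    if space < 0:
        return "unreadable"
    if data[space + 1:space + 3] == b"\x00B":
        return "binary"
    return "text"
```

From the shell, `gunzip -c ali.1.gz | head -c 64 | xxd` gives a similar quick look at the first entry.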

I then ran into the issue below:


```
2021-05-30 14:27:20,165 [steps/nnet3/train_dnn.py:36 - - INFO ] Starting DNN trainer (train_dnn.py)
steps/nnet3/train_dnn.py --stage=-10 --cmd=run.pl --mem 4G --feat.cmvn-opts=--norm-means=false --norm-vars=false --trainer.input-model exp/nnet3/tdnn_sp_train/input.raw --trainer.num-epochs 5 --trainer.optimization.num-jobs-initial 1 --trainer.optimization.num-jobs-final 1 --trainer.optimization.initial-effective-lrate 0.0005 --trainer.optimization.final-effective-lrate 0.00002 --trainer.optimization.minibatch-size 1024 --feat-dir data/train_hires --lang data/lang --ali-dir exp/train_ali --dir exp/nnet3/tdnn_sp_train
['steps/nnet3/train_dnn.py', '--stage=-10', '--cmd=run.pl --mem 4G', '--feat.cmvn-opts=--norm-means=false --norm-vars=false', '--trainer.input-model', 'exp/nnet3/tdnn_sp_train/input.raw', '--trainer.num-epochs', '5', '--trainer.optimization.num-jobs-initial', '1', '--trainer.optimization.num-jobs-final', '1', '--trainer.optimization.initial-effective-lrate', '0.0005', '--trainer.optimization.final-effective-lrate', '0.00002', '--trainer.optimization.minibatch-size', '1024', '--feat-dir', 'data/train_hires', '--lang', 'data/lang', '--ali-dir', 'exp/train_ali', '--dir', 'exp/nnet3/tdnn_sp_train']
2021-05-30 14:27:20,172 [steps/nnet3/train_dnn.py:178 - train - INFO ] Arguments for the experiment
{'ali_dir': 'exp/train_ali', 'backstitch_training_interval': 1, 'backstitch_training_scale': 0.0, 'cleanup': True, 'cmvn_opts': '--norm-means=false --norm-vars=false', 'combine_sum_to_one_penalty': 0.0, 'command': 'run.pl --mem 4G', 'compute_per_dim_accuracy': False, 'dir': 'exp/nnet3/tdnn_sp_train', 'do_final_combination': True, 'dropout_schedule': None, 'egs_command': None, 'egs_dir': None, 'egs_opts': None, 'egs_stage': 0, 'email': None, 'exit_stage': None, 'feat_dir': 'data/train_hires', 'final_effective_lrate': 2e-05, 'frames_per_eg': 8, 'initial_effective_lrate': 0.0005, 'input_model': 'exp/nnet3/tdnn_sp_train/input.raw', 'lang': 'data/lang', 'max_lda_jobs': 10, 'max_models_combine': 20, 'max_objective_evaluations': 30, 'max_param_change': 2.0, 'minibatch_size': '1024', 'momentum': 0.0, 'num_epochs': 5.0, 'num_jobs_compute_prior': 10, 'num_jobs_final': 1, 'num_jobs_initial': 1, 'num_jobs_step': 1, 'online_ivector_dir': None, 'preserve_model_interval': 100, 'presoftmax_prior_scale_power': -0.25, 'prior_subset_size': 20000, 'proportional_shrink': 0.0, 'rand_prune': 4.0, 'remove_egs': True, 'reporting_interval': 0.1, 'samples_per_iter': 400000, 'shuffle_buffer_size': 5000, 'srand': 0, 'stage': -10, 'train_opts': [], 'use_gpu': 'yes'}
nnet3-info exp/nnet3/tdnn_sp_train/input.raw
2021-05-30 14:27:20,373 [steps/nnet3/train_dnn.py:238 - train - INFO ] Generating egs
steps/nnet3/get_egs.sh --cmd run.pl --mem 4G --cmvn-opts --norm-means=false --norm-vars=false --online-ivector-dir --left-context 34 --right-context 34 --left-context-initial -1 --right-context-final -1 --stage 0 --samples-per-iter 400000 --frames-per-eg 8 --srand 0 data/train_hires exp/train_ali exp/nnet3/tdnn_sp_train/egs
steps/nnet3/get_egs.sh: creating egs. To ensure they are not deleted later you can do: touch exp/nnet3/tdnn_sp_train/egs/.nodelete
steps/nnet3/get_egs.sh: feature type is raw, with 'apply-cmvn'
steps/nnet3/get_egs.sh: working out number of frames of training data
steps/nnet3/get_egs.sh: working out feature dim
steps/nnet3/get_egs.sh: warning: the --frames-per-eg is too large to generate one archive with as many as --samples-per-iter egs in it. Consider reducing --frames-per-eg.
steps/nnet3/get_egs.sh: creating 1 archives, each with 238983 egs, with
steps/nnet3/get_egs.sh:   8 labels per example, and (left,right) context = (34,34)
steps/nnet3/get_egs.sh: copying data alignments
copy-int-vector ark:- ark,scp:exp/nnet3/tdnn_sp_train/egs/ali.ark,exp/nnet3/tdnn_sp_train/egs/ali.scp
ERROR (copy-int-vector[5.5.929~1539-9bca2]:ReadBasicType():base/io-funcs-inl.h:68) ReadBasicType: did not get expected integer type, 0 vs. 4. You can change this code to successfully read it later, if needed.

[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x793) [0x7f27463671c3]
copy-int-vector(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x5650dbd339dd]
copy-int-vector(kaldi::BasicVectorHolder::Read(std::istream&)+0xba9) [0x5650dbd3adcb]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder >::Next()+0xf3) [0x5650dbd3b159]
copy-int-vector(main+0x484) [0x5650dbd3320d]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f2745f5f0b3]
copy-int-vector(_start+0x2e) [0x5650dbd32cce]

WARNING (copy-int-vector[5.5.929~1539-9bca2]:Read():util/kaldi-holder-inl.h:308) BasicVectorHolder::Read, read error or unexpected data at archive entry beginning at file position 18446744073709551615
WARNING (copy-int-vector[5.5.929~1539-9bca2]:Next():util/kaldi-table-inl.h:574) Object read failed, reading archive standard input
LOG (copy-int-vector[5.5.929~1539-9bca2]:main():copy-int-vector.cc:83) Copied 2697018 vectors of int32.
ERROR (copy-int-vector[5.5.929~1539-9bca2]:~SequentialTableReaderArchiveImpl():util/kaldi-table-inl.h:678) TableReader: error detected closing archive standard input

[ Stack-Trace: ]
/opt/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0x793) [0x7f27463671c3]
copy-int-vector(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x25) [0x5650dbd339dd]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder >::~SequentialTableReaderArchiveImpl()+0x121) [0x5650dbd3734d]
copy-int-vector(kaldi::SequentialTableReaderArchiveImpl<kaldi::BasicVectorHolder >::~SequentialTableReaderArchiveImpl()+0xd) [0x5650dbd3764d]
copy-int-vector(kaldi::SequentialTableReader<kaldi::BasicVectorHolder >::~SequentialTableReader()+0x16) [0x5650dbd37816]
copy-int-vector(main+0x520) [0x5650dbd332a9]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f2745f5f0b3]
copy-int-vector(_start+0x2e) [0x5650dbd32cce]

terminate called after throwing an instance of 'kaldi::KaldiFatalError'
what(): kaldi::KaldiFatalError

gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe
steps/nnet3/get_egs.sh: line 272: 1101381 Exit 1   for id in $(seq $num_ali_jobs); do gunzip -c $alidir/ali.$id.gz; done
         1101382 Aborted   (core dumped) | copy-int-vector ark:- ark,scp:$dir/ali.ark,$dir/ali.scp
Traceback (most recent call last):
  File "steps/nnet3/train_dnn.py", line 459, in main
    train(args, run_opts)
  File "steps/nnet3/train_dnn.py", line 253, in train
    stage=args.egs_stage)
  File "steps/libs/nnet3/train/frame_level_objf/acoustic_model.py", line 61, in generate_egs
    egs_opts=egs_opts if egs_opts is not None else ''))
  File "steps/libs/common.py", line 129, in execute_command
    p.returncode, command))
Exception: Command exited with status 1: steps/nnet3/get_egs.sh --cmd "run.pl --mem 4G" --cmvn-opts "--norm-means=false --norm-vars=false" --online-ivector-dir "" --left-context 34 --right-context 34 --left-context-initial -1 --right-context-final -1 --stage 0 --samples-per-iter 400000 --frames-per-eg 8 --srand 0 data/train_hires exp/train_ali exp/nnet3/tdnn_sp_train/egs
```
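For context on the failure: the `ReadBasicType: did not get expected integer type, 0 vs. 4` message comes from Kaldi's binary I/O, where an int32 is written as a one-byte size marker (4) followed by the four value bytes, and the reader aborts when the marker it finds does not match. Reading a 0 where a 4 is expected suggests the stream fed to `copy-int-vector` is not positioned at a valid binary int32, i.e. the concatenated `ali.*.gz` contents are not in the format the reader expects. A minimal Python sketch of that framing (illustrative only, not Kaldi's actual C++):

```python
import struct

def write_basic_int32(value):
    # Mimic Kaldi's binary WriteBasicType<int32>: a one-byte size
    # marker (4) followed by 4 little-endian value bytes.
    return bytes([4]) + struct.pack("<i", value)

def read_basic_int32(buf, pos=0):
    # Mimic ReadBasicType<int32>: check the size marker, then read
    # the value. A mismatch is what copy-int-vector reports.
    size = buf[pos]
    if size != 4:
        raise ValueError(f"did not get expected integer type, {size} vs. 4")
    return struct.unpack_from("<i", buf, pos + 1)[0], pos + 5

encoded = write_basic_int32(1234)
value, _ = read_basic_int32(encoded)
print(value)  # 1234

# A corrupt or misaligned stream (size byte 0) triggers the same
# complaint seen in the log:
try:
    read_basic_int32(b"\x00\x00\x00\x00\x00")
except ValueError as e:
    print(e)  # did not get expected integer type, 0 vs. 4
```

So the question is why the bytes streamed out of the `ali.x.gz` files stop looking like valid int32 vectors partway through (the log shows 2697018 vectors were copied before the failure).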

Running the script a second time (`(GPU) ~/scratch/asma-kaldi/egs/aishell2/s5$ bash local/nnet3/tuning/finetune_tdnn_1a.sh`) produces exactly the same log, the same `copy-int-vector` error, and the same `get_egs.sh` failure as above.



Any ideas, please? (For information, I am using two GPUs with 25 GB of memory each.)