alibaba / Alibaba-MIT-Speech

Alibaba speech technology

Error: in data_fbank/train_960_cleaned, recording-ids extracted from wav.scp and reco2dur file differ #13

Closed: sunnwmy closed this issue 5 years ago

sunnwmy commented 5 years ago

I ran all the steps in run.sh over several days and finally obtained `train_960_cleaned` for training the deep FSMN. But when I start training the deep FSMN by running `local/nnet/run_fsmn.sh DFSMN_S`, I get this error:

```
steps/online/nnet2/extract_ivectors_online.sh: done extracting (online) iVectors to exp/nnet3_cleaned/ivectors_dev_other_hires using the extractor in exp/nnet3_cleaned/extractor.
steps/make_fbank.sh --nj 30 --cmd run.pl --fbank-config conf/fbank.conf data_fbank/train_960_cleaned exp/make_fbank/train_960_cleaned fbank/train_960_cleaned
steps/make_fbank.sh: moving data_fbank/train_960_cleaned/feats.scp to data_fbank/train_960_cleaned/.backup
utils/validate_data_dir.sh: Error: in data_fbank/train_960_cleaned, recording-ids extracted from wav.scp and reco2dur file
utils/validate_data_dir.sh: differ, partial diff is:
1,301545c1,281081
< 100-121669-0000-1
< 100-121669-0001-1
< 100-121669-0002-1
< 100-121669-0003-1
< 100-121669-0004-1
...

986-129388-0107
986-129388-0108
986-129388-0109
986-129388-0110
986-129388-0111
986-129388-0112
[Lengths are /tmp/kaldi.rudy/utts=301545 versus /tmp/kaldi.rudy/recordings.reco2dur=281081]
```

It seems the number of entries in `utts` and `recordings.reco2dur` differs (301545 vs. 281081), but `validate_data_dir.sh` expects them to be the same. Does anyone know how to fix this? Any advice would be appreciated. Thanks!
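To tell whether the data directory itself is inconsistent or the validation script is at fault, the same check can be reproduced outside Kaldi. Below is a minimal Python sketch, assuming the standard Kaldi data-directory layout where the first field of each line in `wav.scp` and `reco2dur` is a recording-id. The helper names (`first_column`, `compare_ids`, `check_reco2dur`) are hypothetical, and this is not the actual logic of `validate_data_dir.sh`. Note that in the log above the left-hand file is named `utts` and its entries (e.g. `100-121669-0000-1`) look like segment/utterance ids rather than recording-ids; if this sketch reports the two files as consistent, the data directory is probably fine.

```python
from pathlib import Path


def first_column(path):
    """Return the first whitespace-separated field of every non-empty line."""
    ids = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if fields:
                ids.append(fields[0])
    return ids


def compare_ids(left, right):
    """Return (ids only in left, ids only in right), each sorted."""
    left_set, right_set = set(left), set(right)
    return sorted(left_set - right_set), sorted(right_set - left_set)


def check_reco2dur(data_dir, show=5):
    """Check that wav.scp and reco2dur in data_dir list the same recording-ids.

    Prints the line counts and up to `show` ids found on only one side.
    Returns True only if both files contain the same set of ids and the
    same number of lines; a length mismatch also counts as a failure.
    """
    data = Path(data_dir)
    recos = first_column(data / "wav.scp")
    durs = first_column(data / "reco2dur")
    only_wav, only_dur = compare_ids(recos, durs)
    print(f"wav.scp: {len(recos)} ids, reco2dur: {len(durs)} ids")
    for reco in only_wav[:show]:
        print(f"only in wav.scp:  {reco}")
    for reco in only_dur[:show]:
        print(f"only in reco2dur: {reco}")
    return not only_wav and not only_dur and len(recos) == len(durs)
```

Usage: `check_reco2dur("data_fbank/train_960_cleaned")` returns `True` when both files agree. Comparing sets keeps the sketch simple; it does not check sort order, which Kaldi's own validation also cares about.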

sunnwmy commented 5 years ago

Problem solved: just update `utils/validate_data_dir.sh`. The old version has a bug in how it handles the reco2dur file.