Voice-Privacy-Challenge / Voice-Privacy-Challenge-2022

Baseline Recipe for VoicePrivacy Challenge 2022: anonymization systems and evaluation software

Error: data/train-clean-360_anon_sp/feats.scp already exists #18

Open suhitaghosh10 opened 2 years ago

suhitaghosh10 commented 2 years ago

steps/diagnostic/analyze_alignments.sh --cmd run.pl data/lang exp/tri3b_cleaned
steps/diagnostic/analyze_alignments.sh: see stats in exp/tri3b_cleaned/log/analyze_alignments.log
1 warnings in exp/tri3b_cleaned/log/build_tree.log
27 warnings in exp/tri3b_cleaned/log/acc...log
8 warnings in exp/tri3b_cleaned/log/update..log
33 warnings in exp/tri3b_cleaned/log/align...log
20 warnings in exp/tri3b_cleaned/log/convert..log
9 warnings in exp/tri3b_cleaned/log/fmllr...log
steps/train_sat.sh: Likelihood evolution: -55.2576 -52.3376 -52.1623 -51.612 -50.1089 -48.6247 -47.4872 -46.7146 -46.154 -45.526 -45.1648 -44.7545 -44.4738 -44.2664 -44.0799 -43.9154 -43.7681 -43.6342 -43.512 -43.3358 -43.2092 -43.1134 -43.0245 -42.9414 -42.8652 -42.793 -42.7238 -42.6577 -42.5951 -42.5037 -42.44 -42.4085 -42.3882 -42.3738
exp/tri3b_cleaned: nj=10 align prob=-45.15 over 355.80h [retry=0.0%, fail=0.0%] states=5952 gauss=150145 fmllr-impr=0.71 over 293.24h tree-impr=9.36
steps/train_sat.sh: done training SAT system in exp/tri3b_cleaned
local/chain/run_tdnn_1d__360.sh
local/nnet3/run_ivector_common.sh: preparing directory for low-resolution speed-perturbed data (for alignment)
utils/data/perturb_data_dir_speed_3way.sh: data/train-clean-360_anon_sp/feats.scp already exists: refusing to run this (please delete data/train-clean-360_anon_sp/feats.scp if you want this to run)

SarinaMeyer commented 2 years ago

I have a similar problem: each time I rerun the evaluation, the run terminates with an error because some files already exist. Could you please provide an updated cleanup.sh that covers the new files of this challenge?

Natalia-T commented 2 years ago

Kaldi-based ASR acoustic models (AMs) and the corresponding Kaldi scripts are used for ASR evaluation. The ASR AM training scripts comprise multiple training stages, and some of them include an additional check to avoid repeating already completed processing.

For example, in your case:

  1. https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/blob/5a8c9f90af2fda729b573ceb1c1f690ed0ea1c1e/baseline/local/nnet3/run_ivector_common.sh#L47
  2. https://github.com/kaldi-asr/kaldi/blob/d673298886e8d62d4c890e5e3eac8491df0b7e12/egs/wsj/s5/utils/data/perturb_data_dir_speed_3way.sh#L52

So you can either specify more precisely the stage from which you want to resume training, or remove the corresponding file as suggested in the Kaldi script (sketched below).
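For illustration, the guard that produces the "refusing to run" message boils down to something like the following sketch (simplified; the path is taken from the log above, and the real check lives in utils/data/perturb_data_dir_speed_3way.sh):

# Simplified sketch of the check in utils/data/perturb_data_dir_speed_3way.sh
destdir=data/train-clean-360_anon_sp
if [ -f "$destdir/feats.scp" ]; then
  echo "$0: $destdir/feats.scp already exists: refusing to run this" >&2
  echo "(please delete $destdir/feats.scp if you want this to run)" >&2
  exit 1
fi

So the manual workaround is either to delete that feats.scp before re-running, or to restart from a later stage so that this step is skipped.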

cleanup.sh was originally designed only for (re-)running the ASR/ASV evaluation stages (with already trained ASR/ASV evaluation models), and it could be updated accordingly for the new setup. However, it does not cover training of the ASR/ASV models: each of these processes has multiple (sub)stages, and deciding which data to remove depends on which (sub)stages have completed, so it is not straightforward and requires the user's supervision.

suhitaghosh10 commented 2 years ago

Thanks for the detailed answer. But when I am running it for the first time, shouldn't it run without such errors?

Natalia-T commented 2 years ago

But when I am running it for the first time, shouldn't it run without such errors?

Yes, on a first run you should not get such errors.

egaznep commented 2 years ago

I have also been experiencing this issue, and after numerous attempts I managed to get a full run without any 'refusing to run' errors. I created a shell script in the baseline folder and pasted the following into it:

#!/bin/bash
# Remove features, alignments and intermediate models left over from a previous run,
# so that Kaldi's "already exists: refusing to run" checks do not abort the pipeline.
rm -rf data/train-clean-360_anon_sp/feats.scp
rm -rf data/train-clean-360_anon_sp_hires/feats.scp
rm -rf data/train-clean-360_anon_sp_hires_60k/feats.scp
rm -rf exp/tri3b_cleaned_ali_train-clean-360_anon_sp
rm -rf exp/models/user_asr_eval_anon/chain_cleaned/tree_sp/final.mdl

and I run this each time I'd like to re-run the baseline.
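For reference, a re-run then looks roughly like this (clean_anon_feats.sh stands for whatever name the script above was saved under, and I am assuming the baseline is launched with ./run.sh as usual):

cd baseline
bash clean_anon_feats.sh   # drop stale features/alignments from the previous run
./run.sh                   # then launch the recipe again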