Issue when running the pipeline for anonymization using x-vectors and neural waveform models

ArneDefauw commented 3 years ago

After what seemed like a successful installation of the software using the ./install.sh script, I encountered an error when running the ./run.sh script:

Stage a.1: Generating pseudo-speakers for libri_dev_enrolls.
Computing PLDA affinity scores of each source speaker to each pool speaker.
cut: exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libritts_train_other_500/spk_xvector.scp: No such file or directory
bash: line 1: 8013 Aborted (core dumped) ( ivector-plda-scoring --normalize-length=true "ivector-copy-plda --smoothing=0.0 exp/models/2_xvect_extr/exp/xvector_nnet_1a/plda - |" "ark:ivector-subtract-global-mean exp/models/2_xvect_extr/exp/xvector_nnet_1a/mean.vec scp:exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libri_dev_enrolls/spk_xvector.scp ark:- | transform-vec exp/models/2_xvect_extr/exp/xvector_nnet_1a/transform.mat ark:- ark:- | ivector-normalize-length ark:- ark:- |" "ark:ivector-subtract-global-mean exp/models/2_xvect_extr/exp/xvector_nnet_1a/mean.vec scp:exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libritts_train_other_500/spk_xvector.scp ark:- | transform-vec exp/models/2_xvect_extr/exp/xvector_nnet_1a/transform.mat ark:- ark:- | ivector-normalize-length ark:- ark:- |" "cat 'exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libri_dev_enrolls/fake_trials/trial_1272' | cut -d\ --fields=1,2 |" exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libri_dev_enrolls/spk_pool_scores/affinity_1272 ) 2>> exp/scores/log/libritts_pool_scoring.log >> exp/scores/log/libritts_pool_scoring.log
run.pl: job failed, log is in exp/scores/log/libritts_pool_scoring.log
['local/anon/gen_pseudo_xvecs.py', 'data/libri_dev_enrolls', 'data/libritts_train_other_500', 'exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libri_dev_enrolls/spk_pool_scores', 'exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon', 'exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libri_dev_enrolls/pseudo_xvecs', 'spk', 'false', 'farthest', '0']
Same gender speakers will be selected.
Randomization level: spk
Proximity: farthest
Reading source spk2gender.
Reading source spk2utt.
Reading pool spk2gender.
Reading pool xvectors.
Traceback (most recent call last):
  File "local/anon/gen_pseudo_xvecs.py", line 85, in <module>
    for key, xvec in reader:
  File ".../Voice-Privacy-Challenge-2020/venv/lib/python3.8/site-packages/kaldiio/highlevel.py", line 128, in __iter__
    k, v = next(self.generator)
  File ".../Voice-Privacy-Challenge-2020/venv/lib/python3.8/site-packages/kaldiio/matio.py", line 78, in load_scp_sequential
    with open_like_kaldi(fname, 'r') as fd:
  File ".../Voice-Privacy-Challenge-2020/venv/lib/python3.8/site-packages/kaldiio/utils.py", line 205, in open_like_kaldi
    return io.open(name, mode, encoding=encoding)
FileNotFoundError: [Errno 2] No such file or directory: 'exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libritts_train_other_500/spk_xvector.scp'

Apparently, the spk_xvector.scp file for the pool speakers (libritts_train_other_500) was either never created or could not be found.
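A quick way to confirm this before re-running the pipeline is to check which x-vector output files actually exist in the pool directory. The sketch below assumes the Kaldi layout suggested by the paths in the log (xvector.scp for utterance-level x-vectors, spk_xvector.scp for speaker-level ones); the helper name missing_xvector_files is hypothetical and not part of the challenge code.

```python
from pathlib import Path


def missing_xvector_files(xvector_dir, names=("xvector.scp", "spk_xvector.scp")):
    """Return the expected x-vector output files that are absent from xvector_dir.

    The default file names are an assumption based on the paths in the log above.
    """
    base = Path(xvector_dir)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    # Pool directory taken from the failing command in the log.
    pool_dir = "exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libritts_train_other_500"
    missing = missing_xvector_files(pool_dir)
    if missing:
        print("Missing in", pool_dir + ":", ", ".join(missing))
    else:
        print("All expected x-vector files are present.")
```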

The full log file can be found here: https://drive.google.com/file/d/1fMagP7K-6YOieSFpvPVn8dTr8x7fLZli/view?usp=sharing

The log file generated by Kaldi (exp/scores/log/libritts_pool_scoring.log) can be found here:

https://drive.google.com/file/d/1TmMzlBY-P9pZ8SuOzcjyeEXKV5Krkdjh/view?usp=sharing

Natalia-T commented 3 years ago

Hi @ArneDefauw,

It seems that in Stage 7, sid/nnet3/xvector/extract_xvectors.sh did not run to completion.

In your log:

sid/nnet3/xvector/extract_xvectors.sh: extracting xvectors for data/libritts_train_other_500
sid/nnet3/xvector/extract_xvectors.sh: extracting xvectors from nnet
Stage 8: Making evaluation subsets...

The first two lines correspond to Stage 0 of extract_xvectors.sh. However, extract_xvectors.sh has two further stages (1 and 2), and neither appears in your log file.
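For context: in the standard Kaldi version of sid/nnet3/xvector/extract_xvectors.sh (the challenge's copy may differ), Stage 1 combines the per-job x-vectors into xvector.scp, and Stage 2 averages each speaker's utterance-level x-vectors (using ivector-mean with spk2utt) into spk_xvector.scp. That is the file gen_pseudo_xvecs.py could not find. The pure-Python sketch below only illustrates that averaging step on in-memory data. It is not the script's implementation, and the function name and toy data are hypothetical.

```python
def speaker_mean_xvectors(spk2utt, utt_xvectors):
    """Average utterance-level x-vectors per speaker (illustrative only).

    spk2utt: dict mapping speaker id -> list of utterance ids (like Kaldi's spk2utt).
    utt_xvectors: dict mapping utterance id -> list of floats (like xvector.scp contents).
    Returns (spk_xvectors, num_utts). In this sketch, utterances without an
    x-vector are skipped, and speakers with no x-vectors at all are omitted.
    """
    spk_xvectors, num_utts = {}, {}
    for spk, utts in spk2utt.items():
        vecs = [utt_xvectors[u] for u in utts if u in utt_xvectors]
        if not vecs:
            continue  # nothing to average, so no speaker-level entry
        dim = len(vecs[0])
        spk_xvectors[spk] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        num_utts[spk] = len(vecs)
    return spk_xvectors, num_utts


# Hypothetical toy data: two speakers with 2-dimensional "x-vectors".
spk2utt = {"1272": ["1272-a", "1272-b"], "1462": ["1462-a"]}
utt_xvectors = {"1272-a": [1.0, 2.0], "1272-b": [3.0, 4.0], "1462-a": [0.5, 0.5]}
means, counts = speaker_mean_xvectors(spk2utt, utt_xvectors)
print(means["1272"], counts["1272"])  # [2.0, 3.0] 2
```

If Stage 2 never runs, there is no speaker-level output, which matches the missing spk_xvector.scp in the original error.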

Could you please attach the log of extract_xvectors.sh from Stage 7, for the run with nnet directory exp/models/2_xvect_extr/exp/xvector_nnet_1a and data directory data/libritts_train_other_500?
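When checking such a log, one simple way to see how far the script got is to search it for each stage's progress message. The Stage 0 message below appears in the log excerpt above. The Stage 1 and Stage 2 messages are assumptions based on the standard Kaldi script and may need adjusting for the challenge's copy. The helper stages_reached is hypothetical.

```python
# Progress messages echoed by each stage of extract_xvectors.sh.
# Stage 0's message appears in the log above; Stages 1 and 2 are assumed
# from the standard Kaldi script.
STAGE_MESSAGES = {
    0: "extracting xvectors from nnet",
    1: "combining xvectors across jobs",
    2: "computing mean of xvectors for each speaker",
}


def stages_reached(log_text):
    """Return the sorted stage numbers whose progress message occurs in log_text.

    A message only shows that a stage started, not that it finished.
    """
    return sorted(stage for stage, msg in STAGE_MESSAGES.items() if msg in log_text)


log = """sid/nnet3/xvector/extract_xvectors.sh: extracting xvectors for data/libritts_train_other_500
sid/nnet3/xvector/extract_xvectors.sh: extracting xvectors from nnet
Stage 8: Making evaluation subsets..."""
print(stages_reached(log))  # [0]
```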

ArneDefauw commented 3 years ago

Hi @Natalia-T, thanks for the swift reply. I ran the sid/nnet3/xvector/extract_xvectors.sh script standalone, and now it goes through all stages: https://drive.google.com/file/d/1fCs1jSvLVg224l0HM0pddPqntuTIlXNe/view?usp=sharing

However, it now fails at another stage because of CUDA problems: https://drive.google.com/file/d/1Cx_pLtAEj4Knroin32IuePj5JewokYzE/view?usp=sharing https://drive.google.com/file/d/1Ho26Nwz3BUYwsjk4lMBJGPR9LaJsFJWj/view?usp=sharing

Is it necessary to run the code on GPU?

Natalia-T commented 3 years ago

Hi @ArneDefauw,

The program fails at the PPG (BN) feature extraction step because use_gpu=yes is set by default in

https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2020/blob/1dcdcdae620fe2def0f35a203b598c42a5c8df1d/baseline/local/featex/extract_bn.sh#L14

This stage can be performed on the CPU. To do this, pass the option --use_gpu no in the call to extract_bn.sh here:
https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2020/blob/1dcdcdae620fe2def0f35a203b598c42a5c8df1d/baseline/local/featex/extract_ppg.sh#L50

However, some later stages (i.e., the TTS part) do require a GPU.
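Kaldi-style scripts that declare defaults such as use_gpu=yes typically read overrides through utils/parse_options.sh, which expects --name value pairs before the positional arguments. The sketch below builds such a call and can choose the value automatically, falling back to CPU when no nvidia-smi binary is on the PATH. The detection heuristic, the helper names, the script path relative to baseline/, and the placeholder positional arguments are all assumptions, not part of the challenge code. Take the real arguments from the extract_ppg.sh line linked above.

```python
import shutil


def default_use_gpu():
    """Heuristic (assumption): treat the presence of nvidia-smi as 'GPU available'."""
    return "yes" if shutil.which("nvidia-smi") else "no"


def build_extract_bn_cmd(positional_args, use_gpu=None):
    """Build a hypothetical argument list for local/featex/extract_bn.sh.

    Options are placed before positional arguments, as parse_options.sh expects.
    positional_args are placeholders; copy the real ones from extract_ppg.sh.
    """
    if use_gpu is None:
        use_gpu = default_use_gpu()
    if use_gpu not in ("yes", "no"):
        raise ValueError("use_gpu must be 'yes' or 'no'")
    return ["local/featex/extract_bn.sh", "--use_gpu", use_gpu, *positional_args]


cmd = build_extract_bn_cmd(["<arg1>", "<arg2>"], use_gpu="no")
print(" ".join(cmd))  # local/featex/extract_bn.sh --use_gpu no <arg1> <arg2>
```

The resulting list could then be run from the baseline directory with subprocess.run(cmd, check=True). This only helps with the BN/PPG step, not with the TTS stages.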

Voice-Privacy-Challenge / Voice-Privacy-Challenge-2020, issue #9