Using sidekit for computing ID vectors

carlosfranzreb commented 2 years ago

Hello,

For the challenge, we would like to compute ID vectors differently from the baseline. There is a sidekit branch in the repository where this should be possible. Could you please explain the best way of doing so?

Kind regards, Carlos

Natalia-T commented 2 years ago

Hi Carlos,

The recipe with the sidekit x-vector implementation should normally work in the same way as the one with the Kaldi x-vector implementation:

- Use the sidekit branch.
- In config.sh, set the parameter xvect_type=sidekit.
- The corresponding pretrained TTS models are provided in the exp/models directory (please download the latest version of models.2022.tar.gz):
  - 4_nsf_pt_sidekit
  - 5_joint_tts_hifigan_sidekit
  - 5_joint_tts_nsf_hifigan_sidekit
- Please note that, as stated in the evaluation plan, for the official ranking the x-vector extractors and the corresponding TTS models should be trained without additional data. This is not the case for the current models, which are trained using data augmentation corpora.
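As a rough illustration of the config.sh step, the snippet below sets the `xvect_type` assignment in the text of a `config.sh`-style file and lists the sidekit model directories named above. It is a minimal sketch that assumes `config.sh` contains a plain `xvect_type=...` line; the helper `set_xvect_type` is hypothetical and not part of the challenge recipe.

```python
import re

# Pretrained sidekit-based TTS models listed above (expected under exp/models).
SIDEKIT_TTS_MODELS = [
    "4_nsf_pt_sidekit",
    "5_joint_tts_hifigan_sidekit",
    "5_joint_tts_nsf_hifigan_sidekit",
]

# Matches a line such as `xvect_type=kaldi` or `export xvect_type=kaldi`.
_XVECT_LINE = re.compile(r"^([ \t]*(?:export[ \t]+)?xvect_type=).*$", re.MULTILINE)


def set_xvect_type(config_text: str, value: str = "sidekit") -> str:
    """Return config_text with every `xvect_type=...` assignment set to `value`.

    Hypothetical helper: assumes the assignment sits on its own line.
    If no assignment exists, one is appended at the end of the text.
    """
    new_text, count = _XVECT_LINE.subn(lambda m: m.group(1) + value, config_text)
    if count == 0:
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += f"xvect_type={value}\n"
    return new_text
```

For example, `set_xvect_type("xvect_type=kaldi\n")` yields `"xvect_type=sidekit\n"` (the `kaldi` value here is only a hypothetical starting point). Editing the file by hand is of course equivalent; the helper just makes the change scriptable.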

@hnourtel and @pchampio, could you please provide further details about the sidekit branch, about using the sidekit x-vector extractor in the anonymization setup, and about ASV models without data augmentation?

carlosfranzreb commented 2 years ago

Hello Natalia,

Thanks for your reply! I have some follow-up questions to further clarify the approach:

1. Where is the file I have to replace to compute different ID vectors?
2. The model I want to use outputs ID vectors with a smaller dimension than x-vectors. Does the sidekit branch automatically adapt the HiFi-GAN to this dimension? If not, what should I modify?
3. The HiFi-GAN needs to be retrained to account for the new ID vectors. Is there a script for doing so?

Kind regards, Carlos

hnourtel commented 2 years ago

Hi Carlos,

1. In the files where you want to use new ID vectors, replace the calls to 01_extract_xvectors.sh with your own extraction script. The files can be found here: https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/search?q=01_extract_xvectors
   If you open these files in the sidekit branch, you can see the call to the sidekit x-vector extraction script, which is located here: https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/blob/5ab1e4ab05295efb3927ad255ff54394fa5f08f3/baseline/local/featex/01_extract_xvectors_sidekit.sh
2. For HiFi-GAN, the sidekit branch does not automatically adapt to the ID vector size. You have to change the ID vector size in this configuration file: https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/blob/5ab1e4ab05295efb3927ad255ff54394fa5f08f3/nii_pytorch/projects/joint_tts_hifigan/config_sidekit.py#L52
   The second 256 value on that line is the x-vector size; this is the value to change to match the size of your ID vectors. There are several config_sidekit.py files, one per TTS model (all TTS configurations are available here: https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/tree/5ab1e4ab05295efb3927ad255ff54394fa5f08f3/nii_pytorch/projects)
3. The script for training TTS models is here: https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/blob/5ab1e4ab05295efb3927ad255ff54394fa5f08f3/baseline/local/train_tts_model.sh
   Change the model_type parameter to select which TTS model is trained.
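A replacement extraction script mainly has to produce ID vectors in whatever form the rest of the recipe reads, with the dimension set in config_sidekit.py. The sketch below assumes, hypothetically, that Kaldi-style text archive lines (`utt_id  [ v1 v2 ... ]`) are acceptable; check 01_extract_xvectors_sidekit.sh for the format the sidekit branch actually writes. `EXPECTED_DIM`, `check_dims` and `format_kaldi_text_vectors` are illustrative names, not part of the repository.

```python
from typing import Dict, Sequence

# Hypothetical: must equal the x-vector size edited in config_sidekit.py
# (the second 256 value). 192 is only an example.
EXPECTED_DIM = 192


def check_dims(vectors: Dict[str, Sequence[float]], expected_dim: int) -> None:
    """Raise ValueError if any ID vector does not have `expected_dim` entries."""
    bad = {utt: len(vec) for utt, vec in vectors.items() if len(vec) != expected_dim}
    if bad:
        raise ValueError(
            f"ID vectors with wrong dimension (expected {expected_dim}): {bad}"
        )


def format_kaldi_text_vectors(vectors: Dict[str, Sequence[float]]) -> str:
    """Format vectors as Kaldi-style text archive lines: `utt_id  [ v1 v2 ... ]`.

    Assumption: the downstream recipe reads this format; adapt it if not.
    Utterance IDs are sorted so the output is deterministic.
    """
    lines = []
    for utt in sorted(vectors):
        values = " ".join(f"{x:.6f}" for x in vectors[utt])
        lines.append(f"{utt}  [ {values} ]\n")
    return "".join(lines)
```

In a replacement script, one would call `check_dims(vectors, EXPECTED_DIM)` before writing `format_kaldi_text_vectors(vectors)` to the output file, so that a size mismatch with the HiFi-GAN configuration is caught before TTS training rather than during it.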

I hope this answers your questions.

Kind regards, Hubert

carlosfranzreb commented 2 years ago

Hello Hubert,

This is very helpful, thank you! I will give it a try right away.

Kind regards, Carlos

Voice-Privacy-Challenge / Voice-Privacy-Challenge-2022

Using sidekit for computing ID vectors #27