Hi Carlos,
The recipe with the sidekit x-vector implementation should normally work in a similar way as the kaldi x-vector implementation: set `xvect_type=sidekit` and use the corresponding pretrained models from the `exp/models` dir (please download the latest version of models.2022.tar.gz):
- 4_nsf_pt_sidekit
- 5_joint_tts_hifigan_sidekit
- 5_joint_tts_nsf_hifigan_sidekit
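If it helps, here is a minimal sketch of that switch. It assumes `xvect_type` is a shell variable read by the run scripts; where exactly it is defined (e.g. in the baseline config) may differ in your checkout:

```bash
# Sketch only: switch the baseline from the kaldi to the sidekit x-vector extractor.
# Set this wherever xvect_type is defined in your checkout (e.g. the baseline config).
xvect_type=sidekit

# After unpacking the latest models.2022.tar.gz, the sidekit-based TTS models
# should show up under exp/models:
ls exp/models
# expected among others:
#   4_nsf_pt_sidekit  5_joint_tts_hifigan_sidekit  5_joint_tts_nsf_hifigan_sidekit
```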
- please note that, as written in the evaluation plan, for the official ranking the x-vector extractors and corresponding TTS models should be trained without using additional data (which is not the case for the current models, which are trained using data augmentation corpora).
@hnourtel and @pchampio, could you please provide further details about the sidekit branch, about using the sidekit x-vector extractor in the anonymization setup, and about ASV models w/o data augmentation?
Hello Natalia,
Thanks for your reply! I have some follow-up questions to further clarify the approach:
Kind regards, Carlos
Hi Carlos,
You have to replace the calls to `01_extract_xvectors.sh` with your own extraction script in the files where you want to use new ID vectors. The files can be found here: https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/search?q=01_extract_xvectors
If you look at those files in the sidekit branch, you can see the call to the sidekit x-vector extraction script. This script is located here: https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/blob/5ab1e4ab05295efb3927ad255ff54394fa5f08f3/baseline/local/featex/01_extract_xvectors_sidekit.sh
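As a rough illustration, a drop-in replacement could look like the skeleton below. The argument order and the output layout (a Kaldi-style xvector.ark/xvector.scp keyed by utterance id) are assumptions; check the usage message of `01_extract_xvectors.sh` in your checkout, and `my_extract_id_vectors.py` is a hypothetical placeholder for your own extractor:

```bash
#!/usr/bin/env bash
# Hypothetical drop-in replacement for 01_extract_xvectors.sh (sketch only).
# Assumed interface: <data-dir> <model-dir> <out-dir>; verify against the
# usage message of the original script before wiring it in.
set -euo pipefail

data_dir=$1    # Kaldi data dir containing wav.scp / utt2spk
model_dir=$2   # directory with your own ID-vector extractor model
out_dir=$3     # where xvector.ark / xvector.scp should end up

mkdir -p "$out_dir"

# my_extract_id_vectors.py is a placeholder: it should read the utterances
# listed in wav.scp and write one ID vector per utterance in Kaldi ark/scp
# format, since that is what the downstream anonymization steps read.
python my_extract_id_vectors.py \
  --wav-scp "$data_dir/wav.scp" \
  --model-dir "$model_dir" \
  --out-ark "$out_dir/xvector.ark" \
  --out-scp "$out_dir/xvector.scp"
```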
For the Hifi-GAN, the sidekit branch doesn't adapt automatically to the ID vector size. You have to change the ID vector size in this configuration file: https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/blob/5ab1e4ab05295efb3927ad255ff54394fa5f08f3/nii_pytorch/projects/joint_tts_hifigan/config_sidekit.py#L52
The second 256 value is the x-vector size; that is the value to change to match your ID vector size. There are several `config_sidekit.py` files, depending on the TTS used (all TTS configurations are available here: https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/tree/5ab1e4ab05295efb3927ad255ff54394fa5f08f3/nii_pytorch/projects)
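A quick way to locate and check that value before editing (sketch only; the exact surrounding lines depend on the config file you are using):

```bash
# Sketch: inspect the dimension setting around the referenced line, then edit it.
cfg=nii_pytorch/projects/joint_tts_hifigan/config_sidekit.py
sed -n '45,60p' "$cfg"   # print the lines around line 52 to find the second 256
# Then replace that second 256 with the dimensionality of your own ID vectors
# (e.g. 512), keeping the rest of the line unchanged.
```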
The script for training the TTS models is here: https://github.com/Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022/blob/5ab1e4ab05295efb3927ad255ff54394fa5f08f3/baseline/local/train_tts_model.sh
You can change the `model_type` parameter to select which TTS model is trained (see the sketch below).
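For example, to see where `model_type` enters before changing it (sketch only; whether it is a variable inside the script or a command-line option should be checked in the script itself):

```bash
# Sketch: find where model_type is defined/used before changing it.
cd Voice-Privacy-Challenge-2022/baseline
grep -n "model_type" local/train_tts_model.sh
# Set it to one of the project names under nii_pytorch/projects
# (e.g. joint_tts_hifigan), then rerun the training script.
./local/train_tts_model.sh
```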
I hope this answers your questions.
Kind regards, Hubert
Hello Hubert,
This is very helpful, thank you! I will give it a try right away.
Kind regards, Carlos
Hello,
For the challenge, we would like to compute the ID vectors differently than in the baseline. There is a sidekit branch in the repository where this should be possible. Could you please explain the best way to do this?
Kind regards, Carlos