DigitalPhonetics / speaker-anonymization

Speaker anonymization pipeline for hiding the identity of the speaker of a recording by changing the voice in it.
GNU General Public License v3.0

Anonymization using ecapa vectors #4

Open zhao1025 opened 11 months ago

zhao1025 commented 11 months ago

Hello, I would like to use ECAPA vectors for anonymization. Can the gan.pt file you provided be used directly for anonymization with ECAPA vectors, or does a new gan.pt need to be trained separately? If retraining is required, could you please describe the training procedure? Thank you.

SarinaMeyer commented 11 months ago

Hello,

Sorry for the late reply. The latest GAN (v2.0) does not generate ECAPA-TDNN vectors but a custom speaker style embedding based on Global Style Tokens. A previous version (v1.2) was trained on concatenated ECAPA-TDNN and x-vector embeddings; make sure to use the code in the gan_embeddings branch for this model. These concatenated training embeddings have 704 dimensions: the first 192 are the ECAPA-TDNN embedding and the last 512 are the x-vector of the training speaker. The outputs of gan.pt are also 704-dimensional embeddings. However, I don't know whether the GAN learned that the first 192 dimensions should resemble ECAPA-TDNN. You could try it: generate a speaker embedding, extract the first 192 dimensions, and treat this vector as an ECAPA-TDNN embedding. Let me know if you do this and whether it works.
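
The slicing step described above could look roughly like the sketch below. It is not code from this repository: the function name `split_concatenated_embedding` is hypothetical, and a random vector stands in for a real embedding sampled from the v1.2 gan.pt. Only the 192 + 512 = 704 layout comes from the description above. The function relies only on `len()` and slicing, so it should also work on a 1-D NumPy array or PyTorch tensor.

```python
import random

ECAPA_DIM = 192    # first block of the concatenated embedding
XVECTOR_DIM = 512  # last block of the concatenated embedding
TOTAL_DIM = ECAPA_DIM + XVECTOR_DIM  # 704


def split_concatenated_embedding(embedding):
    """Split a 704-dim vector into (ecapa_part, xvector_part).

    Hypothetical helper: accepts any 1-D sequence that supports len()
    and slicing.
    """
    if len(embedding) != TOTAL_DIM:
        raise ValueError(
            f"expected {TOTAL_DIM} dimensions, got {len(embedding)}"
        )
    return embedding[:ECAPA_DIM], embedding[ECAPA_DIM:]


# Placeholder for a vector generated by the GAN; random values stand in
# for real generator output here.
rng = random.Random(0)
generated = [rng.gauss(0.0, 1.0) for _ in range(TOTAL_DIM)]

ecapa_part, xvector_part = split_concatenated_embedding(generated)
print(len(ecapa_part), len(xvector_part))
```

Whether the first 192 dimensions actually behave like an ECAPA-TDNN embedding is the open question here, so the slice would still need to be checked against real ECAPA-TDNN embeddings before relying on it.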

The repository currently does not contain the code for training the GAN. I plan to add it soon and will inform you about it once this is done.

zhao1025 commented 11 months ago

Okay, if I make this attempt, I will inform you of the results. Thank you for your reply!