DigitalPhonetics / speaker-anonymization

Speaker anonymization pipeline for hiding the identity of the speaker of a recording by changing the voice in it.
GNU General Public License v3.0

Affective/emotional information conservation? #3

Closed Petemir closed 1 year ago

Petemir commented 1 year ago

Hello!

I am interested in using your tool to anonymize audio from a multimodal dataset, and I would like to know whether your method preserves the audio properties that speech emotion recognition relies on. The "Prosody Is Not Identity" paper briefly suggests that it does (end of Section 2.2), but I could not find a formal comparison showing that this is actually the case.

Have you run, or are you planning to run, emotion recognition pipelines on both the original and the anonymized data to check whether performance degrades?

Thanks a lot for the great and interesting tool and work!

SarinaMeyer commented 1 year ago

Hi,

We have not yet checked how well the anonymization preserves, or how much it destroys, the properties needed for emotion recognition. We plan to do this at some point, but I cannot currently estimate when.

If you want to use the anonymization in an emotion recognition application, you should definitely use the latest model, which uses prosody cloning. In theory, if prosody is the main carrier of emotional information, emotion recognition should still be possible after anonymization. You might get better results if you train or fine-tune your recognizer on anonymized or other synthesized data. If you do test how much anonymization affects recognition performance, it would be great if you could share your results.
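
A minimal sketch of how such a comparison could be scored, assuming you already have gold emotion labels plus your recognizer's predictions on the original and on the anonymized recordings. Nothing here is part of this repository: the function names and the choice of unweighted average recall (mean per-class recall, which is less sensitive to class imbalance than accuracy) are assumptions made for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls over all classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labeled example is required")
    correct = defaultdict(int)  # correctly predicted examples per class
    total = defaultdict(int)    # gold examples per class
    for gold, pred in zip(y_true, y_pred):
        total[gold] += 1
        if gold == pred:
            correct[gold] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def compare_emotion_recognition(y_true, pred_original, pred_anonymized):
    """Score the same recognizer on original and anonymized audio.

    All three arguments are hypothetical inputs: parallel lists of
    emotion labels, one entry per utterance.
    """
    uar_original = unweighted_average_recall(y_true, pred_original)
    uar_anonymized = unweighted_average_recall(y_true, pred_anonymized)
    return {
        "uar_original": uar_original,
        "uar_anonymized": uar_anonymized,
        "absolute_drop": uar_original - uar_anonymized,
    }
```

A positive absolute_drop would indicate that the recognizer does worse on anonymized audio. Repeating the comparison with a recognizer fine-tuned on anonymized data would show how much of that drop can be recovered.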

Petemir commented 1 year ago

Thank you for your reply! I will keep the tool in mind and let you know if I make progress with the analysis :). Cheers.