Adjusting for other languages

BakerBunker / SALT

[ASRU 2023] Code of paper SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation

https://bakerbunker.github.io/SALT/


Adjusting for other languages #1

Open AnnCod opened 2 months ago

AnnCod commented 2 months ago

Hi,

Do you think this solution can easily be adapted to work on languages other than English?

BakerBunker commented 2 months ago

Hi @AnnCod, I think it can work on non-English languages. We tested this solution on Chinese speech before and got a good result, though the audio quality was not good. I suppose this could be because of:

The feature extractor: WavLM was trained on an English corpus, so non-English speech is out of distribution (OOD) for it.
The vocoder: HiFiGAN was also trained on an English corpus, which causes the same OOD issue.

If you want decent audio quality, you could try a model pretrained on a multilingual corpus, such as XLS-R, as the feature extractor, and then train a vocoder on your target language.
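Before swapping WavLM for XLS-R, it can help to check that the feature frame rate stays the same, since the vocoder's upsampling is tied to it. The sketch below is a minimal illustration, not part of the SALT code: it assumes both models use the standard wav2vec 2.0 convolutional feature encoder (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2, no padding), and the helper name `num_frames` is hypothetical.

```python
# Assumed conv feature-encoder layout (wav2vec 2.0 style, as used by XLS-R and WavLM).
KERNELS = (10, 3, 3, 3, 3, 2, 2)
STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_frames(num_samples: int) -> int:
    """Hypothetical helper: number of feature frames the conv encoder emits
    for a waveform of `num_samples` samples (valid convolutions, no padding)."""
    length = num_samples
    for kernel, stride in zip(KERNELS, STRIDES):
        # Output length of a 1-D convolution without padding.
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    # One second of 16 kHz audio -> about one frame every 20 ms.
    print(num_frames(16000))
```

Under these assumptions, `num_frames(16000)` returns 49, and 400 samples (25 ms) is the shortest input that yields a single frame. If the two extractors share this layout, their features line up frame-for-frame, and only the vocoder needs retraining on the new feature space.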

AnnCod commented 2 months ago

Thanks for the reply. Is the demo working correctly? I get some errors when I try to run it on Colab.

BakerBunker commented 2 months ago

Sorry, I accidentally misspelled a variable name. It is fixed in https://github.com/BakerBunker/SALT/commit/8060405da51996c0b8b47a5b8c2babad0838b14a

AnnCod commented 2 months ago

But there's still an error: "RuntimeError: The size of tensor a (23866) must match the size of tensor b (214) at non-singleton dimension 2"

BakerBunker commented 2 months ago

I can't reproduce this error. Would you consider sharing your Colab notebook with this output?