OlaWod / FreeVC

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion
MIT License

WHAMR! artificial reverberation #48

Open Ashraf-Ali-aa opened 1 year ago

Ashraf-Ali-aa commented 1 year ago

I think a better way of extracting speech content might be to use WHAMR! to add artificial reverberation along with background noise to the training dataset, so that the WavLM-based content extraction also works on audio that is not studio grade. This is the same technique Whisper AI uses for ASR.

https://wham.whisper.ai/
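
A minimal sketch of the kind of noise-plus-reverberation augmentation being proposed here, assuming mono wav inputs and placeholder paths for a room impulse response and a noise clip (e.g. drawn from the WHAMR! material); this is not FreeVC's or WHAMR!'s actual tooling:

```python
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

def reverb_and_noise(clean_path, rir_path, noise_path, snr_db=10.0):
    speech, sr = sf.read(clean_path)   # mono clean utterance
    rir, _ = sf.read(rir_path)         # room impulse response
    noise, _ = sf.read(noise_path)     # background noise clip

    # Simulate reverberation by convolving with the RIR; keep the original length.
    reverbed = fftconvolve(speech, rir)[: len(speech)]

    # Tile/trim the noise to match, then scale it to reach the requested SNR.
    if len(noise) < len(reverbed):
        noise = np.tile(noise, int(np.ceil(len(reverbed) / len(noise))))
    noise = noise[: len(reverbed)]
    speech_power = np.mean(reverbed ** 2) + 1e-10
    noise_power = np.mean(noise ** 2) + 1e-10
    noise_scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    noisy = reverbed + noise_scale * noise

    # Peak-normalize so the result can be written back out without clipping.
    return noisy / max(1.0, float(np.abs(noisy).max())), sr
```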

OlaWod commented 1 year ago

yes it can make the model more robust to noise and reverberation.

steven850 commented 1 year ago

I don't see this doing much. If applied to the WavLM training then sure, but since the content model is locked and can't be trained, I don't think this would have much of an effect on the VC model.

OlaWod commented 1 year ago

> I don't see this doing much. If applied to the WavLM training then sure, but since the content model is locked and can't be trained, I don't think this would have much of an effect on the VC model.

The content is obtained by WavLM plus the bottleneck extractor, so I think that if trained with noisy speech, the bottleneck extractor will learn to extract clean content from the noisy WavLM feature: noisy wav -> WavLM -> noisy SSL feature -> bottleneck extractor -> clean content. (And currently: SR-augmented wav -> WavLM -> SSL feature -> bottleneck extractor -> content.)
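
A minimal sketch of the chain described above, where `wavlm` and `bottleneck` are stand-in callables for the frozen WavLM model and the trainable bottleneck extractor (not the repository's actual class names); the training loss against clean targets is what would push the bottleneck to discard the added noise:

```python
import torch

def extract_content(noisy_wav: torch.Tensor, wavlm, bottleneck) -> torch.Tensor:
    # noisy wav -> frozen WavLM -> noisy SSL feature
    with torch.no_grad():
        noisy_ssl = wavlm(noisy_wav)
    # noisy SSL feature -> bottleneck extractor (trainable) -> content
    return bottleneck(noisy_ssl)
```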