Takaaki-Saeki / ssl_speech_restoration

SelfRemaster: SSL Speech Restoration
MIT License

quality of restored speech not good #1

Closed: sciai-ai closed this issue 2 years ago

sciai-ai commented 2 years ago

Hi

I tried the Hugging Face demo on my wav file, but the quality is not good. Is it because the vocoder was trained on a Japanese corpus? Is there a general speech restoration model?

Takaaki-Saeki commented 2 years ago

Hi,

Thank you for your interest. The Hugging Face demo is about "audio effect transfer", not "speech restoration". Our method can also be used as an effector that adds acoustic distortion features extracted from historical audio to arbitrary clean audio. In the Hugging Face demo, distortion features extracted from old Japanese recordings are added to your wav file. So it is expected that the quality of the output wav file is not good.

This repo does not provide a general speech restoration model, because our method currently focuses on learning speech restoration models in a data-dependent manner. That is the key difference from previous general speech restoration methods.

For more detailed information, please refer to our paper.

Connum commented 1 year ago

In my opinion, you are showcasing the less important feature of your approach. More people will be interested in speech restoration than in transferring noise distortions. You even lead your paper with a direct comparison to the VoiceFixer paper, and this advantage is repeated throughout several paragraphs. Why not show what your real strength is (and not only with a few examples on the demo page)?

To me it's not even clear whether this is possible with the current code, or how to get started running it for speech restoration. An easy-to-understand explanation would be very helpful and appreciated!

Thanks for your great work!