Open PiotrDabkowski opened 2 years ago
Thanks for sharing your experience!
I'm also suffering from poor voice conversion results and trying to figure out why. The results sound quite reasonable, but the quality is not on par with the authors'. I'll share some samples here that I regarded as bad.
For me the results in the paper are a bit weird, tbh. I was able to get a high-quality restoration of the speaker's identity based on the perturbed layer-12 features alone, which means there is still significant identity leakage through the "Linguistic" layer.
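A quick way to quantify this kind of leakage is a speaker-classification probe on the features: if even a trivial classifier beats chance by a wide margin, identity is leaking through. Below is a minimal nearest-centroid sketch in NumPy; the function name and the per-utterance mean-pooling assumption are mine, not anything from NANSY or the paper.

```python
import numpy as np

def speaker_probe_accuracy(train_feats, train_spk, test_feats, test_spk):
    """Nearest-centroid speaker probe.

    train_feats/test_feats: (num_utterances, feat_dim) arrays, e.g.
    perturbed layer-12 outputs mean-pooled per utterance (assumption).
    High accuracy on features that should be speaker-independent
    indicates identity leakage.
    """
    speakers = np.unique(train_spk)
    # one centroid per speaker in feature space
    centroids = np.stack(
        [train_feats[train_spk == s].mean(axis=0) for s in speakers]
    )
    # assign each test utterance to the nearest centroid
    dists = np.linalg.norm(
        test_feats[:, None, :] - centroids[None, :, :], axis=-1
    )
    preds = speakers[np.argmin(dists, axis=1)]
    return float((preds == test_spk).mean())
```

Compare the result against chance (1 / number of speakers); anything far above chance on the "Linguistic" features suggests the perturbation isn't removing identity.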
One similar but distinct issue I noticed was speaker identity leakage through the pitch feature. I think it might come from insufficient perturbation, but I'm not sure, since my implementation differs slightly from the paper.
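One common mitigation (not necessarily what the paper does) is to normalize log-F0 per utterance or per speaker, so the pitch feature carries only the relative contour and not the speaker's absolute register. A sketch, assuming the usual convention that unvoiced frames are marked with F0 = 0:

```python
import numpy as np

def normalize_log_f0(f0, eps=1e-8):
    """Zero-mean, unit-variance log-F0 over voiced frames.

    f0: per-frame F0 in Hz, with 0.0 marking unvoiced frames.
    Returns the normalized contour; unvoiced frames stay 0.0.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    out = np.zeros_like(f0)
    if voiced.any():
        log_f0 = np.log(f0[voiced])
        # strip the speaker's mean register and range
        out[voiced] = (log_f0 - log_f0.mean()) / (log_f0.std() + eps)
    return out
```

Per-speaker statistics (rather than per-utterance) are a design choice worth trying if utterances are short; short clips give noisy mean/std estimates.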
For Yingram I also used the FFT approach, but it would be better to write a custom kernel instead; otherwise the windowing is a bit broken...
Hi, first thanks for the great implementation!
Compared to the results you shared in sample.zip, have you been able to improve the synthesis quality after fixing the issue in https://github.com/dhchoi99/NANSY/issues/3?
@JeromeNi Fixing that issue didn't give much improvement in voice conversion quality. Training for longer than the paper specifies helped improve quality, but it is still lower than the paper's.
Hey, really nice work!
I also have my own private NANSY implementation. It seems to work, at least the reconstruction is solid, but the voice conversion results were pretty poor, worse than the original paper's samples (not sure whether those were cherry-picked). The speaker similarity was not that good, and I achieved better results with a different method.
Do you have some Voice Conversion samples?