KimythAnly / AGAIN-VC

This is the official implementation of the paper AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization.
https://kimythanly.github.io/AGAIN-VC-demo/index
MIT License

output quality #3

Closed ak9250 closed 3 years ago

ak9250 commented 3 years ago

I tried the first sample from https://speechresearch.github.io/hifisinger/ as input, and the output sounds like this: https://soundcloud.com/user-426165954/7000000184-to-p226-001. Is this expected?

KimythAnly commented 3 years ago

Yes, this is expected.

  1. The speaker is out-of-domain (the F0 is higher than that of all the training data). I think the result would be better if the F0 were under some threshold related to our training data (e.g., a male singer).
  2. Most of the training data (VCTK) does not vary the pitch that much. I think the encoder might fail to extract the speaker representation properly due to the pitch variation.
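A minimal sketch of the kind of F0 check described in point 1. The F0 estimator (a crude autocorrelation peak-pick) and the "in-domain" range are illustrative assumptions, not values from AGAIN-VC or VCTK:

```python
import numpy as np

def estimate_f0(x, sr, fmin=80.0, fmax=1000.0):
    """Crude F0 estimate via the autocorrelation peak (illustrative only)."""
    x = x - x.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo = int(sr / fmax)  # smallest lag to consider
    hi = int(sr / fmin)  # largest lag to consider
    lag = lo + int(np.argmax(r[lo:hi]))
    return sr / lag

# Hypothetical "in-domain" F0 range for a speech corpus like VCTK.
TRAIN_F0_RANGE = (80.0, 300.0)

sr = 16000
t = np.arange(sr) / sr
singing = np.sin(2 * np.pi * 440.0 * t)  # a 440 Hz tone, well above typical speech F0

f0 = estimate_f0(singing, sr)
in_domain = TRAIN_F0_RANGE[0] <= f0 <= TRAIN_F0_RANGE[1]
print(f"estimated F0: {f0:.1f} Hz, in-domain: {in_domain}")
```

Here the 440 Hz tone falls well outside the assumed speech range, so it would be flagged as out-of-domain before conversion.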

ak9250 commented 3 years ago

ok thanks

ak9250 commented 3 years ago

@KimythAnly what are some ways this could be improved for singing synthesis in particular, going from a singer source identity to a target speaker identity?

KimythAnly commented 3 years ago

Hmm, if you have a singing corpus, then you can just train a model on that data. Also, using F0 as an additional feature is useful. As far as I know, many singing VC systems use this feature.
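One common way to use F0 as an additional feature is to append a normalized log-F0 channel to the frame-level input features. This is a generic sketch of that idea, not AGAIN-VC's actual pipeline; the shapes and the voiced/unvoiced handling are assumptions:

```python
import numpy as np

def append_f0_feature(mel, f0, eps=1e-5):
    """Append a normalized log-F0 channel to frame features of shape (T, n_mels).

    Unvoiced frames (f0 == 0) get a zero in the extra channel; voiced frames
    get log-F0 standardized over the utterance.
    """
    logf0 = np.log(np.maximum(f0, eps))
    voiced = f0 > 0
    if voiced.any():
        mu = logf0[voiced].mean()
        sigma = logf0[voiced].std() + eps
        logf0 = np.where(voiced, (logf0 - mu) / sigma, 0.0)
    else:
        logf0 = np.zeros_like(logf0)
    return np.concatenate([mel, logf0[:, None]], axis=1)

mel = np.random.randn(100, 80)                       # dummy mel-spectrogram frames
f0 = np.where(np.arange(100) % 4 == 0, 0.0, 220.0)   # every 4th frame unvoiced
feat = append_f0_feature(mel, f0)
print(feat.shape)  # (100, 81)
```

Per-utterance standardization of log-F0 keeps the feature speaker-relative, which is one reason singing VC systems often prefer it over raw Hz values.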

ak9250 commented 3 years ago

@KimythAnly OK, thanks. The demo page does show going from singing to a target speaker's singing, but the quality is a bit degraded. I will also look into this approach: https://nobody996.github.io/FastSVC/