CookiePPP / VocoderComparisons

Train/test a variety of open source vocoders using the same input features and dataset. Then infer together for easy side-by-side comparisons.

MIT License

Vocoder Quality #2

Closed by Coice 2 years ago

Coice commented 2 years ago

Hello!

You seem to have done quite a bit of vocoder comparisons. I have two questions based on your own personal experience.

1. Which vocoder do you feel has the best overall quality (ignoring inference speed) when fine-tuned from predicted mels (such as those from Tacotron 2)?
2. Does adding a speaker embedding improve overall synthesis quality when using a multi-speaker model?

Thank you for your time!

CookiePPP commented 2 years ago

@coice

> Which vocoder do you feel has the best overall quality (ignoring inference speed)

If you do not care about inference speed then WaveGrad.

> Does adding a speaker embedding improve overall synthesis quality when using a multi-speaker model?

I didn't find any clear improvements from adding a speaker embedding to the vocoder, but many of my speakers have very little data, so you may get different results. (The UMAP projection showed that speakers were clustered by the microphone/recording environment used, so the vocoder definitely uses the embedding to at least some extent.)
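For anyone who wants to check the same clustering effect numerically rather than by eye, here is a minimal sketch. It is not the UMAP projection mentioned above; it swaps in a simpler comparison of mean cosine distance within and between recording environments. The embeddings, the environment labels, and the function names are all hypothetical placeholders, not taken from this repository.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def environment_separation(embeddings, environments):
    """Return (mean within-environment distance, mean between-environment distance)."""
    within, between = [], []
    for (i, a), (j, b) in combinations(enumerate(embeddings), 2):
        d = cosine_distance(a, b)
        if environments[i] == environments[j]:
            within.append(d)
        else:
            between.append(d)

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(within), mean(between)


# Hypothetical toy data: one learned embedding per speaker, plus the
# recording setup each speaker's audio came from.
speaker_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]
recording_env = ["studio_mic", "studio_mic", "laptop_mic", "laptop_mic"]

within, between = environment_separation(speaker_embeddings, recording_env)
print(f"within={within:.3f} between={between:.3f}")
```

If the within-environment distance comes out much smaller than the between-environment distance, the embedding is likely encoding recording conditions rather than (or in addition to) speaker identity.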

Coice commented 2 years ago

@CookiePPP

Thanks for responding!

I have personally been using melgan/hifigan in most of my experiments, but quality is still much lower than desired (evaluated using teacher-forced mels). I will try WaveGrad and compare.

Have you tried Fre-GAN? They report near-ground-truth quality. My Fre-GAN results with mels from a fine-tuned model sounded metallic, but on real mels the quality was great, better I would say than hifigan. I might revisit that as well and double-check my params.

https://arxiv.org/pdf/2106.02297.pdf

CookiePPP commented 2 years ago

> Have you tried Fre-GAN?

No.

> better I would say than hifigan

My best HiFi-GAN was almost indistinguishable from ground truth so I haven't spent a lot of GPU time looking into alternatives.

Coice commented 2 years ago

@CookiePPP

Do you happen to have any audio samples you can share of your highest quality synthesis from your TTS engine?

Also do you know of any groups, discord, etc, for discussing this subject?

Again, thanks for your time!

CookiePPP commented 2 years ago

> Do you happen to have any audio samples you can share of your highest quality synthesis from your TTS engine?

Sorry, no. After 14 months I don't remember the exact location of the audio samples I referenced.

> Also do you know of any groups, discord, etc, for discussing this subject?

I know plenty of discords, but none where developers/people-that-can-write-code make up the majority of people. You can probably find better discords using Google haha. I don't really go looking for discords unless they have interesting people running them.

Coice commented 2 years ago

Fair enough, thank you for your time!