GANtastic3 / MaskCycleGAN-VC

Implementation of Kaneko et al.'s MaskCycleGAN-VC model for non-parallel voice conversion.
MIT License
112 stars, 31 forks

Input feature #18

Closed: EmreOzkose closed this issue 2 years ago

EmreOzkose commented 3 years ago

Hi!

Thanks for sharing the code.

Why did you use the vocoder() function to extract features? These features are mel-spectrograms, right? Why not just use librosa? I tried it, and the differences between the extracted features are not negligible: the value ranges are very different.
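To make the range difference concrete, here is a minimal sketch of how two common log-compression conventions produce very different value ranges from the same spectrogram. It is an illustration under assumptions, not the repo's or librosa's actual code: `log10_clamped` assumes a "log10 of clamped magnitude" convention of the kind often used in MelGAN-style front ends, `power_db` assumes a "dB relative to peak, floored at -80 dB" convention, and the mel filterbank is left out because only the compression step matters for the range. All function names and parameter values are hypothetical.

```python
import numpy as np


def stft_magnitude(signal, n_fft=1024, hop=256):
    """Magnitude spectrogram from Hann-windowed frames (frames x bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def log10_clamped(mag, floor=1e-5):
    """Assumed convention A: log10 of the magnitude, clamped from below."""
    return np.log10(np.maximum(mag, floor))


def power_db(mag, amin=1e-10, top_db=80.0):
    """Assumed convention B: power in dB relative to the peak, floored at -top_db."""
    db = 10.0 * np.log10(np.maximum(mag ** 2, amin))
    db = db - db.max()  # the peak becomes 0 dB
    return np.maximum(db, -top_db)


# Synthetic one-second test tone (hypothetical values, stands in for real audio).
sr = 22050
t = np.arange(sr) / sr
signal = 0.5 * np.sin(2 * np.pi * 220 * t)

mag = stft_magnitude(signal)
feat_a = log10_clamped(mag)
feat_b = power_db(mag)

print("log10-clamped range:", feat_a.min(), feat_a.max())
print("dB range:           ", feat_b.min(), feat_b.max())
```

Convention A is bounded below by log10 of the floor (here -5), while convention B lives in [-80, 0], so features from one pipeline cannot be fed to a model trained on the other without rescaling or retraining.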

hikaruhotta commented 3 years ago

A vocoder was used to extract features so that we could listen to the converted audio (by passing the converted mel-spectrogram through the vocoder in the reverse direction).
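The practical consequence is that the feature extractor and the vocoder have to agree on every mel parameter, or the round trip back to audio degrades. A minimal sketch of such a consistency check follows; `MelConfig`, `config_mismatches`, and all the values are hypothetical and only illustrate the idea, they are not taken from this repo or from any vocoder's real configuration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MelConfig:
    """Hypothetical description of a mel-spectrogram front end."""
    sample_rate: int
    n_fft: int
    hop_length: int
    n_mels: int
    log_scale: str  # e.g. "log10" or "db" (assumed labels)


def config_mismatches(extractor, vocoder):
    """Return the names of the fields on which the two configs differ."""
    return [
        f.name
        for f in fields(MelConfig)
        if getattr(extractor, f.name) != getattr(vocoder, f.name)
    ]


# Illustrative values only.
extractor_cfg = MelConfig(22050, 1024, 256, 80, "db")
vocoder_cfg = MelConfig(22050, 1024, 256, 80, "log10")

mismatched = config_mismatches(extractor_cfg, vocoder_cfg)
print("mismatched fields:", mismatched)
```

Extracting features with the vocoder's own front end sidesteps this check entirely, since both directions then share one configuration by construction.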

EmreOzkose commented 3 years ago

Just for convenience, so that the converted mel is compatible with the vocoder, right? Thank you. Do you think changing the mel pre-processing would significantly affect the quality of the generated audio?

In my case, the vocoder is not good enough to produce good-sounding audio. I have an mb-melgan that was trained on my data and works well. I will run experiments with the changed mel and report back here.

MorenoLaQuatra commented 2 years ago

I have the same question. @EmreOzkose, did you manage to set up a different mel conversion?

EmreOzkose commented 2 years ago

I ran experiments with the changed mels and trained mb-melgan on them. The voice conversion was not as good as with the default mels.

MorenoLaQuatra commented 2 years ago

Great to know, thanks!