vishal16babu closed this issue 3 years ago
Yes, it is possible. You at least need to retrain one of the models to make them compatible.
Thanks @auspicious3000 , I will give it a try
Hi @auspicious3000 ,
I looked at the spectrogram calculation code, and it does not look like a straightforward mel spectrogram computation. I also tried librosa.feature.inverse.mel_to_audio(spec, sr=16000, n_fft=1024) to reconstruct audio with Griffin-Lim instead of WaveNet, and, as expected, it produced a garbage signal.
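For context, Griffin-Lim only estimates phase from a (linear) magnitude spectrogram, so feeding it a model-specific representation will indeed produce garbage. A minimal pure-NumPy sketch of the algorithm, with illustrative frame parameters (n_fft=1024, hop=256 are assumptions, not the repo's actual settings):

```python
import numpy as np

def stft(y, n_fft=1024, hop=256):
    # Frame the signal, apply a Hann window, take the real FFT per frame
    win = np.hanning(n_fft)
    n = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i*hop:i*hop+n_fft] * win for i in range(n)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=1024, hop=256):
    # Overlap-add inverse with window-squared normalization
    win = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    length = hop * (len(frames) - 1) + n_fft
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, f in enumerate(frames):
        y[i*hop:i*hop+n_fft] += f
        norm[i*hop:i*hop+n_fft] += win ** 2
    return y / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=32, n_fft=1024, hop=256):
    # Start from random phase; each iteration resynthesizes audio and
    # replaces the magnitude with the target while keeping the new phase
    phase = np.exp(2j * np.pi * np.random.rand(*mag.shape))
    for _ in range(n_iter):
        y = istft(mag * phase, n_fft, hop)
        spec = stft(y, n_fft, hop)
        phase = spec / np.maximum(np.abs(spec), 1e-8)
    return istft(mag * phase, n_fft, hop)

# Toy example: recover a 440 Hz tone from its magnitude spectrogram
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
mag = np.abs(stft(tone))
y_rec = griffin_lim(mag, n_iter=8)
```

librosa's mel_to_audio does the same thing after first pseudo-inverting the mel filterbank, which is why it only works when the features really are standard mel spectrograms.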
P.S.: I am not very familiar with common preprocessing techniques for computing spectrograms, so any references that would help me understand the motivation behind the spectrogram calculation code would be much appreciated.
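For reference, a "straightforward" mel spectrogram is just STFT magnitude → mel filterbank → log compression. A pure-NumPy sketch of that standard pipeline (the parameter values here are illustrative defaults, not the repo's actual preprocessing):

```python
import numpy as np

def hz_to_mel(f):
    # HTK mel-scale formula
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centers evenly spaced on the mel scale
    fft_freqs = np.linspace(0, sr / 2, n_fft // 2 + 1)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        up = (fft_freqs - left) / (center - left)
        down = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(up, down))
    return fb

def mel_spectrogram(y, sr=16000, n_fft=1024, hop=256, n_mels=80):
    # 1. Frame the signal and apply a Hann window
    win = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i*hop:i*hop+n_fft] * win for i in range(n_frames)])
    # 2. Magnitude spectrogram via the real FFT
    mag = np.abs(np.fft.rfft(frames, axis=1))
    # 3. Project onto the mel filterbank, then log-compress
    mel = mag @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(np.maximum(mel, 1e-5))

y = np.random.randn(16000)   # 1 s of noise at 16 kHz
S = mel_spectrogram(y)
print(S.shape)               # → (59, 80), i.e. (n_frames, n_mels)
```

Codebases often deviate from this baseline (different mel-scale variants, pre-emphasis, dB scaling, min/max normalization), which is usually what makes their features incompatible with off-the-shelf inversion.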
Is it possible to use the PWG vocoder (https://github.com/kan-bayashi/ParallelWaveGAN) instead of WaveNet on the output of the decoder? Specifically, would I need to change the frame length and frame hop to make the mel spectrograms compatible with PWG?
WaveNet inference is very slow, so it would help if we could use other neural vocoders directly. That way we could simply fine-tune the given pretrained SpeechSplit models instead of training again from scratch.
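The compatibility question above boils down to whether both models' feature-extraction parameters agree. A small sketch of that check; the parameter names and values below are hypothetical placeholders, not read from either project's actual config:

```python
# Hypothetical feature-extraction settings; the real values must be taken
# from each project's preprocessing code or YAML config.
speechsplit_cfg = {"sr": 16000, "n_fft": 1024, "hop_length": 256, "n_mels": 80}
pwg_cfg         = {"sr": 24000, "n_fft": 2048, "hop_length": 300, "n_mels": 80}

def incompatible_keys(a, b):
    """Return the feature parameters on which the two configs disagree."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])

mismatches = incompatible_keys(speechsplit_cfg, pwg_cfg)
if mismatches:
    print("features incompatible; retrain or re-extract:", mismatches)
```

If any of these differ, either the vocoder must be retrained on features extracted with the other model's parameters, or the preprocessing must be redone to match, which is consistent with the "retrain one of the models" answer above.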