DanRuta / xVA-Synth

Machine learning based speech synthesis Electron app, with voices from specific characters from video games
GNU General Public License v3.0

HiFi-GAN training #21

Closed: longjoke closed this issue 2 years ago

longjoke commented 3 years ago

Are the HiFi-GAN models trained on mel-spectrograms computed from the ground-truth audio, or on mel-spectrograms generated by the Tacotron2 or FastPitch models?

DanRuta commented 3 years ago

Currently ground truth.

longjoke commented 3 years ago

Do you think it could be advantageous to use Tacotron2 output instead, since FastPitch is trained on that? It should be fairly easy to do using the extract-mels.py from FastPitch with the --extract-mels-teacher argument.

DanRuta commented 3 years ago

Potentially, or perhaps even the FastPitch ones. I plan to experiment with this soon, once FastPitch gets its next update.
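
For anyone following along, here is a rough sketch of what switching the mel source would look like on the data-loading side. It is not taken from xVA-Synth or from the HiFi-GAN code: the directory names, the `.npy` extension and the `build_training_pairs` helper are all hypothetical, chosen only for illustration. The point it tries to show is that the target waveform stays the ground-truth recording in all three cases, and only the input mel changes.

```python
from pathlib import Path

# Hypothetical directory layout (an assumption, not xVA-Synth's real one):
# one folder of pre-extracted mels per source, plus the original recordings.
MEL_DIRS = {
    "ground_truth": "mels_gt",       # mels computed directly from the audio
    "tacotron2": "mels_tacotron2",   # mels predicted by Tacotron2
    "fastpitch": "mels_fastpitch",   # mels predicted by FastPitch
}


def build_training_pairs(root, mel_source, utterance_ids):
    """Return (input_mel_path, target_wav_path) pairs for vocoder training.

    The target is always the ground-truth waveform; only the folder the
    input mel is read from depends on `mel_source`.
    """
    if mel_source not in MEL_DIRS:
        raise ValueError(f"unknown mel source: {mel_source!r}")
    root = Path(root)
    mel_dir = root / MEL_DIRS[mel_source]
    wav_dir = root / "wavs"
    return [
        (mel_dir / f"{uid}.npy", wav_dir / f"{uid}.wav")
        for uid in utterance_ids
    ]


if __name__ == "__main__":
    for mel_path, wav_path in build_training_pairs("data", "tacotron2", ["a", "b"]):
        print(mel_path, "->", wav_path)
```

One caveat with predicted mels: each one has to line up frame-for-frame with its recording, otherwise the vocoder is trained on mismatched pairs. Presumably that is what the teacher-forced extraction mentioned above is for.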