Hifi GAN with 24kHz - Githubissues

NTT123 / vietTTS

Vietnamese Text to Speech library

MIT License

196 stars, 91 forks

Hifi GAN with 24kHz #20

Closed ngoquanghuy99 closed 2 years ago

ngoquanghuy99 commented 2 years ago

Hi @NTT123, thanks for your work! What is the HiFi-GAN config for 24 kHz audio? (The default is 16 kHz.) Or at least, could you share your tips for calculating the parameters?

NTT123 commented 2 years ago

Hi @ngoquanghuy99!

To work with 24k audio, you need to modify 3 files:

vietTTS/nat/config.py,
assets/hifigan/config.json, and
vietTTS/synthesizer.py

by setting:

sample_rate = 24000
fmax = 12000
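
In the settings above, fmax is half of sample_rate (the Nyquist frequency). A minimal sketch of that relationship, assuming a hypothetical helper named `audio_settings`; it is not part of vietTTS, and the exact key spelling in each of the three files may differ:

```python
def audio_settings(sample_rate: int) -> dict:
    """Return the two values to change for a given sample rate.

    Hypothetical helper for illustration only; not a vietTTS function.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    # fmax is the upper frequency limit of the mel filterbank. Half the
    # sample rate (the Nyquist frequency) is the highest frequency a
    # sampled signal can represent.
    return {"sample_rate": sample_rate, "fmax": sample_rate // 2}


print(audio_settings(24000))  # {'sample_rate': 24000, 'fmax': 12000}
```

The same values still have to be entered by hand in each of the three files listed above.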

ngoquanghuy99 commented 2 years ago

I already knew this, @NTT123. I meant the segment_size and hop_size. Anyway, I've found it! segment_size should be a multiple of hop_size. Thank you!
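
The multiple-of rule can be checked before training starts. A minimal sketch, assuming a hypothetical helper named `frames_per_segment`; the numbers used below are illustrative and are not taken from this thread or from the vietTTS configs:

```python
def frames_per_segment(segment_size: int, hop_size: int) -> int:
    """Return how many mel frames one training segment covers.

    Raises ValueError when segment_size is not a whole multiple of
    hop_size. Hypothetical helper for illustration only.
    """
    if segment_size <= 0 or hop_size <= 0:
        raise ValueError("segment_size and hop_size must be positive")
    if segment_size % hop_size != 0:
        raise ValueError(
            f"segment_size={segment_size} is not a multiple of hop_size={hop_size}"
        )
    return segment_size // hop_size


# Illustrative values only.
print(frames_per_segment(8192, 256))  # 32
print(frames_per_segment(8400, 300))  # 28
```

A pair such as segment_size=8192 with hop_size=300 is rejected, because 8192 is not divisible by 300.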

nampdn commented 2 years ago

@ngoquanghuy99 Can you share your config? I'm trying to train with 48 kHz audio.

ngoquanghuy99 commented 2 years ago

@nampdn You just have to change 2 parameters: sample_rate to what you want (48k) and fmax to 24k.

nampdn commented 2 years ago

Oh really? Thank you both. But will the sample_rate affect the training duration?

ngoquanghuy99 commented 2 years ago

I would say no. The training duration is the same!

nampdn commented 2 years ago

Hi @ngoquanghuy99, I noticed that for sample rates above 44.1 kHz, training takes noticeably longer to reach a state that is usable for inference.