Closed BartekRoszak closed 5 years ago
You can use the pretrained LJSpeech waveglow for any language. It will even work for a male voice.
Here is a synthesized example for a Mongolian male voice using the LJSpeech trained waveglow:
Unfortunately, I have a female voice.
Then it will work even better. LJSpeech is a female voice.
It does not sound good.
You can hear the real voice in the background, but the noise is awful.
I feed it with a mel-spectrogram created directly from the wav file by the get_mel method in TextMelLoader.
I use default hparams.
I send original wav and preprocessed mel-spectrogram after pretrained WaveGlow.
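For reference, here is a rough standalone sketch of what that preprocessing step does conceptually. The real get_mel in TextMelLoader uses torch and applies a mel filterbank; this hypothetical numpy version stops at a log-magnitude spectrogram, with filter_length=1024 and hop_length=256 matching the default hparams. It is an illustration, not the repo's code.

```python
import numpy as np

def log_magnitude_spectrogram(audio_int16, filter_length=1024, hop_length=256):
    # normalize 16-bit PCM to [-1, 1] (the step under discussion below)
    audio = audio_int16.astype(np.float32) / 32767.0
    window = np.hanning(filter_length)
    n_frames = 1 + (len(audio) - filter_length) // hop_length
    frames = np.stack([
        audio[i * hop_length : i * hop_length + filter_length] * window
        for i in range(n_frames)
    ])
    # magnitude spectrum per frame: (n_frames, filter_length // 2 + 1)
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    # dynamic range compression (log with a floor), as in the repo's
    # audio_processing; the mel filterbank is omitted here for brevity
    return np.log(np.clip(magnitudes, a_min=1e-5, a_max=None))

rng = np.random.default_rng(0)
fake_audio = (rng.standard_normal(22050) * 8000).astype(np.int16)  # 1 s at 22.05 kHz
spec = log_magnitude_spectrogram(fake_audio)
print(spec.shape)  # (83, 513)
```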
Your wav file is 32-bit float. You have to change the normalization code.
generated_pretrained_LJS.wav.zip is generated by the pretrained LJS waveglow. It sounds pretty good.
@delgerdalai Thanks! That sounds much better. Won't it be a problem that each audio file is normalized using a different value? Or should I put in some constant value that fits the whole dataset?
I think the goal is to convert the audio file to the [-1, 1] range.
I don't know much about the 32-bit float wav format. Maybe you can find the maximum value over the whole dataset and use it as a constant.
Or just audio_norm = audio / audio.max() might be better.
For a 16-bit int wav file:
audio_norm = audio / (16-bit integer maximum value). The LJS dataset wavs are 16-bit integer, therefore hparams.max_wav_value is 32767.
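Putting the two suggestions above together, a minimal sketch (the dtype check and the "peak > 1.0" threshold are my assumptions for illustration, not code from the repo):

```python
import numpy as np

def normalize_audio(audio):
    # 16-bit PCM (e.g. LJSpeech): divide by the int16 maximum
    if audio.dtype == np.int16:
        return audio.astype(np.float32) / 32767.0
    # 32-bit float wavs are often already in [-1, 1]; if not,
    # peak-normalize per file as suggested above
    peak = np.abs(audio).max()
    return audio / peak if peak > 1.0 else audio

ints = np.array([-32767, 0, 16384], dtype=np.int16)
floats = np.array([-2.0, 0.5, 4.0], dtype=np.float32)
print(normalize_audio(ints))    # [-1.0, 0.0, ~0.5]
print(normalize_audio(floats))  # [-0.5, 0.125, 1.0]
```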
Thank you. It looks like the files are already normalized and the values are in the range [-1, 1]. No need to do normalization in my case :)
I am training a Tacotron model with a custom dataset. With the inference.py script I can check how well the model is doing at the moment, but I need a WaveGlow model to create the waveform. I do not have the computational power to train two models in parallel (Tacotron & WaveGlow), so right now I cannot check how well Tacotron is doing because I cannot create a waveform. Is there any option to create a waveform directly from Tacotron without WaveGlow?
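For what it's worth, one common stopgap (not part of this repo) is Griffin-Lim phase reconstruction, which turns a magnitude spectrogram into audio with no neural vocoder at all. Below is a minimal numpy sketch under the assumption that you have a linear-frequency magnitude spectrogram (Tacotron's mel output would first have to be projected back to linear frequencies); quality is far below WaveGlow, but it can be enough to monitor training.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    # Hann-windowed short-time Fourier transform: (n_frames, n_fft//2 + 1)
    w = np.hanning(n_fft)
    n = 1 + (len(x) - n_fft) // hop
    return np.stack([np.fft.rfft(x[i*hop:i*hop+n_fft] * w) for i in range(n)])

def istft(S, n_fft=1024, hop=256):
    # inverse STFT via windowed overlap-add
    w = np.hanning(n_fft)
    out = np.zeros(n_fft + hop * (len(S) - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(S):
        out[i*hop:i*hop+n_fft] += np.fft.irfft(frame) * w
        norm[i*hop:i*hop+n_fft] += w ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(magnitudes, n_iter=30, n_fft=1024, hop=256):
    # start from random phase and iteratively enforce the target magnitudes
    phase = np.exp(2j * np.pi * np.random.rand(*magnitudes.shape))
    for _ in range(n_iter):
        audio = istft(magnitudes * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(audio, n_fft, hop)))
    return istft(magnitudes * phase, n_fft, hop)

# toy example: reconstruct a 440 Hz tone from its magnitude spectrogram
np.random.seed(0)
tone = np.sin(2 * np.pi * 440 * np.arange(4096) / 22050)
mag = np.abs(stft(tone))
reconstructed = griffin_lim(mag, n_iter=5)
print(reconstructed.shape)  # (4096,)
```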