Open fatihkiralioglu opened 4 years ago
Someone please answer this question. I trained the model after loading the pretrained weights, but after 14k steps the audio is full of noise.
I got the same issue.
The pretrained v5 model, waveglow_256channels_universal_v5.pt, expects mel specs computed with:
"sampling_rate": 22050,
"filter_length": 1024,
"hop_length": 256,
"win_length": 1024,
"mel_fmin": 0.0,
"mel_fmax": 8000.0
My mel specs were computed with:
"sampling_rate": 16000,
"filter_length": 768,
"hop_length": 192,
"win_length": 768,
"mel_fmin": 0.0,
"mel_fmax": 8000.0
Before training, I used the v5 model (22kHz pretrained) to run inference on my mel specs; the speech was still audible (even for male speakers' specs), though of course the pitch was shifted down when I chose a 16kHz output frame rate.
After fine-tuning from the pre-trained model, the loss quickly dropped to ~-5.0 within a few steps and hovered around -5.5 over my 25k steps, but all the audio inferred from the 25k-step checkpoint was full of noise (almost no speech).
Of course, when I trained without the pre-trained model, the loss dropped very slowly and the inference results were also full of noise.
Maybe we could try modifying the code as in #88, then try again.
So after training from the pre-trained model for 25k steps, you are still getting noisy output?
I also faced the same issue; the output I got after inference with waveglow_256channels_universal_v5.pt was at least audible. I also got a loss of around -6.
After #88, training at 16kHz from the pre-trained model is no longer possible, because WaveGlow.upsample depends on the win_length/hop_length.
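For context, that dependence is the transposed convolution WaveGlow uses to upsample the mel spectrogram to the audio rate. A minimal sketch, assuming the 22kHz defaults (kernel 1024, stride 256) match the checkpoint in question:

```python
import torch

# WaveGlow upsamples the 80-channel mel spectrogram to the audio sample rate
# with a transposed 1D convolution. Its stride plays the role of hop_length,
# so a model trained with hop_length=256 cannot be reused as-is on
# hop_length=192 (or 200) data.
n_mel_channels = 80
hop_length = 256   # stride of the upsampler == hop_length of the mel extraction
win_length = 1024  # kernel size follows the analysis window

upsample = torch.nn.ConvTranspose1d(
    n_mel_channels, n_mel_channels, kernel_size=win_length, stride=hop_length
)

mel = torch.randn(1, n_mel_channels, 63)  # 63 mel frames
audio_scale = upsample(mel)
# ConvTranspose1d output length = (frames - 1) * stride + kernel_size
assert audio_scale.size(2) == (63 - 1) * hop_length + win_length
```

Changing hop_length therefore changes both the stride and the weight shapes of this layer, which is why the pre-trained checkpoint no longer fits after the #88 changes.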
Yes, I also faced the same issue, so I trained the model from scratch. After 100k steps the audio quality is not improving much: the generated audio has audible speech, but also some noise. Do you know how many steps are required to get results similar to the official model?
Have you tried #99? Can we train at 16kHz with the pre-trained model using that code?
Hi, I currently have a problem with 16kHz WaveGlow training.
My Tacotron2 model is OK (tested with the pre-trained WaveGlow model). I'm trying to train WaveGlow from scratch.
I used the WaveGlow code on the master branch with the config.json below:
{
"train_config": {
"fp16_run": true,
"output_directory": "checkpoints",
"epochs": 100000,
"learning_rate": 1e-4,
"sigma": 1.0,
"iters_per_checkpoint": 2000,
"batch_size": 12,
"seed": 1234,
"checkpoint_path": "",
"with_tensorboard": false
},
"data_config": {
"training_files": "train_files.txt",
"segment_length": 16000,
"sampling_rate": 16000,
"filter_length": 800,
"hop_length": 200,
"win_length": 800,
"mel_fmin": 0.0,
"mel_fmax": 8000.0
},
"waveglow_config": {
"n_mel_channels": 80,
"n_flows": 12,
"n_group": 8,
"n_early_every": 4,
"n_early_size": 2,
"WN_config": {
"n_layers": 8,
"n_channels": 256,
"kernel_size": 3
}
}
}
I have trained for 236k steps and every output audio is silence. Hope you guys can give me some light :( Output audio: https://drive.google.com/drive/folders/1hqVHOVoZISP3-BxvJG8n3MCfG6LGF0te?usp=sharing
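As a loose sanity check on configs like this (not something the repo enforces), you can verify that the segment, hop, and n_group values line up; pure arithmetic, with n_group=8 assumed from the waveglow_config above:

```python
# Loose sanity check for a WaveGlow data_config: the segment should divide
# evenly by hop_length (so mel frames tile it exactly) and by n_group (so the
# flow can squeeze the audio into groups of samples).
def check_config(segment_length, hop_length, n_group=8):
    assert segment_length % hop_length == 0, "segment_length not a multiple of hop_length"
    assert segment_length % n_group == 0, "segment_length not divisible by n_group"
    return segment_length // hop_length  # mel frames per segment

print(check_config(segment_length=16000, hop_length=200))  # 80 frames
# Note that 16000 % 256 != 0, so the 22kHz hop_length of 256 does not tile a
# 16000-sample segment exactly.
```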
Did anyone manage to solve this issue? I'm also training on a 16kHz dataset. To check the model, I trained it on just 12 samples (1 batch) with different parameters, starting from the pretrained model. The first set:
"segment_length": 16000,
"sampling_rate": 16000,
"filter_length": 800,
"hop_length": 200,
"win_length": 800,
"learning_rate": 1e-5
After 500 epochs the loss starts to increase, and all the inferences (at 500, 1000, ..., 5000 epochs) give only noise in the output. The second set:
"segment_length": 16000,
"sampling_rate": 16000,
"filter_length": 1024,
"hop_length": 256,
"win_length": 1024,
"learning_rate": 1e-5
This gives audible speech after 500 epochs, but there's a lot of noise and the speech is too fast.
The questions are: why does the loss increase? Why does the quality stay the same on the training set and not improve, even though the samples have been seen many times? And how can the noise be removed and the audio speed normalized?
Was anyone able to figure this out? I also tried training at 16kHz from scratch and had the same experience as @mychiux413.
You can find a model trained from scratch on 21 hours of multispeaker 16kHz data (544000 training steps) here: http://adrianastan.com/models/ . Not as good as the NVIDIA release, but it does the job.
The config is as follows:
{
"train_config": {
"fp16_run": true,
"output_directory": "checkpoints_swara",
"epochs": 100000,
"learning_rate": 1e-4,
"sigma": 1.0,
"iters_per_checkpoint": 2000,
"batch_size": 8,
"seed": 1234,
"checkpoint_path": "",
"with_tensorboard": false
},
"data_config": {
"training_files": "train_SWARA.txt",
"segment_length": 16000,
"sampling_rate": 16000,
"filter_length": 1024,
"hop_length": 256,
"win_length": 1024,
"mel_fmin": 0.0,
"mel_fmax": 8000.0
},
"dist_config": {
"dist_backend": "nccl",
"dist_url": "tcp://localhost:54321"
},
"waveglow_config": {
"n_mel_channels": 80,
"n_flows": 12,
"n_group": 8,
"n_early_every": 4,
"n_early_size": 2,
"WN_config": {
"n_layers": 8,
"n_channels": 256,
"kernel_size": 3
}
}
}
Perhaps you can warmstart your model from it.
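A shape-aware warmstart can be sketched as below. This assumes the file stores a plain state_dict; this repo's checkpoints store the whole model under a "model" key, in which case extract checkpoint["model"].state_dict() first:

```python
import torch

def warmstart(model, state_dict_path):
    # Copy over only the pretrained tensors whose names and shapes match the
    # current model; mismatched layers (e.g. upsample, when hop/win lengths
    # differ) keep their fresh initialization.
    pretrained = torch.load(state_dict_path, map_location="cpu")
    own = model.state_dict()
    kept = {k: v for k, v in pretrained.items()
            if k in own and v.shape == own[k].shape}
    own.update(kept)
    model.load_state_dict(own)
    print(f"warmstarted {len(kept)}/{len(own)} tensors")
    return model
```

Layers skipped this way start from random initialization, so expect some retraining before the model is usable.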
Trained one for 377.5k steps, unsure of how good/bad it is because for my use case it was okay-ish - https://drive.google.com/file/d/1dP4eMDPrZyqRo_gMz1VUDr2Bd_eRXoIa/view?usp=sharing
Can you also share your config, please?
I get the following exception when loading the model:
No module named 'waveglow'
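That error typically happens because the checkpoint pickles the whole model object, so Python needs the repo's modules importable at load time. A sketch of a workaround, where repo_dir is a placeholder for your local clone of the WaveGlow repo:

```python
import sys
import torch

def load_waveglow(checkpoint_path, repo_dir):
    # The checkpoint was saved with torch.save(model, ...), so unpickling it
    # needs the repo's modules (glow.py etc.) on sys.path.
    if repo_dir not in sys.path:
        sys.path.append(repo_dir)
    ckpt = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
    # Some checkpoints wrap the model in a dict under a "model" key.
    return ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
```

On older PyTorch versions, drop weights_only=False (loading full pickles was the default behavior there).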
Hi, I'm trying to train 16kHz models for both WaveGlow and Tacotron2. For the 16kHz Tacotron I used win_length=800 and hop_length=200, and it produced good results with the 22kHz pretrained WaveGlow model. To get better results I want to train a 16kHz WaveGlow model, and I guess the same values of 800 and 200 should be used for WaveGlow training. When I use these new parameters instead of 1024 and 256, can I still warmstart from the pretrained 22kHz WaveGlow model? I have some reservations because the pretrained 22kHz model was trained with win_length=1024 and hop_length=256. Thanks.
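On the warmstart question: the upsample layer of the 22kHz checkpoint (kernel 1024, stride 256) has different weight shapes than an 800/200 model, so a plain checkpoint_path resume will fail on that layer unless mismatched tensors are filtered out. The frame-rate arithmetic below (pure arithmetic, no repo code) shows why the 22kHz model nevertheless sounds roughly intelligible on 16kHz mels:

```python
# Mel frames per second of audio for the two setups.
def frame_rate(sampling_rate, hop_length):
    return sampling_rate / hop_length

pretrained = frame_rate(22050, 256)  # 22kHz universal checkpoint
proposed = frame_rate(16000, 200)    # 16kHz Tacotron2 setup above

print(f"pretrained: {pretrained:.2f} frames/s, proposed: {proposed:.2f} frames/s")
# 86.13 vs 80.00 frames/s: close enough that the 22kHz vocoder produces
# intelligible (but pitch/tempo-shifted) speech from 16kHz mels.
```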