Closed EuphoriaCelestial closed 4 years ago
Hello. Unfortunately, I could not get a decent, noise-free voice with 8kHz data. I guess that win_length should be 400 and hop_length 100, while filter_length should stay at 1024. But I trained with win_length=1024, and therefore the results were too noisy. @EuphoriaCelestial, could you share your trained waveglow model if possible, so that we can check it on other languages and speaker databases, or use it as a warm-start for speaker-specific vocoders?
My 8kHz waveglow has not been trained enough, so I decided to use 16kHz instead. But if you want it, here it is; I found it in the trash, luckily not permanently deleted: https://drive.google.com/file/d/1nf-d0AVfkmULzx4R8Mz3n7TWNPSK7z7m/view?usp=sharing
P.S.: my drive is almost full, so I will delete this file after one week
@EuphoriaCelestial Thank you. Are the waveglow training parameters win_length=400 and hop_length=100, or the defaults 1024 and 256? I can warm-start with my data accordingly.
I used your config, remember?
@EuphoriaCelestial ok, then your model parameters are:
filter_length=1024,
hop_length=256,
win_length=1024,
Currently we are trying to train an 8kHz waveglow model with hop_length=100 and win_length=400; I hope this may solve the noise problem. It seems that I cannot warm-start from your checkpoint due to the config difference. Thanks.
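One way to see why 400/100 makes sense at 8 kHz is to compare the STFT frame timings in milliseconds. A minimal sketch, assuming the stock 22,050 Hz defaults (hop_length=256, win_length=1024) from the NVIDIA repos:

```python
# Compare STFT frame timings (in ms) to see why the default 22,050 Hz
# hop/window lengths are too long for 8 kHz audio.
def frame_ms(n_samples: int, sr: int) -> float:
    """Duration of n_samples at sampling rate sr, in milliseconds."""
    return 1000.0 * n_samples / sr

# Default config at 22,050 Hz:
assert round(frame_ms(256, 22050), 1) == 11.6    # hop
assert round(frame_ms(1024, 22050), 1) == 46.4   # window
# The 8 kHz values proposed in this thread keep similar timings:
assert frame_ms(100, 8000) == 12.5               # hop
assert frame_ms(400, 8000) == 50.0               # window
# Reusing win_length=1024 at 8 kHz gives a 128 ms window, far too long:
assert frame_ms(1024, 8000) == 128.0
```

In other words, the thread's 8 kHz values roughly preserve the frame duration of the defaults, which is plausibly why reusing win_length=1024 at 8 kHz produced noise.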
@fatihkiralioglu oh, please let me know if it works. Even with 16kHz, there is still noise
@fatihkiralioglu please let us know if this configuration works; it seems like this is the R&D part. It would be great if you can achieve good results using a sampling rate of 8000.
Thank you.
@EuphoriaCelestial can you please share the waveglow checkpoint for sr=16000?
https://drive.google.com/file/d/1nf-d0AVfkmULzx4R8Mz3n7TWNPSK7z7m/view?usp=sharing
@EuphoriaCelestial isn't this one for the 8k sample rate? You shared the same link in your previous comment.
@rafaelvalle @EuphoriaCelestial I trained my tacotron2 model on a custom dataset at 16kHz. After 8000 iterations I tested it with the universal waveglow model, but the results were not good: the speech is different from the text. Then I used @EuphoriaCelestial's waveglow model with my trained tacotron2 model, but it still gave very bad results. Can you please help me understand the problem? I used all the default tacotron2 params and only changed the sr to 16000. I am sharing result samples: 16khz.zip. The dataset I used (sampled at 16kHz): http://festvox.org/cmu_arctic/dbs_ksp.html
P.S. I am a newbie, please give an explanatory answer.
Oh sorry, my bad, I thought it was 16kHz. For now, I can't upload my 16kHz checkpoint because my drive ran out of space.
Since I gave you the wrong checkpoint, the result is very bad. But I don't think you can use my 16kHz checkpoint either, because of the different parameters used during training. I suggest you train your own waveglow model; it's very easy, just make sure you use the same audio config (in config.json) as the tacotron model (in hparams.py). They said it will give better audio results if you train the tacotron and waveglow models on the same voice.
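The "same audio config" advice can be made concrete with a small consistency check. A sketch, assuming the audio field names used in NVIDIA's tacotron2 hparams.py and waveglow config.json (verify them against your own copies of the repos):

```python
# Audio fields that should match between tacotron2's hparams.py and the
# data_config section of waveglow's config.json (names assumed from the
# NVIDIA repos; adjust if your fork differs).
AUDIO_KEYS = ("sampling_rate", "filter_length", "hop_length",
              "win_length", "mel_fmin", "mel_fmax")

def mismatched_audio_keys(taco_hparams: dict, waveglow_data_config: dict):
    """Return the audio keys whose values differ between the two configs."""
    return [k for k in AUDIO_KEYS
            if taco_hparams.get(k) != waveglow_data_config.get(k)]

taco = {"sampling_rate": 16000, "filter_length": 1024,
        "hop_length": 256, "win_length": 1024,
        "mel_fmin": 0.0, "mel_fmax": 8000.0}
wg = dict(taco, hop_length=100)          # deliberately inconsistent
assert mismatched_audio_keys(taco, wg) == ["hop_length"]
```

Any key this returns is a likely source of the garbled or noisy output described above, since the vocoder would be interpreting mel spectrograms framed differently from how they were produced.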
Thank you @EuphoriaCelestial for your reply. Can you please suggest which parameters should be the same? Meanwhile, can you please look into this and see if you can help: https://github.com/NVIDIA/waveglow/issues/218#issue-667681860
@EuphoriaCelestial, the synthesis is quite good, but I guess you did not train a speaker-specific waveglow model here. If you use a custom-trained waveglow model, the synthesis quality will be higher.
I used the same data to train both waveglow and tacotron, so I think it is speaker-specific, right?
You guys seem to know this repo well; I need help with python train.py --output_directory=outdir --log_directory=logdir
I want to train on my own dataset (I recorded 19 hours of data). What do I need to do to use my own dataset? Do I need to create a folder or edit any file?
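In NVIDIA's tacotron2 repo, datasets are described by plain-text filelists of "wav_path|transcript" lines (see the filelists/ folder), and hparams.training_files / hparams.validation_files point at them. A minimal sketch for generating such a filelist; the paths and filenames below are illustrative, not from the repo:

```python
# Write a tacotron2-style filelist: one "wav_path|transcript" line per
# utterance. Point hparams.training_files / validation_files at the
# resulting files. Paths here are hypothetical examples.
def write_filelist(pairs, path):
    """pairs: iterable of (wav_path, transcript) tuples."""
    with open(path, "w", encoding="utf-8") as f:
        for wav, text in pairs:
            f.write(f"{wav}|{text}\n")

write_filelist([("wavs/utt0001.wav", "Hello world."),
                ("wavs/utt0002.wav", "Second sentence.")],
               "train_filelist.txt")
```

With 19 hours of data you would typically split the pairs into a training and a small validation list, then set both paths (and the audio params matching your recordings) in hparams.py before running train.py.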
Can you help me with https://github.com/NVIDIA/tacotron2/issues/463?
@EuphoriaCelestial I want the final model to synthesize audio at an 8k sampling rate. If I change sr from 22050 to 8000, what other parameters need to change for both tacotron2 and waveglow? Could you please share hparams.py for tacotron2 and the config for waveglow?
You can find all the configurations I used above, just scroll up a bit.
And I have only 8 hours of data. Can I retrain tacotron2 and waveglow with the new sr, or do I need to train from scratch?
I don't understand this part. You will need to train both the tacotron and waveglow models from scratch for a new sample rate.
Hello, I want to train the model with a sampling rate of 16k, but I don't know how to change the other parameters. Can you help me? Thank you!
Could you successfully train the model on fp16?
https://github.com/NVIDIA/waveglow/issues/54 In this issue, they were talking about lowering some parameters to maximize inference speed. But I don't know how to do it properly: what can be reduced and what needs to remain? Has anyone done this before? Please send me your hparams configuration.
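For reference, the size-related knobs live in the waveglow_config / WN_config sections of waveglow's config.json. A sketch listing them with what I believe are the repo defaults; treat the values and the speed/quality comments as assumptions to verify against your own config.json, and note that changing them requires retraining:

```python
# WaveGlow config fields that trade audio quality for inference speed
# (values assumed to be the NVIDIA repo defaults; verify locally).
SPEED_KNOBS = {
    # waveglow_config
    "n_flows": 12,        # fewer flow steps -> faster, lower quality
    "n_group": 8,         # audio samples squeezed together per flow step
    "n_early_every": 4,   # output some channels early every n-th flow
    "n_early_size": 2,
    # WN_config (the coupling network inside each flow)
    "n_layers": 8,        # dilated-conv layers per flow
    "n_channels": 256,    # residual channels; halving is a common cut
}

assert SPEED_KNOBS["n_flows"] == 12
```

The discussion in the linked issue suggests the WN channel width and the flow count dominate inference cost, so those are the usual first candidates to shrink.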
If I trained my model using fp32, can it run inference in fp16, and vice versa? In that case, will it improve inference speed? I am using an RTX 2080 Ti; my model runs 7 times faster than real-time, and I am pretty sure it can be improved.
And one more thing: is there any benefit to running inference on multiple GPUs?