NVIDIA / tacotron2

Tacotron 2 - PyTorch implementation with faster-than-realtime inference
BSD 3-Clause "New" or "Revised" License

Optimize model for inference speed #348

Closed. EuphoriaCelestial closed this issue 4 years ago.

EuphoriaCelestial commented 4 years ago

In https://github.com/NVIDIA/waveglow/issues/54, people discussed lowering some parameters to maximize inference speed. But I don't know how to do it properly: what can be reduced, and what needs to stay as-is? Has anyone done this before? Please share your hparams configuration.

If I trained my model in fp32, can it run inference in fp16, and vice versa? If so, will that improve inference speed? I am using an RTX 2080 Ti; my model runs 7 times faster than real time, and I am fairly sure that can be improved.

One more thing: is there any benefit to running inference on multiple GPUs?
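Whether fp16 or smaller hparams actually help is easiest to judge by measuring the real-time factor the same way before and after each change. A minimal sketch, assuming you know how many audio samples were generated and at what sampling rate; the helper names `timed` and `realtime_speedup` are hypothetical and not part of the tacotron2 repo:

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn() once and return (result, wall-clock seconds).

    On a GPU, CUDA kernels run asynchronously, so fn should end with
    torch.cuda.synchronize() (or otherwise block until the output is ready)
    for the measured time to be meaningful.
    """
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def realtime_speedup(num_samples: int, sampling_rate: int, elapsed_seconds: float) -> float:
    """How many times faster than real time: audio duration / synthesis time."""
    if sampling_rate <= 0 or elapsed_seconds <= 0:
        raise ValueError("sampling_rate and elapsed_seconds must be positive")
    return (num_samples / sampling_rate) / elapsed_seconds


# Hypothetical numbers: 10 s of 22050 Hz audio generated in 1.25 s.
print(realtime_speedup(220500, 22050, 1.25))  # 8.0
```

In practice you would wrap the full Tacotron 2 + WaveGlow inference call in `timed`, do one warm-up run first (the first call typically includes one-off setup costs), and then compare the fp32 and fp16 speedups.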

EuphoriaCelestial commented 4 years ago

Hello, unfortunately I could not get a decent, noise-free voice with 8 kHz data. I guess win_length should be 400 and hop_length 100, while filter_length stays at 1024. But I trained with win_length 1024, so the results were too noisy. @EuphoriaCelestial, could you share your trained WaveGlow model if possible, so that we can test it on other languages and speaker databases, or use it as a warm start for training speaker-specific vocoders?

My 8 kHz WaveGlow model has not been trained enough, so I decided to use 16 kHz instead. But if you want it, here it is; I found it in the trash, luckily not permanently deleted: https://drive.google.com/file/d/1nf-d0AVfkmULzx4R8Mz3n7TWNPSK7z7m/view?usp=sharing

P.S.: my drive is almost full, so I will delete this file after one week.

fatihkiralioglu commented 4 years ago

@EuphoriaCelestial Thank you. Were the WaveGlow training parameters win_length=400 and hop_length=100, or the defaults 1024 and 256? I can then warm-start with my data accordingly.

EuphoriaCelestial commented 4 years ago


I used your config, remember?

fatihkiralioglu commented 4 years ago

@EuphoriaCelestial OK, then your model parameters are:
filter_length=1024, hop_length=256, win_length=1024,

We are currently trying to train an 8 kHz WaveGlow model with hop_length=100 and win_length=400; I hope this solves the noise problem. It seems I cannot warm-start from your checkpoint because of the config difference. Thanks.
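The difference between the two configs discussed above is easiest to see in milliseconds: window duration is win_length / sampling_rate and hop duration is hop_length / sampling_rate. A small sketch (the helper name `stft_durations_ms` is made up for illustration):

```python
from typing import Tuple


def stft_durations_ms(sampling_rate: int, win_length: int, hop_length: int) -> Tuple[float, float]:
    """Return (window duration, hop duration) in milliseconds for an STFT config."""
    return 1000.0 * win_length / sampling_rate, 1000.0 * hop_length / sampling_rate


configs = {
    "22050 Hz defaults (1024/256)": (22050, 1024, 256),
    "8 kHz, defaults kept (1024/256)": (8000, 1024, 256),
    "8 kHz, scaled (400/100)": (8000, 400, 100),
}
for name, (sr, win, hop) in configs.items():
    win_ms, hop_ms = stft_durations_ms(sr, win, hop)
    print(f"{name}: window {win_ms:.1f} ms, hop {hop_ms:.1f} ms")
```

Keeping the 22050 Hz defaults at 8 kHz stretches each analysis window to 128 ms and each hop to 32 ms, nearly three times coarser than the original 46.4 ms / 11.6 ms, whereas 400/100 gives 50 ms / 12.5 ms. That mismatch is one plausible contributor to the noise reported here, though the thread does not confirm it.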

EuphoriaCelestial commented 4 years ago

@fatihkiralioglu Oh, please let me know if it works. Even at 16 kHz there is still noise.

aishweta commented 4 years ago

@fatihkiralioglu Please let us know if this configuration works; it seems this is the R&D part. It would be great if you could get good results with a sampling rate of 8000 Hz.

Thank you.

Ahmad-noborders commented 4 years ago

@EuphoriaCelestial Can you please share your WaveGlow checkpoint for sr=16000?

EuphoriaCelestial commented 4 years ago


https://drive.google.com/file/d/1nf-d0AVfkmULzx4R8Mz3n7TWNPSK7z7m/view?usp=sharing

Ahmad-noborders commented 4 years ago

@EuphoriaCelestial Isn't this the one for the 8 kHz sample rate? You shared the same link in your previous comment.

Ahmad-noborders commented 4 years ago

@rafaelvalle @EuphoriaCelestial I trained my Tacotron 2 model on a custom 16 kHz dataset. After 8000 iterations I tested it with the universal WaveGlow model, but the results were not good: the speech did not match the text. Then I used @EuphoriaCelestial's WaveGlow model with my trained Tacotron 2 model, but it gave even worse results. Can you please help me understand the problem? I kept all the default params in Tacotron 2 and only changed the sampling rate to 16000. I am sharing result samples. Dataset I used: http://festvox.org/cmu_arctic/dbs_ksp.html. Samples: 16khz.zip. PS: I am a newbie, so please give an explanatory answer.

EuphoriaCelestial commented 4 years ago


Oh sorry, my bad, I thought it was the 16 kHz one. For now I can't upload my 16 kHz checkpoint because my drive has run out of space.


Since I gave you the wrong checkpoint, the result is very bad. But I don't think you can use my 16 kHz checkpoint either, because different parameters were used during training. I suggest you train your own WaveGlow model; it's very easy. Just make sure you use the same audio config (in config.json) as your Tacotron 2 model (in hparams.py). They say audio quality is better if the Tacotron 2 and WaveGlow models are trained on the same voice.
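The "same audio config" advice can be checked mechanically before training or warm-starting. A minimal sketch, assuming the key names used in the default tacotron2 hparams.py and in the `data_config` / `waveglow_config` sections of WaveGlow's config.json; the Tacotron 2 hparams are passed as a plain dict, and the helper `audio_mismatches` is hypothetical:

```python
import json
from typing import Any, Dict, List

# Audio/STFT settings assumed to need identical values in tacotron2's
# hparams.py and in the "data_config" section of WaveGlow's config.json.
SHARED_AUDIO_KEYS = ("sampling_rate", "filter_length", "hop_length",
                     "win_length", "mel_fmin", "mel_fmax")


def audio_mismatches(tacotron_hparams: Dict[str, Any],
                     waveglow_config: Dict[str, Any]) -> List[str]:
    """Return one human-readable line for every audio setting that differs."""
    data_cfg = waveglow_config.get("data_config", {})
    # (name, tacotron value, waveglow value) for each setting to compare.
    pairs = [(k, tacotron_hparams.get(k), data_cfg.get(k)) for k in SHARED_AUDIO_KEYS]
    # n_mel_channels is assumed to live in the "waveglow_config" section.
    pairs.append(("n_mel_channels", tacotron_hparams.get("n_mel_channels"),
                  waveglow_config.get("waveglow_config", {}).get("n_mel_channels")))
    problems = []
    for key, t_val, w_val in pairs:
        if t_val is None or w_val is None:
            problems.append(f"{key}: missing (tacotron={t_val}, waveglow={w_val})")
        elif float(t_val) != float(w_val):
            problems.append(f"{key}: tacotron={t_val} != waveglow={w_val}")
    return problems


# Hypothetical example: Tacotron 2 retrained at 16 kHz, WaveGlow config left at 22050 Hz.
tacotron_example = {"sampling_rate": 16000, "filter_length": 1024, "hop_length": 256,
                    "win_length": 1024, "mel_fmin": 0.0, "mel_fmax": 8000.0,
                    "n_mel_channels": 80}
waveglow_example = json.loads("""{
  "data_config": {"sampling_rate": 22050, "filter_length": 1024, "hop_length": 256,
                  "win_length": 1024, "mel_fmin": 0.0, "mel_fmax": 8000.0},
  "waveglow_config": {"n_mel_channels": 80}
}""")
for line in audio_mismatches(tacotron_example, waveglow_example):
    print(line)
```

With a real setup you would load config.json with `json.load` and build the dict from your hparams; an empty result means the audio front ends agree.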

Ahmad-noborders commented 4 years ago

Thank you @EuphoriaCelestial for your reply. Can you please suggest which parameters should be the same? Meanwhile, could you look into this and see if you can help: https://github.com/NVIDIA/waveglow/issues/218#issue-667681860

Syed044 commented 3 years ago

@EuphoriaCelestial, synthesis is quite good, but I guess that you did not train speaker-specific waveglow model here. If you use a custom trained waveglow model, synthesis quality will be higher.

I use the same data to train both WaveGlow and Tacotron, so I think it is speaker-specific, right?

You guys seem to know this repo well. I need help with python train.py --output_directory=outdir --log_directory=logdir

I want to train on my own dataset; I recorded 19 hours of data. What do I need to do to use my own dataset? Do I need to create a folder or edit any file?

  1. After training, I assume it creates checkpoint files in the outdir folder. How do I use them to see the results? Please help.

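On the "how do I use my own dataset" question: tacotron2 reads training data from text filelists (the `training_files` and `validation_files` entries in hparams.py), one `path/to/audio.wav|transcript` line per utterance, like the LJSpeech lists in the repo's `filelists/` folder. A minimal sketch that builds such lists, assuming a hypothetical layout where each `foo.wav` has a matching `foo.txt` transcript in the same folder; `build_filelists` and the output file names are made up for illustration:

```python
import random
from pathlib import Path
from typing import List, Tuple


def build_filelists(data_dir: str, out_dir: str, val_fraction: float = 0.05,
                    seed: int = 1234) -> Tuple[Path, Path]:
    """Write train/val filelists with one 'wav_path|transcript' line each.

    Assumes every foo.wav in data_dir has a foo.txt next to it holding its
    transcript; audio files without a transcript are skipped.
    """
    entries: List[str] = []
    for wav in sorted(Path(data_dir).glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if not txt.exists():
            continue  # no transcript for this audio file
        # Collapse newlines and repeated spaces into single spaces.
        text = " ".join(txt.read_text(encoding="utf-8").split())
        if "|" in text:
            raise ValueError(f"'|' is the field separator but appears in {txt}")
        entries.append(f"{wav.resolve()}|{text}")
    if len(entries) < 2:
        raise ValueError("need at least two utterances to split into train/val")

    # Shuffle reproducibly, then hold out a small validation set.
    random.Random(seed).shuffle(entries)
    n_val = max(1, int(len(entries) * val_fraction))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    train_path = out / "train_filelist.txt"
    val_path = out / "val_filelist.txt"
    train_path.write_text("\n".join(entries[n_val:]) + "\n", encoding="utf-8")
    val_path.write_text("\n".join(entries[:n_val]) + "\n", encoding="utf-8")
    return train_path, val_path
```

You would then point `training_files` and `validation_files` at the generated files and make sure the audio matches `sampling_rate` in hparams.py. Checkpoints written to `outdir` during training can be loaded for synthesis, for example by adapting the repo's `inference.ipynb` notebook.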
ErfolgreichCharismatisch commented 3 years ago

Can you help me with https://github.com/NVIDIA/tacotron2/issues/463?

evelynyhc commented 3 years ago

@EuphoriaCelestial I want the final model to synthesize audio at an 8 kHz sampling rate. If I change sr from 22050 to 8000, what other parameters need to change for both Tacotron 2 and WaveGlow? Could you please share your hparams.py for Tacotron 2 and your config for WaveGlow?

You can find all the configurations I used above; just scroll up a bit.

Also, I have only 8 hours of data. Can I retrain Tacotron 2 and WaveGlow with the new sr, or do I need to train from scratch?

I don't understand this part. You will need to train both the Tacotron 2 and WaveGlow models from scratch for a new sample rate.

Hello, I want to train the model with a sampling rate of 16 kHz, but I don't know how to change the other parameters. Can you help me? Thank you!
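For the "which other parameters change" questions (8 kHz, 16 kHz): one simple heuristic, not something the repo prescribes, is to keep the window and hop at roughly the same duration in milliseconds as the 22050 Hz defaults, round filter_length up to a power of two that covers the window, and cap mel_fmax at the Nyquist frequency (half the sampling rate). A sketch; the function name `scale_audio_params` is hypothetical, and the base values are the defaults from the repo's hparams.py:

```python
import math
from typing import Dict, Mapping

# Tacotron 2 audio defaults at 22050 Hz (from the repo's hparams.py).
DEFAULTS_22K: Mapping[str, float] = {
    "sampling_rate": 22050, "filter_length": 1024, "hop_length": 256,
    "win_length": 1024, "mel_fmin": 0.0, "mel_fmax": 8000.0,
}


def scale_audio_params(target_sr: int, base: Mapping[str, float] = DEFAULTS_22K) -> Dict[str, float]:
    """Scale STFT/mel settings so window and hop keep the same duration in ms."""
    ratio = target_sr / base["sampling_rate"]
    hop = max(1, round(base["hop_length"] * ratio))
    win = max(hop, round(base["win_length"] * ratio))
    # FFT size: smallest power of two that still covers the whole window.
    n_fft = 2 ** math.ceil(math.log2(win))
    return {
        "sampling_rate": target_sr,
        "filter_length": n_fft,
        "hop_length": hop,
        "win_length": win,
        "mel_fmin": base["mel_fmin"],
        # Mel filters above Nyquist (target_sr / 2) would see no signal.
        "mel_fmax": min(base["mel_fmax"], target_sr / 2),
    }


for sr in (16000, 8000):
    print(sr, scale_audio_params(sr))
```

For 8 kHz this yields hop 93, window 372, filter_length 512 and mel_fmax 4000, in the same range as the 400/100 tried earlier in the thread. If you change hop_length, also check WaveGlow's upsampling layer in glow.py: in the upstream code it appears to be a ConvTranspose1d with a fixed stride of 256, matching the default hop, so it may need to change too.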

farzanehnakhaee70 commented 2 years ago

Could you successfully train the model on fp16?