ScottyBauer opened this issue 3 years ago
It may be constrained by disk reads. Move your dataset to faster storage, for example by copying it into the RAM-backed tmpfs at /dev/shm.
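A minimal sketch of staging the data that way before training; the source and destination paths are placeholders, not paths from this repo:

```python
import shutil

# Copy the preprocessed dataset into tmpfs so reads come from RAM
# instead of the disk. Both paths below are placeholders.
src = "/path/to/dataset"
dst = "/dev/shm/dataset"
shutil.copytree(src, dst)
# Then point data_path in hparams.py at /dev/shm/dataset.
```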
The Tesla V100 is an old GPU. For the default model size, you're going to top out around 2-3 steps/sec at r=7 and about 1 step/sec at r=2 (r is Tacotron's reduction factor: the number of mel frames predicted per decoder step). Training will be faster if you discard your longer utterances.
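One way to do that, sketched with the standard library; the directory name and the 8-second cutoff are assumptions, not values from this repo:

```python
import pathlib
import wave

MAX_SECS = 8.0  # hypothetical cutoff; tune to your dataset

def duration_secs(path):
    # Length of a PCM wav file in seconds.
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()

wav_dir = pathlib.Path("wavs")  # placeholder dataset directory
all_wavs = sorted(wav_dir.glob("*.wav"))
keep = [p for p in all_wavs if duration_secs(p) <= MAX_SECS]
print(f"keeping {len(keep)} of {len(all_wavs)} clips")
```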
You may wish to check out CorentinJ/Real-Time-Voice-Cloning; it uses the same Tacotron and WaveRNN models as this repo. Once you get the hang of Tacotron (synthesizer) training, see https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues/437, which describes what you are trying to do.
I'm fooling around with this project and getting training throughput that seems too slow, which leads me to believe I've misconfigured something or there's some other issue.
I'm reusing the pre-trained models with my own custom audio: ~750 clips ranging from 4 to 10 seconds each.
I'm using PyTorch 1.7.1 with Python 3.7 (CUDA 11.0 and Intel MKL).
In order to get the code to run properly, I had to apply the fix from issue #201 (not sure if this is relevant; I just want to give all the details), and I also applied this pull request: https://github.com/fatchord/WaveRNN/pull/213/commits/521179e25dd309772a7896cc67757650b0c061b7
The only changes I've made to the hyperparams are flipping peak_norm from False to True and setting my paths.
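Roughly like this in hparams.py (the variable names are the ones I believe this repo's hparams.py uses; the paths are placeholders, not my real ones):

```python
# hparams.py excerpt (sketch of just the lines I touched)
peak_norm = True                     # was False: normalise each wav to its peak
wav_path = "/path/to/my/wavs"        # placeholder for my dataset location
data_path = "/path/to/output/data/"  # placeholder for the preprocessed output
```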
I can confirm that it is using the GPU (at least GPU memory), but I've never seen nvidia-smi show utilization above 38%.
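In case it helps, here's the kind of quick check I can run to see whether data loading is the bottleneck; train_loader stands in for whatever DataLoader the training script builds, it's not a name from this repo:

```python
import time

# Rough check (sketch): if fetching batches dominates the loop, the GPU
# is starved by I/O rather than compute. `train_loader` is assumed to be
# the DataLoader from the training script.
it = iter(train_loader)
for i in range(20):
    t0 = time.perf_counter()
    batch = next(it)
    print(f"batch {i}: fetched in {time.perf_counter() - t0:.3f}s")
```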
Things I've tried: upping the batch size in hyperparams (up to 64) and also the learning rate; neither helped.
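For reference, both of those live in the tts_schedule entries of hparams.py, which, as far as I can tell, are (r, learning rate, final step, batch size) tuples; the numbers below are illustrative, not my exact settings:

```python
# hparams.py excerpt (sketch): each tuple is (r, lr, step, batch_size).
# Values are illustrative only.
tts_schedule = [
    (7, 1e-3, 10_000, 64),  # batch size raised from the default to 64
    (2, 1e-4, 80_000, 64),
]
```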
Here is the nvidia-smi output, along with what the training process is up to: [screenshots omitted]

If I change some of the learning-rate parameters, nvidia-smi shows: [screenshot omitted]
Let me know what other information I can provide to help debug this.
Thank you, Scott