NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

Question: Wave2Letter+ Configuration for different number of GPUs #322

Closed · the01 closed 5 years ago

the01 commented 5 years ago

I was attempting to reproduce the results of the Wave2Letter+ model from the docs, using this config as a basis. I only have two Titan X GPUs available, so I couldn't use mixed precision, and I changed the number of GPUs to 2. I was only able to achieve a WER of around 16%, far from the reported 6.67%.

I assume I need a different learning rate (or other hyperparameters) for training with 1, 2, 4, ... GPUs? What configuration should I use to get a similar result?
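
For context, a common rule of thumb here (a general heuristic, not something the OpenSeq2Seq docs prescribe for this particular config) is to scale the learning rate linearly with the global batch size. A minimal sketch, with all concrete numbers being illustrative rather than taken from the actual config:

```python
# Linear LR scaling heuristic: keep LR proportional to the global batch size.
# The reference LR, GPU counts, and batch sizes below are illustrative only.
def scale_lr(ref_lr, ref_gpus, new_gpus, batch_per_gpu_ref=32, batch_per_gpu_new=32):
    """Scale a reference learning rate to a new global batch size."""
    return ref_lr * (new_gpus * batch_per_gpu_new) / (ref_gpus * batch_per_gpu_ref)

# e.g. a hypothetical 8-GPU reference LR of 0.05 scaled down to 2 GPUs
# (same per-GPU batch) gives 0.0125 -- the same ballpark as the 0.01
# suggested below.
print(scale_lr(0.05, 8, 2))
```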

borisgin commented 5 years ago

Please reduce the LR to 0.01 and turn off LARC. What was your training "num_epochs"?

the01 commented 5 years ago

I assume removing "larc_params" from base_params turns it off? I believe it was 200 or 250. Do I need a higher value? For how many GPUs is this setting intended?

borisgin commented 5 years ago

This learning rate is for 2 GPUs. Yes, you can remove larc parameters.
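
Putting the two suggestions together, the change amounts to something like the following in the config file (the key names follow OpenSeq2Seq's example configs; everything else is elided or illustrative):

```python
base_params = {
    "num_gpus": 2,
    # ...
    "lr_policy_params": {
        "learning_rate": 0.01,  # reduced for 2 GPUs
        # ...
    },
    # "larc_params" deleted entirely: leaving the key out disables LARC
}
```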

the01 commented 5 years ago

OK, I changed the LR, removed larc_params, and turned Horovod off, but training now throws `W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.` errors. Do I need to compile TensorFlow myself and use the CTC decoder with a language model? I noticed that Wave2LetterEncoder was changed to TDNNEncoder.

vaibhav0195 commented 5 years ago

Hi @borisgin, I am also getting the warning `tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found`, but I am positive that I compiled TensorFlow with the KenLM configuration as specified in the docs.

Does this affect training, or does the warning only appear when the output is decoded for transcription?

Thanks

borisgin commented 5 years ago

TensorFlow is OK; this warning means that training diverged, or maybe that an audio sequence is too short for its transcript.
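
For the "too short" case, the arithmetic is easy to check offline: CTC needs at least one output frame per label, plus one extra frame for every pair of adjacent repeated labels, and the encoder's temporal striding shrinks the frame count. A minimal sketch in plain Python (the stride value is illustrative, not read from the config):

```python
def min_ctc_frames(label_ids):
    """Minimum frames CTC needs: one per label, plus a blank between
    every pair of adjacent repeated labels."""
    repeats = sum(1 for a, b in zip(label_ids, label_ids[1:]) if a == b)
    return len(label_ids) + repeats

def frames_after_striding(num_input_frames, total_stride=2):
    """Frames left after the encoder's temporal striding (illustrative stride)."""
    return (num_input_frames + total_stride - 1) // total_stride

label_ids = [ord(c) - ord("a") for c in "hello"]   # 'll' needs an extra blank
need = min_ctc_frames(label_ids)                   # -> 6
have = frames_after_striding(9)                    # 9 inputs -> 5 after stride 2
print(need, have, "no valid path possible" if have < need else "ok")
```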

vaibhav0195 commented 5 years ago

Thanks for the reply @borisgin. So does that mean my model is not being trained? Also, I have a huge dataset of audio files; my system has 64 GB of RAM and two NVIDIA 1080 Ti GPUs, running Ubuntu 16.04. I have noticed that when I try to run training on multiple GPUs (i.e. by setting num_gpus=2 in the config), training gets stuck at the initializing state: the model is loaded on both GPUs, but usage of both sits at 0%.

Thanks.
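
On the multi-GPU hang: OpenSeq2Seq supports two multi-GPU modes, so it is worth checking which one the config is actually in. A sketch of the relevant fragment ("use_horovod" and "num_gpus" are real config keys; the surrounding values are placeholders):

```python
base_params = {
    # In-process multi-tower mode: a single `python run.py ...` process
    # drives both GPUs itself.
    "use_horovod": False,
    "num_gpus": 2,
    # ...
}

# Horovod mode instead sets "use_horovod": True and starts one process
# per GPU through MPI, e.g.:
#   mpiexec -np 2 python run.py --config_file=... --mode=train_eval
# Launching a Horovod-enabled config without mpiexec (or vice versa) can
# leave the processes waiting on each other with 0% GPU utilization.
```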