NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

No Reduction in Loss Value When Training with 2 GeForce GTX 1080 GPUs! #430

Closed · arashdehghani closed this 5 years ago

arashdehghani commented 5 years ago

Hi everybody,

I've trained the medium DeepSpeech2 model for Persian on a system with 2 GPUs (GeForce GTX 1080, 8 GB GDDR5X each), using the ds2_medium_4gpus.py config file with "batch_size_per_gpu": 16. After about 70 epochs, I couldn't get better than 26% WER, with a training loss of ~42. I've had the same issue with English datasets such as LibriSpeech. What do you think is causing this? Is it the number of GPUs or batch_size_per_gpu? If I change the learning-rate decay policy or the optimization method, can I reach better accuracy on this system?

Thanks, Arash

borisgin commented 5 years ago

Your global batch is 4x smaller than in the example. Have you tried reducing the learning rate to "learning_rate": 0.0001? I would also change lr_policy to cosine decay.
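
For context, a hedged back-of-the-envelope sketch of the batch-size / learning-rate relationship being alluded to here. The reference values (4 GPUs, batch_size_per_gpu of 32, learning rate 0.001) are assumptions for illustration, not taken from the actual config:

```python
# Illustration only: smaller global batch -> proportionally smaller learning rate
# (linear scaling heuristic). Reference values below are assumed, not from the repo.

ref_gpus, ref_batch_per_gpu = 4, 32   # assumed values for ds2_medium_4gpus.py
my_gpus, my_batch_per_gpu = 2, 16     # values reported in this issue

ref_global_batch = ref_gpus * ref_batch_per_gpu   # 128
my_global_batch = my_gpus * my_batch_per_gpu      # 32  -> 4x smaller

ref_lr = 0.001                                    # assumed reference learning rate
scaled_lr = ref_lr * my_global_batch / ref_global_batch
print(scaled_lr)  # 0.00025 -> same order of magnitude as the suggested 0.0001
```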

arashdehghani commented 5 years ago

Unfortunately, setting the learning rate to 0.0001 did not bring any benefit. There does not seem to be any learning rate policy called cosine in the toolkit! Besides reducing the learning rate, how can I compensate for the smaller global batch size and number of GPUs? Grateful

borisgin commented 5 years ago

I mean the standard cosine decay policy: https://www.tensorflow.org/api_docs/python/tf/train/cosine_decay

arashdehghani commented 5 years ago

Thanks for the link, I'll implement this method in lr_policies.py and test it.
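
For anyone following along, a minimal sketch of what such a policy could look like in lr_policies.py. The function signature here is an assumption modeled on the toolkit's other decay policies, not the actual implementation; it simply wraps tf.train.cosine_decay from TF 1.x:

```python
import tensorflow as tf


def cosine_decay(global_step, learning_rate, decay_steps, min_lr=0.0):
  """Cosine learning-rate decay built on tf.train.cosine_decay.

  Argument names follow the style of the other policies in lr_policies.py,
  but the exact signature the toolkit expects should be checked against
  that file. `alpha` is the final LR expressed as a fraction of the
  initial LR, so min_lr is converted to that fraction here.
  """
  return tf.train.cosine_decay(
      learning_rate=learning_rate,
      global_step=global_step,
      decay_steps=decay_steps,
      alpha=min_lr / learning_rate,
  )
```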