Your global batch is 4x smaller than in the example. Have you tried reducing the learning rate to "learning_rate": 0.0001? I would also change lr_policy to cosine.
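For illustration, a minimal sketch of what those two changes could look like in an OpenSeq2Seq-style config dict. The key names ("lr_policy", "lr_policy_params") and the surrounding values are assumptions about the config layout, not an excerpt from the real ds2_medium_4gpus.py:

```python
import tensorflow as tf

# Hypothetical config excerpt -- key names and values are assumed,
# not copied from the actual ds2_medium_4gpus.py.
base_params = {
    "batch_size_per_gpu": 16,            # 2 GPUs x 16 = global batch of 32

    # Learning rate scaled down to match the 4x smaller global batch,
    # with cosine decay swapped in for the default policy.
    "lr_policy": tf.train.cosine_decay,  # assumed pluggable here; in practice
                                         # a wrapper in lr_policies.py may be
                                         # needed (see the sketch further down)
    "lr_policy_params": {
        "learning_rate": 0.0001,
        "decay_steps": 100000,           # roughly the planned number of steps
    },
}
```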
Unfortunately, setting the learning rate to 0.0001 did not bring any benefit. Also, there does not seem to be any learning rate policy named "cosine"! Besides reducing the learning rate, how can I compensate for the smaller global batch and the smaller number of GPUs? Grateful
I mean the standard cosine decay policy: https://www.tensorflow.org/api_docs/python/tf/train/cosine_decay
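For reference, a minimal standalone usage of that TF 1.x op; the initial rate and decay_steps below are arbitrary illustration values:

```python
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()

# Anneals the LR from 0.0001 down to alpha * 0.0001 along a cosine curve
learning_rate = tf.train.cosine_decay(
    learning_rate=0.0001,    # initial learning rate
    global_step=global_step,
    decay_steps=100000,      # steps over which the cosine anneals
    alpha=0.0,               # floor, as a fraction of the initial LR
)
```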
Thanks for the link, I'll implement this method in lr_policies.py and test it.
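A rough sketch of what such an addition might look like; the function name and signature below are assumed for illustration and may not match the interface the existing policies in lr_policies.py use:

```python
import tensorflow as tf

def cosine_decay_policy(learning_rate, global_step, decay_steps, min_lr=0.0):
    """Cosine LR decay wrapping tf.train.cosine_decay.

    Hypothetical helper for lr_policies.py; the argument names are assumed,
    not taken from the project's existing policy functions.
    """
    alpha = min_lr / learning_rate if learning_rate > 0 else 0.0
    return tf.train.cosine_decay(
        learning_rate=learning_rate,
        global_step=global_step,
        decay_steps=decay_steps,
        alpha=alpha,  # minimum LR expressed as a fraction of the initial LR
    )
```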
Hi everybody,
I've trained the medium version of the DeepSpeech2 model for Persian on a system with 2 GPUs (GeForce GTX 1080, 8 GB GDDR5X each), using the ds2_medium_4gpus.py config file with "batch_size_per_gpu": 16. After about 70 epochs, I couldn't reach a better performance than 26% WER, with a training loss of ~42. I've also had the same issue on an English dataset, LibriSpeech. What is your take on this problem? Is it caused by the number of GPUs or by batch_size_per_gpu? If I change the learning rate decay policy or the optimization method, can I reach better accuracy on this system?
Thanks, Arash