Closed wanglouis49 closed 5 years ago
Was training in float or in mixed precision?
It's in float32, no mixed precision. Below is an example from ds2_small_1gpu.py: the blue curve is with cudnn_gru and the red one is with the TensorFlow GRU.
Cudnn GRU is slightly different from TF GRU: see https://www.tensorflow.org/api_docs/python/tf/contrib/cudnn_rnn/CudnnCompatibleGRUCell
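For anyone curious where the difference comes from: per the CudnnCompatibleGRUCell docs, the CuDNN GRU applies the reset gate *after* the recurrent matmul (and uses separate input/recurrent biases), while the canonical TF GRUCell applies it to the hidden state *before* the matmul. A minimal NumPy sketch of just the candidate-state computation (toy sizes and random parameters, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # toy hidden size

# Shared toy parameters (illustrative only, not real trained weights).
x, h = rng.standard_normal(n), rng.standard_normal(n)
Wc, Uc = rng.standard_normal((n, n)), rng.standard_normal((n, n))
r = 1.0 / (1.0 + np.exp(-rng.standard_normal(n)))  # a fixed reset gate
b_in, b_rec = rng.standard_normal(n), rng.standard_normal(n)

# Canonical TF GRUCell: reset gate multiplies h *before* the
# recurrent matmul, single bias.
cand_tf = np.tanh(Wc @ x + Uc @ (r * h) + b_in)

# CuDNN GRU: reset gate multiplies the recurrent matmul *output*,
# with separate input and recurrent biases.
cand_cudnn = np.tanh(Wc @ x + b_in + r * (Uc @ h + b_rec))

# The two candidate states generally differ, so the cells are not
# mathematically equivalent even with identical weights.
print(np.allclose(cand_tf, cand_cudnn))
```

The two formulations coincide only in special cases (e.g. when the reset gate saturates at 1 and the recurrent bias is zero), so some divergence in training curves is expected even before any cudnn-specific numerics come into play.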
Interesting that it causes such a considerable difference in this case. Thanks.
I have run Deep Speech 2 models, both small and large, with and without cudnn, and I notice the model using cudnn converges much faster even though all other settings are identical. To clarify, I mean eval_loss/WER vs. step, not training time. Could any cudnn expert explain why this is happening?