NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0
1.54k stars · 371 forks

Question: why cudnn gru performs better than vanilla tensorflow gru? #450

Closed wanglouis49 closed 5 years ago

wanglouis49 commented 5 years ago

I have run Deep Speech 2 models, both small and large, with and without cudnn. I notice that the model using cudnn converges much faster even though all other settings are exactly the same. To clarify, I mean eval_loss/WER vs. step, not training time. Could any cudnn expert explain why this happens?

borisgin commented 5 years ago

Was the training in float or in mixed precision?

wanglouis49 commented 5 years ago

It's in float32, no mixed precision. Below is an example from ds2_small_1gpu.py; the blue curve is cudnn_gru and the red one is the TensorFlow GRU.

[Screenshot (2019-05-29): eval loss vs. step — blue: cudnn_gru, red: TensorFlow GRU]

borisgin commented 5 years ago

Cudnn GRU is slightly different from TF GRU: see https://www.tensorflow.org/api_docs/python/tf/contrib/cudnn_rnn/CudnnCompatibleGRUCell
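The difference the linked page describes is in the candidate-state computation: the standard TF GRU applies the reset gate to the hidden state *before* the recurrent matrix multiply, while the cuDNN GRU applies it *after* (and keeps a separate recurrent bias). A minimal NumPy sketch of the two update rules, with toy weights (the names `gru_step`, `Wz`, `Uh`, `b_hh`, etc. are illustrative, not the actual TF/cuDNN kernel internals):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params, variant="tf"):
    """One GRU step on toy weights (illustrative sketch, not the real kernels).

    variant="tf":    h_tilde uses Uh @ (r * h)          (reset BEFORE matmul)
    variant="cudnn": h_tilde uses r * (Uh @ h + b_hh)   (reset AFTER matmul,
                     with a separate recurrent bias, as in
                     CudnnCompatibleGRUCell)
    """
    Wz, Uz, Wr, Ur, Wh, Uh, b_hh = params
    z = sigmoid(Wz @ x + Uz @ h)            # update gate
    r = sigmoid(Wr @ x + Ur @ h)            # reset gate
    if variant == "tf":
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h))
    else:
        h_tilde = np.tanh(Wh @ x + r * (Uh @ h + b_hh))
    return z * h + (1.0 - z) * h_tilde      # TF convention: z keeps old state

rng = np.random.default_rng(0)
n = 4
params = [rng.standard_normal((n, n)) for _ in range(6)]
params.append(rng.standard_normal(n))       # recurrent bias b_hh
x, h = rng.standard_normal(n), rng.standard_normal(n)

out_tf = gru_step(x, h, params, variant="tf")
out_cudnn = gru_step(x, h, params, variant="cudnn")
```

With identical weights the two variants generally produce different outputs, so over many steps the trained models are not interchangeable; this formulation difference (plus the extra bias parameters) is one plausible reason the loss curves diverge.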

wanglouis49 commented 5 years ago

Interesting that it makes such a considerable difference in this case. Thanks.