NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0
1.54k stars · 371 forks

Question: why cudnn gru performs better than vanilla tensorflow gru? #450

Closed wanglouis49 closed 5 years ago

wanglouis49 commented 5 years ago

I have run Deep Speech 2 models, both small and large, with and without cudnn. I notice that the model using cudnn converges much faster even though all other settings are exactly the same. To clarify, I mean eval_loss/WER vs. step, not training time. Could any cudnn expert explain why this happens?

borisgin commented 5 years ago

Was the training in float or in mixed precision?

wanglouis49 commented 5 years ago

It's in float32, no mixed precision. Below is an example from ds2_small_1gpu.py; the blue curve is cudnn_gru and the red one is the TensorFlow GRU.

[Screenshot (2019-05-29): eval loss vs. step — blue: cudnn_gru, red: TensorFlow GRU]

borisgin commented 5 years ago

Cudnn GRU is slightly different from TF GRU: see https://www.tensorflow.org/api_docs/python/tf/contrib/cudnn_rnn/CudnnCompatibleGRUCell
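The difference the linked page describes is in the candidate-state computation: the standard TF GRU applies the reset gate to the hidden state *before* the recurrent matrix multiply, while the cuDNN GRU applies it *after* (and keeps a separate recurrent bias). A minimal NumPy sketch of the two update rules, with toy weights (the names `gru_step`, `Wz`, `Uh`, `b_hh`, etc. are illustrative, not the actual TF/cuDNN kernel internals):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params, variant="tf"):
    """One GRU step on toy weights (illustrative sketch, not the real kernels).

    variant="tf":    h_tilde uses Uh @ (r * h)          (reset BEFORE matmul)
    variant="cudnn": h_tilde uses r * (Uh @ h + b_hh)   (reset AFTER matmul,
                     with a separate recurrent bias, as in
                     CudnnCompatibleGRUCell)
    """
    Wz, Uz, Wr, Ur, Wh, Uh, b_hh = params
    z = sigmoid(Wz @ x + Uz @ h)            # update gate
    r = sigmoid(Wr @ x + Ur @ h)            # reset gate
    if variant == "tf":
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h))
    else:
        h_tilde = np.tanh(Wh @ x + r * (Uh @ h + b_hh))
    return z * h + (1.0 - z) * h_tilde      # TF convention: z keeps old state

rng = np.random.default_rng(0)
n = 4
params = [rng.standard_normal((n, n)) for _ in range(6)]
params.append(rng.standard_normal(n))       # recurrent bias b_hh
x, h = rng.standard_normal(n), rng.standard_normal(n)

out_tf = gru_step(x, h, params, variant="tf")
out_cudnn = gru_step(x, h, params, variant="cudnn")
```

With identical weights the two variants generally produce different outputs, so over many steps the trained models are not interchangeable; this formulation difference (plus the extra bias parameters) is one plausible reason the loss curves diverge.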

wanglouis49 commented 5 years ago

Interesting that it makes such a considerable difference in this case. Thanks.