NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

CUDNN optimal algorithm search #344

Closed fminkin closed 5 years ago

fminkin commented 5 years ago

Hello, I noticed that speech-to-text models (I tried out Jasper) have a very long "preheating" stage.

E.g. the first several thousand batches are really slow, and then the network settles on an algorithm and becomes steadily fast.

Setting the environment variable TF_CUDNN_USE_AUTOTUNE=0 avoids the slow start, but with TF_CUDNN_USE_AUTOTUNE=1 the network, after several hours of preheating, overtakes the network without autotune and ends up faster.
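For reference, a minimal sketch of the workaround mentioned above. TF_CUDNN_USE_AUTOTUNE is a real TensorFlow environment variable, but note it is read when TensorFlow initializes, so it must be set before the `tensorflow` import (or exported in the shell before launching training):

```python
import os

# Disable cuDNN autotuning; this must happen before TensorFlow is imported,
# otherwise the setting has no effect.
os.environ["TF_CUDNN_USE_AUTOTUNE"] = "0"

# import tensorflow as tf  # import only after the variable is set
```

Equivalently, `export TF_CUDNN_USE_AUTOTUNE=0` in the shell before running the training script.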

Do you have any insight into how to fix this behaviour, or how to preheat the model appropriately?

borisgin commented 5 years ago

Do you see this behavior on a local machine, or is this on a cluster? Btw, there is currently no way to set the cuDNN algorithm manually in TF.

fminkin commented 5 years ago

I'm using a cluster with 8 V100s.

borisgin commented 5 years ago

I guess that if the dataset is stored on remote storage, then it takes time to cache it on the local machine during the 1st epoch.

fminkin commented 5 years ago

The dataset is stored locally, and I'm also reading precomputed features instead of wav files.

It's not only training: inference is slow too if you don't disable autotuning. You can try it out by freezing one of your models and timing 1000 batches.
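A sketch of the benchmarking approach described above. `run_step` is a hypothetical stand-in for a `session.run` call on a frozen graph; the warm-up iterations are skipped so cuDNN autotune's initial algorithm search does not skew the average:

```python
import time

def time_batches(run_step, n_batches=1000, warmup=10):
    """Average per-batch wall time of run_step (a callable that executes
    one inference batch), excluding warmup iterations from the timing."""
    for _ in range(warmup):
        run_step()
    start = time.perf_counter()
    for _ in range(n_batches):
        run_step()
    return (time.perf_counter() - start) / n_batches
```

Running this once with autotune enabled and once with TF_CUDNN_USE_AUTOTUNE=0 should make the difference in steady-state batch time visible.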