NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

Loop through all available engines for cuDNN heuristics search #1740

Closed by minitu 1 year ago

minitu commented 1 year ago

This PR updates the cuDNN heuristics search to loop through all available engines and use the first one that succeeds for a given operation graph, instead of stopping after a fixed number of attempts.
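
The search pattern described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: EngineConfig, Plan, build_plan, and find_first_working_plan are hypothetical stand-ins rather than cuDNN or cudnn-frontend types, and signalling a failed plan build by throwing an exception is an assumption made for the example.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one engine configuration returned by the heuristics.
struct EngineConfig {
    std::string name;
    bool supported;  // whether this engine can run the operation graph
};

// Hypothetical stand-in for a finalized execution plan.
struct Plan {
    std::string engine_name;
};

// Simulated plan construction: throws when the engine cannot handle the graph
// (assumption: failure is reported via an exception).
Plan build_plan(const EngineConfig& cfg) {
    if (!cfg.supported) {
        throw std::runtime_error("engine " + cfg.name + " cannot run this graph");
    }
    return Plan{cfg.name};
}

// Walk every candidate in heuristic order and keep the first plan that builds.
// Returns false only if all candidates fail; there is no fixed retry limit.
bool find_first_working_plan(const std::vector<EngineConfig>& candidates, Plan& out) {
    for (const EngineConfig& cfg : candidates) {
        try {
            out = build_plan(cfg);
            return true;
        } catch (const std::runtime_error&) {
            // This engine failed; fall through to the next candidate.
        }
    }
    return false;
}
```

With a fixed attempt count, a working engine that sits late in the heuristic ordering would never be reached; iterating over the whole list removes that cutoff at the cost of more failed build attempts in the worst case.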

eqy commented 1 year ago

Also noting that the plan_cache should be made thread_local. This doesn't have to be done in this PR, and it isn't necessary if we assume no multithreaded interactions, which is expected if this layer is only called from Python, where there is a single thread per device.
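
For reference, a thread_local cache along these lines might look like the sketch below. Only the name plan_cache comes from the comment above; the string key, the Plan struct, and get_or_build_plan are hypothetical and chosen purely for illustration.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a cached execution plan.
struct Plan {
    std::string engine_name;
};

// Each thread gets its own map, so lookups and inserts need no locking.
// The string key type is an assumption for this sketch.
std::unordered_map<std::string, Plan>& plan_cache() {
    thread_local std::unordered_map<std::string, Plan> cache;
    return cache;
}

// Return the cached plan for `key`, building a placeholder plan on a miss.
const Plan& get_or_build_plan(const std::string& key) {
    auto& cache = plan_cache();
    auto it = cache.find(key);
    if (it == cache.end()) {
        it = cache.emplace(key, Plan{"engine-for-" + key}).first;
    }
    return it->second;
}
```

The trade-off is that every thread builds and stores its own plans, so a plan found by one thread is not reused by another. Under the single-thread-per-device assumption this costs nothing.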