NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License
8.33k stars, 1.39k forks

Loop through all available engines for cuDNN heuristics search #1740

Closed: minitu closed this 11 months ago

minitu commented 11 months ago

This PR updates the cuDNN heuristics search to loop through all available engines and use the first one that succeeds for a given operation graph, instead of giving up after a fixed number of attempts.
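
The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `EngineConfig`, `ExecutionPlan`, `PlanBuilder`, and `first_working_plan` are hypothetical stand-ins, and failure is modeled as an empty `std::optional`, whereas the real code would use whatever failure reporting the cuDNN frontend provides.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT cuDNN frontend types.
struct EngineConfig { std::string name; };
struct ExecutionPlan { std::string engine_name; };

// Assumed callback: tries to build a plan for one engine config and
// returns std::nullopt if that engine cannot handle the operation graph.
using PlanBuilder =
    std::function<std::optional<ExecutionPlan>(const EngineConfig&)>;

// Walk every candidate in heuristic order and return the first plan that
// builds successfully, rather than stopping after a fixed number of tries.
std::optional<ExecutionPlan> first_working_plan(
    const std::vector<EngineConfig>& configs, const PlanBuilder& try_build) {
  for (const auto& cfg : configs) {
    if (auto plan = try_build(cfg)) {
      return plan;  // first success wins; later engines are never tried
    }
  }
  return std::nullopt;  // every available engine failed
}
```

With this shape, the number of attempts is bounded by the number of engines the heuristics return, so a working engine late in the list is still found.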

eqy commented 11 months ago

Also noting that the plan_cache should be made thread_local. That doesn't have to be done here, and it isn't necessary if we assume no multithreaded interactions (expected if we are only calling this layer from Python, where there is a single thread per device).
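
For reference, one way a per-thread cache could look is sketched below. The key type, the `CachedPlan` struct, and the accessor-function shape are assumptions made for illustration; the actual plan_cache in apex may be declared differently.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical cached value; the real cache would hold a cuDNN execution plan.
struct CachedPlan { int engine_index; };

// Each thread gets its own map instance, so lookups and inserts need no
// locking: no other thread can ever observe this thread's cache.
std::unordered_map<std::string, CachedPlan>& plan_cache() {
  thread_local std::unordered_map<std::string, CachedPlan> cache;
  return cache;
}
```

The trade-off in this sketch is that plans are not shared across threads, so each thread would repeat the heuristics search for a graph it has not seen itself.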