minitu closed this 1 year ago
Also noting that the `plan_cache` should be made `thread_local`, but this doesn't have to be done here, and it isn't necessary if we assume no multithreaded interactions (expected if we are only calling this layer from Python, where there is a single thread per device).
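For reference, a minimal sketch of what a per-thread cache could look like. The key type, the `CachedPlan` struct, and the accessor function are hypothetical stand-ins; the actual `plan_cache` in this codebase may be declared differently.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for whatever the real cache stores per operation graph.
struct CachedPlan {
    int engine_id;
};

// Accessor returning a cache that is private to the calling thread.
// Each thread gets its own map instance, so no locking is needed.
std::unordered_map<std::string, CachedPlan>& plan_cache() {
    thread_local std::unordered_map<std::string, CachedPlan> cache;
    return cache;
}
```

The trade-off is that a plan built on one thread is not visible to others, so each thread pays the search cost once per operation graph.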
This PR updates the cuDNN heuristics search to loop through all available engines and find the first successful one for a given operation graph, instead of trying a set number of times.
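The search pattern described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `EngineConfig`, `ExecutionPlan`, `PlanBuilder`, and `find_first_working_plan` are hypothetical names, and the cuDNN calls that query heuristics and build a plan are abstracted behind a callback.

```cpp
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-in for one engine configuration from a heuristics query.
struct EngineConfig {
    int id;
};

// Hypothetical stand-in for a successfully built execution plan.
struct ExecutionPlan {
    int engine_id;
};

// Attempts to build a plan for one config; returns std::nullopt on failure.
using PlanBuilder =
    std::function<std::optional<ExecutionPlan>(const EngineConfig&)>;

// Walks every candidate in the order given (assumed to be heuristic ranking)
// and returns the first plan that builds, instead of giving up after a
// fixed number of attempts.
std::optional<ExecutionPlan> find_first_working_plan(
    const std::vector<EngineConfig>& candidates,
    const PlanBuilder& try_build) {
    for (const EngineConfig& config : candidates) {
        if (auto plan = try_build(config)) {
            return plan;
        }
    }
    return std::nullopt;  // No engine supports this operation graph.
}
```

Returning `std::nullopt` when every candidate fails leaves the decision of how to report the error to the caller.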