Async benchmarks always deadlock

NVIDIA / nvbench

CUDA Kernel Benchmarking Library

Apache License 2.0

Async benchmarks always deadlock #136

Closed gevtushenko closed 5 months ago

gevtushenko commented 1 year ago

The recent switch to lazy module loading by default in the CUDA Toolkit (CTK) 12.2 seems to have broken the async benchmarks: they now deadlock. This can be reproduced by running nvbench.example.axes. Setting CUDA_MODULE_LOADING=EAGER avoids the deadlock. We should either mention this in the error message or set the variable ourselves.

alliepiper commented 1 year ago

We likely want eager loading by default anyway, to make sure that lazy loads aren't affecting measurements. Let's look into defining that variable from the NVBench main implementation.
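A minimal sketch of what setting the variable from the benchmark's entry point could look like. The helper names `request_eager_module_loading` and `requested_module_loading_mode` are hypothetical and not part of NVBench; the sketch also assumes the CUDA driver reads CUDA_MODULE_LOADING when it initializes, so the call has to happen before the first CUDA API call.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not an NVBench API): ask for eager CUDA module loading
// unless the user has already exported CUDA_MODULE_LOADING themselves.
// Assumption: this runs before any CUDA call initializes the driver.
inline bool request_eager_module_loading()
{
#if defined(_WIN32)
  // Windows has no setenv; keep an existing value, otherwise set EAGER.
  if (std::getenv("CUDA_MODULE_LOADING") != nullptr)
  {
    return true;
  }
  return _putenv_s("CUDA_MODULE_LOADING", "EAGER") == 0;
#else
  // overwrite = 0 leaves an explicit user setting (e.g. LAZY) untouched.
  return setenv("CUDA_MODULE_LOADING", "EAGER", /*overwrite=*/0) == 0;
#endif
}

// Hypothetical helper: the value currently set in the environment,
// or an empty string if the variable is unset.
inline std::string requested_module_loading_mode()
{
  const char *value = std::getenv("CUDA_MODULE_LOADING");
  return value ? std::string(value) : std::string();
}
```

Calling `request_eager_module_loading()` as the first statement of `main` would make eager loading the default while still letting a user opt back into lazy loading from the shell. Not overwriting an existing value is a design choice of this sketch, not something the thread specifies.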