Open maleadt opened 1 year ago
Looks like the switch to CDP2 changed a lot in the device runtime (headers at `cuda/targets/x86_64-linux/include/cuda_device_runtime_api.h`). I haven't fully figured it out yet, but it looks like calls to e.g. `cudaMalloc` are now using new `__cudaCDP2Malloc` functions too.
For https://github.com/JuliaGPU/CUDA.jl/issues/1846, we need to support and use the new dynamic parallelism API (CDP2). While the legacy API (CDP1) still works, there are a couple of things that don't on `sm_90`+:
- `cudaDeviceSynchronize`
- the `LIMIT_DEV_RUNTIME_SYNC_DEPTH` context limit

This new API was introduced in CUDA 12.
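
To illustrate what losing device-side `cudaDeviceSynchronize` means in practice, here is a minimal CUDA C++ sketch of the pattern CDP2 appears to offer instead: launching follow-up work into the `cudaStreamTailLaunch` named stream, which runs only after the parent grid and the child work it launched have finished. The kernel names (`child`, `reduce_tail`, `parent`) and the host helper `run_tail_launch_demo` are hypothetical and only serve the example. It assumes CUDA 12 or later and a build with relocatable device code, e.g. `nvcc -rdc=true demo.cu -lcudadevrt`. This is not how CUDA.jl implements it, just a picture of the API change on the CUDA C++ side.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical child kernel: each thread stores twice its global index.
__global__ void child(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 2 * i;
}

// Hypothetical tail kernel: sums the values written by `child`.
__global__ void reduce_tail(const int *data, int n, int *sum) {
    int s = 0;
    for (int i = 0; i < n; ++i) s += data[i];
    *sum = s;
}

// Parent kernel using device-side launches (dynamic parallelism).
__global__ void parent(int *data, int n, int *sum) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        child<<<(n + 63) / 64, 64>>>(data, n);
        // With CDP1 one would call cudaDeviceSynchronize() here and then
        // read `data`. Under CDP2 that call is not available in device
        // code, so the dependent work goes into the tail launch stream,
        // which starts after this grid and its child launches complete.
        reduce_tail<<<1, 1, 0, cudaStreamTailLaunch>>>(data, n, sum);
    }
}

// Hypothetical host helper: returns the computed sum, or -1 on error.
int run_tail_launch_demo(int n) {
    int *data = nullptr;
    int *sum = nullptr;
    if (cudaMalloc(&data, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&sum, sizeof(int)) != cudaSuccess) {
        cudaFree(data);
        return -1;
    }

    parent<<<1, 1>>>(data, n, sum);

    // Blocking copy on the default stream: waits for `parent` and all of
    // its device-side launches before reading the result.
    int host_sum = -1;
    cudaError_t err =
        cudaMemcpy(&host_sum, sum, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(data);
    cudaFree(sum);
    return err == cudaSuccess ? host_sum : -1;
}

int main() {
    printf("sum = %d\n", run_tail_launch_demo(8));
    return 0;
}
```

Host-side `cudaDeviceSynchronize` is unaffected; only the device-side call is gone under CDP2. Supporting CDP2 in CUDA.jl would presumably mean exposing a comparable named-stream launch rather than a device-side synchronization call.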