Using an environment with:
=>
Expected result: the `cudaGetDevice()` call should return device 2, not device 0.

The problem appears to be that `cudaSetDevice` only calls `ccudart.utils.lazyInitGlobal`, whereas `cudaGetDevice` calls `ccudart.utils.lazyInit` (which calls `lazyInitDevice(0)`).

I think that `cudaGetDevice` just needs to not call `lazyInit` (the case of no context being in place is handled by the branch that calls `cudaSetDevice(0)`):

https://github.com/NVIDIA/cuda-python/blob/main/cuda/_lib/ccudart/ccudart.pyx#L1039-L1045
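To make the suspected interaction concrete, here is a toy pure-Python model of it. This is not the real `ccudart` code: `ToyRuntime` and all of its method names are hypothetical, and the state handling (a single "current device" slot plus a once-only init flag) is my simplification of the behaviour described above. It only sketches why calling `lazyInit` from `cudaGetDevice` could clobber an earlier `cudaSetDevice(2)`.

```python
class ToyRuntime:
    """Hypothetical stand-in for the runtime state; not the ccudart implementation."""

    def __init__(self):
        self.current_device = None   # None models "no context is current"
        self.lazy_init_done = False  # assumed once-only flag for full lazy init

    def lazy_init_global(self):
        # Models global setup only; no device or context is touched.
        pass

    def lazy_init_device(self, ordinal):
        # Models making the context of `ordinal` current.
        self.current_device = ordinal

    def lazy_init(self):
        # Models lazyInit: global init followed by lazyInitDevice(0).
        if self.lazy_init_done:
            return
        self.lazy_init_global()
        self.lazy_init_device(0)
        self.lazy_init_done = True

    def set_device(self, ordinal):
        # Models cudaSetDevice: only the global init, then select the device.
        self.lazy_init_global()
        self.current_device = ordinal

    def get_device_buggy(self):
        # Models the current cudaGetDevice: lazy_init() resets the device to 0.
        self.lazy_init()
        if self.current_device is None:
            self.set_device(0)
        return self.current_device

    def get_device_fixed(self):
        # Models the proposed change: skip lazy_init() and rely on the
        # "no context" branch to fall back to device 0.
        if self.current_device is None:
            self.set_device(0)
        return self.current_device


if __name__ == "__main__":
    buggy = ToyRuntime()
    buggy.set_device(2)
    print("buggy:", buggy.get_device_buggy())

    fixed = ToyRuntime()
    fixed.set_device(2)
    print("fixed:", fixed.get_device_fixed())
```

In this model the buggy path reports device 0 after `set_device(2)`, while the fixed path reports device 2 and still falls back to device 0 when nothing has been selected.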
Plausibly a patch like this?
Note this has two other fixes:
- `err_driver != CUDA_SUCCESS`
- actually return the error code `cudaErrorDeviceUninitialized` (not sure if this is the correct error code)