Giodiro opened this issue 2 years ago
Hello @Giodiro,

I have done a quick fix using the `__CUDACC__` preprocessor macro. The Python code now writes both versions (GPU and CPU), and the preprocessor chooses between them depending on this macro. This is not very clean, but I feel it is better than relying on the `config.use_cuda` variable. I think the clean fix would be to add a `device` attribute to the `c_variable` and `c_array` Python classes to specify the device (host or device), as we do for the `dtype`. But that would be a bit more work; I may do it later.
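A minimal sketch of the macro-based approach described above, assuming a code generator that builds C++ source as Python strings. The function name `rsqrt_source` is hypothetical and not part of KeOps; it only illustrates emitting both variants and letting the preprocessor pick one.

```python
# Hypothetical sketch (not KeOps code): emit C++ source for an rsqrt operation that
# contains both a GPU and a CPU variant, and let the C preprocessor choose via __CUDACC__.
# nvcc defines __CUDACC__ when it compiles a file; a plain host compiler does not.


def rsqrt_source(arg: str, dtype: str = "float") -> str:
    """Return a C++ snippet computing 1/sqrt(arg) into a variable named res."""
    return (
        "#ifdef __CUDACC__\n"
        "  // GPU path: rsqrt is a CUDA math function, only available with nvcc\n"
        f"  {dtype} res = rsqrt({arg});\n"
        "#else\n"
        "  // CPU path: portable fallback using sqrt from <cmath>\n"
        f"  {dtype} res = 1 / sqrt({arg});\n"
        "#endif\n"
    )


print(rsqrt_source("x"))
```

Note that `__CUDACC__` reflects which compiler is processing the file rather than which device the code will run on, so this guard selects by compiler, not by the backend the user asked for.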
Hi again,

I've encountered another small issue; I'm not sure whether it's worth fixing.

While running KeOps with the CPU backend on a machine where CUDA is available, the `rsqrt` operation fails. Some code to reproduce the error follows:

I think I tracked it down to the codegen not having access to the chosen backend (`'cpu'`, `'gpu_1d'`, etc.), so it generates CPU or GPU code based on the global `config.use_cuda` variable instead. This is problematic for `rsqrt`, since it uses a function that is only available with nvcc. As a workaround, I can modify the global variable in addition to setting the backend.
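The root cause described above is that the code generator reads its target from a global flag rather than from the selected backend. A minimal sketch of backend-aware generation, along the lines of the per-variable `device` attribute proposed above, could look like the following. `CVariable`, `device_from_backend` and `rsqrt_code` are hypothetical names chosen for illustration, not KeOps APIs.

```python
from dataclasses import dataclass

# Hypothetical sketch (not KeOps code): carry the target device on the
# code-generation objects, like dtype, instead of reading a global use_cuda flag.


@dataclass(frozen=True)
class CVariable:
    dtype: str   # C type of the variable, e.g. "float"
    id: str      # variable name in the generated source
    device: str  # "host" or "device"


def device_from_backend(backend: str) -> str:
    """Map a backend name such as 'cpu' or 'gpu_1d' to a target device."""
    if backend == "cpu":
        return "host"
    if backend.startswith("gpu"):
        return "device"
    raise ValueError(f"unknown backend: {backend!r}")


def rsqrt_code(out: CVariable, arg: CVariable) -> str:
    """Emit C++ for out = 1/sqrt(arg), choosing the function from the device."""
    if out.device != arg.device:
        raise ValueError("operands must live on the same device")
    if out.device == "device":
        # rsqrt is a CUDA math function, only available when compiling with nvcc
        return f"{out.id} = rsqrt({arg.id});"
    # Host code: portable fallback that any C++ compiler accepts
    return f"{out.id} = 1 / sqrt({arg.id});"


# Even on a machine where CUDA is available, the 'cpu' backend yields host code.
dev = device_from_backend("cpu")
x = CVariable("float", "x", dev)
res = CVariable("float", "res", dev)
print(rsqrt_code(res, x))  # prints: res = 1 / sqrt(x);
```

With the device derived from the backend, the choice of `rsqrt` versus the host fallback no longer depends on whether CUDA happens to be installed, which removes the need to toggle the global variable by hand.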