getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

backend parameter not respected in Genred #248

Status: Open

Giodiro commented 2 years ago

Hi again,

I've encountered another small issue; I'm not sure whether it's worth fixing or not.

When running KeOps with the CPU backend on a machine where CUDA is available, the Rsqrt operation fails. Code to reproduce the error:

import torch
import pykeops
from pykeops.torch import Genred

def kernel(v):
    formula = 'Rsqrt(v)'
    fn = Genred(formula, ['v=Vi(2)'], reduction_op='Sum', axis=1)
    res = fn(v, backend='CPU')
    return res

def test():
    # Note the parentheses: `torch.cuda.is_available` without a call is a
    # bound method object, which is always truthy, so the assert never fires.
    assert torch.cuda.is_available()
    v = torch.randn(100, 2)  # last dimension must match Vi(2)
    return kernel(v)

if __name__ == "__main__":
    test()

I think I tracked it down to the codegen not having access to the chosen backend ('CPU', 'GPU_1D', etc.), so it generates CPU or GPU code based on the global config.use_cuda variable instead. This is a problem for Rsqrt, since it relies on an intrinsic that is only available when compiling with nvcc.

As a workaround, I can modify the global variable in addition to setting the backend.

joanglaunes commented 2 years ago

Hello @Giodiro , I have done a quick fix using the __CUDACC__ preprocessor macro. The Python code now writes both versions (GPU and CPU), and the preprocessor chooses between them depending on that macro. This is not very clean, but I feel it is better than relying on the config.use_cuda variable. I think the clean fix would be to add a device attribute to the c_variable and c_array Python classes to specify the device (host or device), as we do for the dtype. That would be a bit more work, though, so I may do it later.