getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

At least two input variables have different memory locations (Cpu/Gpu) #306

Open AdrienWohrer opened 1 year ago

AdrienWohrer commented 1 year ago

Hi. Below is a minimal working example of my problem. The following code works on CPU, but fails on GPU:

import torch
from pykeops.torch import Vi, Vj, Pm

# Choose device (CPU or GPU)
torchdevice = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

sigma = 2.0   # Gaussian kernel scale
# sigma = torch.tensor(sigma, device=torchdevice)  # uncommenting this line suppresses the error
D = 3         # Data dimension

# KeOps version : symbolic lazytensor formulas
x, y = Vi(0, D), Vj(1, D)
sig = Pm(sigma)
K = (-(x.sqdist(y)) / (2 * sig ** 2)).exp()
MyRed_keops = K.sum_reduction(axis=1)

# Testing
M, N = 100, 1000
xt = torch.randn(M, D).to(device=torchdevice)
yt = torch.randn(N, D).to(device=torchdevice)
print(MyRed_keops(xt, yt)[:5])      # produces an error on GPU

The last line raises the PyKeOps error: "At least two input variables have different memory locations (Cpu/Gpu)".

I assume the problem lies in my handling of the parameter sigma, which is in some sense "attached" (insert the correct term here!) by KeOps to the CPU, and hence produces a device clash when xt and yt are located on the GPU.
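To double-check this hypothesis: xt and yt do live on the GPU, while sigma has no device at all (a quick probe; the expected outputs are my assumption on a single-GPU machine):

print(xt.device, yt.device)   # expected: cuda:0 cuda:0
print(type(sigma))            # <class 'float'>: no device, so KeOps presumably places it on the CPU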

I can suppress the error by explicitly converting sigma into a torch tensor stored on the GPU (the commented-out line in the code above). However, this is not a perfect solution for me, because the symbolic reduction formula MyRed_keops can then only be used on tensors lying on the GPU. In my real use case, MyRed_keops should ideally be agnostic to the device of the tensors it receives.
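A variant I have considered is to declare sigma as a symbolic parameter of the formula and pass it as an extra argument at call time, so that it can be placed on whatever device the data lives on. Here is a minimal sketch, assuming the indexed Pm(ind, dim) syntax for symbolic variables; the wrapper my_reduction is my own naming, not a KeOps API:

import torch
from pykeops.torch import Vi, Vj, Pm

D = 3
x, y = Vi(0, D), Vj(1, D)
sig = Pm(2, 1)   # sigma as a symbolic parameter: index 2, dimension 1
K = (-(x.sqdist(y)) / (2 * sig ** 2)).exp()
MyRed_keops = K.sum_reduction(axis=1)

def my_reduction(xt, yt, sigma=2.0):
    # Move sigma to the data's device (and dtype) at call time
    sig_t = torch.tensor([sigma], device=xt.device, dtype=xt.dtype)
    return MyRed_keops(xt, yt, sig_t)

This keeps the formula itself device-agnostic, but it changes the call signature of MyRed_keops, which is why I am wondering whether a float-valued Pm can be made device-agnostic directly.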

Is this a hard limitation of KeOps in its current form? Is there any nicer workaround than the two sketched above?