Hi. Attached below is a minimal working example of my problem. The following code works on CPU, but fails on GPU:
import torch
from pykeops.torch import Vi, Vj, Pm
# Choose device (CPU or GPU)
torchdevice = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
sigma = 2.0 # Gaussian kernel scale
# sigma = torch.tensor(sigma, device=torchdevice) # uncommenting this line suppresses the error
D = 3 # Data dimension
# KeOps version : symbolic lazytensor formulas
x, y = Vi(0, D), Vj(1, D)
sig = Pm(sigma)
K = (-(x.sqdist(y)) / (2 * sig ** 2)).exp()
MyRed_keops = K.sum_reduction(axis=1)
# Testing
M, N = 100, 1000
xt = torch.randn(M, D).to(device=torchdevice)
yt = torch.randn(N, D).to(device=torchdevice)
print(MyRed_keops(xt, yt)[:5])  # raises an error on GPU
The last line raises the PyKeOps error: "At least two input variables have different memory locations (Cpu/Gpu)".
I assume the problem lies in my handling of the parameter sigma, which is in some sense "attached" (insert the correct term here!) to the CPU by KeOps, and hence produces a device clash when xt and yt live on the GPU.
I can suppress the error by explicitly converting sigma into a torch tensor stored on the GPU (the commented-out line in the code above). However, this is not a perfect solution for me, because the symbolic reduction formula MyRed_keops can then only be applied to tensors lying on the GPU. In my real use case, MyRed_keops should ideally be agnostic to the device of the tensors it receives.
Is this a hard limitation of KeOps in its current form? Is there a nicer workaround than the explicit conversion above?
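For reference, here is a dense pure-PyTorch baseline of the same Gaussian reduction that I use to check results. It is a sketch, not KeOps-based, so it scales as O(M*N) in memory, but it is device-agnostic by construction since it only touches the inputs' own devices:

```python
import torch

def gaussian_sum_dense(x, y, sigma=2.0):
    """Dense reference for K.sum_reduction(axis=1) with a Gaussian kernel."""
    # Squared pairwise distances, shape (M, N)
    sq = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    # Gaussian kernel matrix K[i, j] = exp(-|x_i - y_j|^2 / (2 * sigma^2))
    K = torch.exp(-sq / (2 * sigma ** 2))
    # Sum over j, shape (M, 1), matching the KeOps reduction's output shape
    return K.sum(dim=1, keepdim=True)
```

It agrees with MyRed_keops on the CPU in my tests, and runs on whatever device x and y live on, which is the behavior I would like the KeOps version to have.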