geometric-kernels / GeometricKernels

Geometric kernels on manifolds, meshes and graphs
https://geometric-kernels.github.io/
Apache License 2.0

Torch device error #147

Closed YanjunLiu2 closed 1 month ago

YanjunLiu2 commented 1 month ago

Hello,

I defined a graph kernel, Gkernel, for use in gpytorch. In the GP model I take the product of Gkernel and an RBF kernel, Fkernel, which handles the other features. The code runs fine on the CPU, but when I move the computation to the GPU, the error below appears. As far as I can tell I moved everything to the GPU, so this is a bit strange.

Gkernel mu cuda:0
train_x cuda:0
train_y cuda:0
Parameter likelihood.noise_covar.raw_noise is on device: cuda:0
Parameter mean_module.constant is on device: cuda:0
Parameter Gkernel.raw_outputscale is on device: cuda:0
Parameter Gkernel.base_kernel.raw_lengthscale is on device: cuda:0
Parameter Fkernel.raw_lengthscale is on device: cuda:0
sgid cuda:0
chem cuda:0
Traceback (most recent call last):
  File "/data/3DSC/scripts/with_SG_feature.py", line 78, in <module>
    model1, mll1 = make_and_fit_regressor_SG(train_x, train_y, Gkernel,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/3DSC/meai/GP.py", line 294, in make_and_fit_regressor_SG
    _, info_dict = fit_gpytorch_torch(model.Fkernel.lengthscale, mll, options={"maxiter": 1000, "lr": lr})
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/3DSC/meai/GP.py", line 163, in fit_gpytorch_torch
    output = mll.model(*train_inputs)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/gpytorch/models/exact_gp.py", line 257, in __call__
    res = super().__call__(*inputs, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/gpytorch/module.py", line 30, in __call__
    outputs = self.forward(*inputs, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/3DSC/meai/GP.py", line 280, in forward
    covar_x = self.Gkernel(space_group_ids)*self.Fkernel(chemical_features)
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/gpytorch/lazy/lazy_tensor.py", line 2259, in __mul__
    return self.mul(other)
           ^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/gpytorch/lazy/lazy_tensor.py", line 1434, in mul
    return self._mul_matrix(lazify(other))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/gpytorch/lazy/lazy_tensor.py", line 524, in _mul_matrix
    self = self.evaluate_kernel()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/gpytorch/utils/memoize.py", line 59, in g
    return _add_to_cache(self, cache_name, method(self, *args, **kwargs), *args, kwargs_pkl=kwargs_pkl)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/gpytorch/lazy/lazy_evaluated_kernel_tensor.py", line 332, in evaluate_kernel
    res = self.kernel(
          ^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/gpytorch/kernels/kernel.py", line 402, in __call__
    res = lazify(super(Kernel, self).__call__(x1_, x2_, last_dim_is_batch=last_dim_is_batch, **params))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/gpytorch/module.py", line 30, in __call__
    outputs = self.forward(*inputs, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/gpytorch/kernels/scale_kernel.py", line 103, in forward
    orig_output = self.base_kernel.forward(x1, x2, diag=diag, last_dim_is_batch=last_dim_is_batch, **params)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/geometric_kernels/frontends/gpytorch.py", line 162, in forward
    return self.base_kernel.K(params, x1, x2)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/geometric_kernels/kernels/karhunen_loeve.py", line 199, in K
    weights = B.cast(B.dtype(params["nu"]), self.eigenvalues(params))  # [L, 1]
                                            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/geometric_kernels/kernels/karhunen_loeve.py", line 171, in eigenvalues
    spectral_values = self._spectrum(
                      ^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/geometric_kernels/kernels/karhunen_loeve.py", line 117, in _spectrum
    safe_nu = B.where(nu == np.inf, B.cast(B.dtype(lengthscale), np.r_[1.0]), nu)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/plum/function.py", line 383, in __call__
    return _convert(method(*args, **kw_args), return_type)
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/lab/shape.py", line 185, in f_wrapped
    return f(*(unwrap_dimension(arg) for arg in args), **kw_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/yanjun_data/anaconda3/envs/meai/lib/python3.11/site-packages/lab/torch/generic.py", line 364, in where
    return torch.where(condition, a, b)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

stoprightthere commented 1 month ago

Hey @YanjunLiu2

Can you verify that the kernel's raw_nu parameter is also on cuda? BTW, you can run kernel.to(device) to move all of the kernel's parameters to the specified device at once, so you don't have to move them one by one.

If the above does not help, could you please provide a minimal reproducible example for this? Thank you!
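
One way to run the check suggested above is to group every registered parameter and buffer by device, rather than printing a hand-picked few. This is only a sketch: the helper names `tensors_by_device` and `off_device` are made up for illustration and are not part of gpytorch or GeometricKernels. It assumes the object passed in is a `torch.nn.Module` (gpytorch models and kernels are), and uses only `named_parameters()`, `named_buffers()` and each tensor's `.device`.

```python
from collections import defaultdict


def tensors_by_device(module):
    """Group the names of a module's parameters and buffers by device.

    `module` is expected to be a torch.nn.Module, so this relies only on
    named_parameters(), named_buffers() and the .device of each tensor.
    (Hypothetical helper, written for this thread.)
    """
    groups = defaultdict(list)
    for name, tensor in module.named_parameters():
        groups[str(tensor.device)].append(name)
    for name, tensor in module.named_buffers():
        groups[str(tensor.device)].append(name)
    return dict(groups)


def off_device(module, expected="cuda:0"):
    """Return the sorted names of parameters/buffers not on `expected`."""
    return sorted(
        name
        for device, names in tensors_by_device(module).items()
        if device != expected
        for name in names
    )
```

Calling `off_device(model, "cuda:0")` after `model.to("cuda:0")` should give an empty list if every registered tensor was moved. Whether raw_nu turns out to be the culprit here is unconfirmed. Note also that `.to()` only moves registered parameters and buffers, so a tensor kept as a plain attribute would neither be moved nor show up in this listing.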

vabor112 commented 1 month ago

@YanjunLiu2 did you resolve the issue by any chance?

vabor112 commented 1 month ago

Closing due to inactivity, @YanjunLiu2 feel free to reopen.