KindXiaoming / pykan

Kolmogorov Arnold Networks
MIT License

running example 10 with GPU, error: all tensors on two devices, cuda:0 and cpu! #155

Closed AlpsCV closed 5 months ago

AlpsCV commented 5 months ago

When running on the GPU, I hit an error saying the tensors are on different devices (cuda:0 and cpu). The details are shown below.

description:   0%|          | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "./myProject/pykan-master/try_demo10_device.py", line 16, in <module>
    model.train(dataset, opt="LBFGS", steps=20, lamb=1e-3, lamb_entropy=2.);
  File "./myProject/pykan-master/kan/KAN.py", line 898, in train
    self.update_grid_from_samples(dataset['train_input'][train_id].to(device))
  File "./myProject/pykan-master/kan/KAN.py", line 243, in update_grid_from_samples
    self.forward(x)
  File "./myProject/pykan-master/kan/KAN.py", line 311, in forward
    x_numerical, preacts, postacts_numerical, postspline = self.act_fun[l](x)
  File "./anaconda3/envs/kanenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "./anaconda3/envs/kanenv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "./myProject/pykan-master/kan/KANLayer.py", line 170, in forward
    x = torch.einsum('ij,k->ikj', x, torch.ones(self.out_dim, device=self.device)).reshape(batch, self.size).permute(1, 0)
  File "./anaconda3/envs/kanenv/lib/python3.9/site-packages/torch/functional.py", line 385, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
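One way to narrow down an error like this is to list which devices a model's parameters and buffers live on and compare them with the input. The following is a minimal sketch in plain PyTorch; the helper name `devices_in_use` is hypothetical and not part of pykan, and `nn.Linear` stands in for the real model.

```python
import torch
from torch import nn


def devices_in_use(module: nn.Module, *tensors: torch.Tensor) -> set:
    """Collect the device types used by a module's parameters/buffers and by extra tensors."""
    found = {p.device.type for p in module.parameters()}
    found |= {b.device.type for b in module.buffers()}
    found |= {t.device.type for t in tensors}
    return found


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
layer = nn.Linear(4, 2).to(device)  # stand-in for the real model
x = torch.randn(8, 4)               # created on the CPU by default

print(devices_in_use(layer, x))             # more than one entry means a mismatch
print(devices_in_use(layer, x.to(device)))  # a single entry once the input is moved
```

Note that a check like this only sees registered parameters and buffers. It would not catch a tensor created inside `forward` from a stored attribute, such as the `torch.ones(self.out_dim, device=self.device)` call that the traceback points to.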

hrjtju commented 5 months ago

same issue

Betty-J commented 5 months ago

You could specify the device using the device parameter when creating the model, for example: KANLayer(in_features, cfg.MODEL.CLS_HEAD.NUM_CLASSES, device=torch.device('cuda')).

manuelcugliari commented 5 months ago

Same issue, even when specifying the device parameter:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = KAN(width=[X_train_torch.shape[1], 10, 2], grid=3, k=3, device=device)

Wenbobobo commented 5 months ago

Same problem. Has anyone found a way to run the computer vision and diffusion model experiments successfully on the GPU? Maybe I need to change kan.py?

AlessandroFlati commented 5 months ago

Can you all tell us at least the tag/commit you're using please?

AlpsCV commented 5 months ago

It works on both CPU and GPU once the training call in https://kindxiaoming.github.io/pykan/API_demo/API_10_device.html is changed from

model.train(dataset, opt="LBFGS", steps=50, lamb=5e-5, lamb_entropy=2.);

to

model.train(dataset, opt="LBFGS", steps=50, lamb=5e-5, lamb_entropy=2., device=device);
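A toy illustration of why passing `device` to the training call can matter: the traceback shows `train` moving inputs with `.to(device)`, so if that argument does not match where the model lives, inputs and weights end up on different devices. The loop below is a sketch in plain PyTorch, not pykan's implementation; the `train` function, the `nn.Linear` model, and the `train_label` key are assumptions made for illustration.

```python
import torch
from torch import nn


def train(model, dataset, steps=5, device="cpu"):
    """Toy loop (not pykan's): inputs are moved to `device`, the model is not."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss = None
    for _ in range(steps):
        x = dataset["train_input"].to(device)
        y = dataset["train_label"].to(device)
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(2, 1).to(device)
dataset = {"train_input": torch.rand(64, 2), "train_label": torch.rand(64, 1)}

# Passing the model's device keeps inputs and weights together. On a GPU machine,
# leaving the default "cpu" here would raise the same "two devices" RuntimeError.
final_loss = train(model, dataset, device=device)
print(final_loss)
```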

bozhenhhu commented 5 months ago

Thanks. However, I then got this error: RuntimeError: CUDA error: device-side assert triggered. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
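As the message suggests, setting `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, so the stack trace points at the call that actually failed. The variable is best set before the process starts. Below is a minimal sketch that launches a child Python process with it set; the helper name `run_blocking` is hypothetical, and the `-c` command is only a stand-in for the real training script.

```python
import os
import subprocess
import sys


def run_blocking(args):
    """Run a Python command in a child process with CUDA_LAUNCH_BLOCKING=1 set."""
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(
        [sys.executable, *args], env=env, capture_output=True, text=True
    )


# Stand-in for the real training script: the child only echoes the variable.
result = run_blocking(["-c", "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"])
print(result.stdout.strip())
```

To debug a real run, replace the `-c` arguments with the path to the script. Device-side asserts are often caused by an out-of-range index, so running the same code once on the CPU may also produce a clearer error message.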