Acetylsalicylsaeure closed this issue 5 months ago.
The KAN class has a "device" argument, try passing "cuda" there. It doesn't speed things up by much, though.
KAN was called with device=device, with device set to "cuda", but the model parameters' device was still "cpu". Furthermore, I noticed that .train can be called with a device argument; setting this to "cuda" also leads to the error above.
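For context, this is roughly the call sequence that fails for me (a minimal sketch; the width/grid values and the toy target are placeholders, not my actual setup):

import torch
from kan import KAN, create_dataset

# both the constructor and .train get the device explicitly
model = KAN(width=[2, 2, 1], grid=5, k=3, seed=0, device="cuda")
dataset = create_dataset(lambda x: x[:, [0]] * x[:, [1]], n_var=2, device="cuda")
model.train(dataset, opt="LBFGS", steps=5, device="cuda")  # raises the error above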
Just saw your other issue, and I think it's just not offloading anything to the GPU. My system is severely CPU-bottlenecked, and setting the device to cuda does not lead to any speedup whatsoever. Furthermore, system monitoring shows the CPU running at 80% while the GPU sits at 0%.
Just installed from source; now setting device="cuda" instantly makes .train fail with the initial error. The model parameters' device is still "cpu", however? Calling .cuda() fixes that, but not the error.
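To make that concrete (a sketch; the width/grid values are placeholders):

import torch
from kan import KAN

model = KAN(width=[2, 2, 1], grid=5, k=3, seed=0, device="cuda")
print(next(model.parameters()).device)  # reports "cpu" despite device="cuda"
model.cuda()                            # moves the registered parameters over
print(next(model.parameters()).device)  # now "cuda", but .train still fails the same way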
Looks fixable, PR soon(?)
How much performance improvement do we see on the GPU over the CPU? I suspect that since this is quite different from the MLP architecture, the improvements come solely from parallel computing, which depends heavily on the implementation itself.
With bigger models it helps quite a lot:
import torch
from kan import KAN, create_dataset

device = "cuda"
model = KAN(width=[4, 12, 8, 1], grid=10, k=3, seed=0, device=device)
f = lambda x: torch.exp((torch.sin(torch.pi*(x[:,[0]]**2 + x[:,[1]]**2)) + torch.sin(torch.pi*(x[:,[2]]**2 + x[:,[3]]**2)))/2)
dataset = create_dataset(f, n_var=4, train_num=3000, device=device)

# train the model
# model.train(dataset, opt="LBFGS", steps=20, lamb=1e-3, lamb_entropy=2.);
model.train(dataset, opt="LBFGS", steps=50, lamb=5e-5, lamb_entropy=2., device=device)
Going from half an hour to 2:40 (tqdm estimate), i.e. roughly a 10x speedup, and that at only 14% GPU usage, but as I mentioned, my CPU is the bottleneck.
With width=[4,2,1], the CPU takes 49 s and the GPU 36 s.
Fixed by #7 by setting model.to(device); closing.
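For anyone hitting this before the fix is in a release, the workaround looks roughly like this (a sketch with a placeholder toy problem, not the benchmark from above):

import torch
from kan import KAN, create_dataset

device = "cuda"
f = lambda x: torch.sin(torch.pi * x[:, [0]])  # toy target, just for illustration
dataset = create_dataset(f, n_var=1, train_num=1000, device=device)

model = KAN(width=[1, 3, 1], grid=5, k=3, seed=0, device=device)
model.to(device)  # move any parameters that were left behind on the CPU
model.train(dataset, opt="LBFGS", steps=20, device=device)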
Yep, I ended up just moving everything to cuda manually. Also, using Adam as the optimizer speeds things up, but it might be less stable.
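Something like this, if anyone wants to try it (a sketch mirroring the setup above; the steps/lr values are just what I played with, not tuned recommendations):

import torch
from kan import KAN, create_dataset

device = "cuda"
f = lambda x: torch.sin(torch.pi * x[:, [0]])  # toy target, just for illustration
dataset = create_dataset(f, n_var=1, train_num=1000, device=device)

model = KAN(width=[1, 3, 1], grid=5, k=3, seed=0, device=device).to(device)
# Adam steps are much cheaper than LBFGS iterations, but usually need more of them
model.train(dataset, opt="Adam", steps=500, lr=1e-3, device=device)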
Replicating tutorials/API_10_device.ipynb, I see no load on the GPU, just the CPU. VRAM does get occupied, and checking the device of the dataset returns "cuda"; the model parameters, however, return "cpu" as their device. This can be fixed by calling .to(device) on the model, but that breaks the training, leading to the following error.
Environment: fresh conda venv with the requirements.txt installed. CUDA version: 12.2
Any ideas which parameter could be left behind on the CPU?
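In case it helps with reproducing the diagnosis, this is roughly how I looked for stragglers (a sketch; the width/grid values are placeholders):

import torch
from kan import KAN

device = "cuda"
model = KAN(width=[2, 5, 1], grid=5, k=3, seed=0, device=device)

# list every registered parameter and buffer that is still on the CPU
for name, p in model.named_parameters():
    if p.device.type == "cpu":
        print("parameter still on cpu:", name)
for name, b in model.named_buffers():
    if b.device.type == "cpu":
        print("buffer still on cpu:", name)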