Closed Enqiliu125 closed 3 months ago
Hi, there could be multiple possibilities:
In which training step do you see this error: step 0 or later? If it is step 0, (1) is more likely. If it appears after a while, (2) is more likely. You may also try changing the driver argument of torch.linalg.lstsq, but I don't have a systematic suggestion. You may try all of them. :->
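A minimal sketch of the driver experiment suggested above. The tensors are random stand-ins with made-up shapes, not the actual mat and y_eval from curve2coef. The driver names are the ones torch.linalg.lstsq documents for CPU inputs.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the batched least-squares problem:
# 2 batch elements, 50 sample points, 8 unknown coefficients each.
mat = torch.randn(2, 50, 8)
true_coef = torch.randn(2, 8, 1)
y_eval = mat @ true_coef

# Check the inputs first: no driver can be expected to behave well
# if NaN or inf values are already present.
assert torch.isfinite(mat).all() and torch.isfinite(y_eval).all()

# Solve the same system with each CPU driver and keep the solutions.
solutions = {}
for driver in ("gelsy", "gelsd", "gelss", "gels"):
    solutions[driver] = torch.linalg.lstsq(mat, y_eval, driver=driver).solution

# Report whether each driver recovers the known coefficients.
for driver, coef in solutions.items():
    print(driver, torch.allclose(coef, true_coef, atol=1e-3))
```

If the tensors passed to lstsq already contain NaN, switching drivers probably only changes the error message, so the torch.isfinite check is worth running on the real inputs first.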
I changed the model to 1D input, 1D output, and 5 hidden neurons. Although the lamb value is not large, I still run into this issue. My goal is to fit a single-variable formula similar to the Arrhenius equation: y = A*x^n*exp(-E/(R*x)). However, when the input is a single variable, I encounter this problem. I have written all of my code following the tutorial's instructions. Could you please help me understand the cause of this issue and how to resolve it?
My code is as follows:
from kan import *
import numpy as np
# create a KAN: 1D inputs, 1D output, and 5 hidden neurons. cubic spline (k=3), 5 grid intervals (grid=5).
model = KAN(width=[1,5,1], grid=5, k=3, seed=0,device='cpu')
A = 3.55*10**15
n = -0.41
E = 16.6 # J/mol
R = 8.314 # J/(mol*K)
# create dataset f(x) = A*T^n*exp(-E/(R*T))
x=np.linspace(250, 1250, 1000)
f = lambda x: torch.exp(-E/R/x)*A*x**n
dataset = create_dataset(f, n_var=1)
dataset['train_input'].shape, dataset['train_label'].shape
# train the model
model.train(dataset, opt="LBFGS", steps=20, lamb=0.01, lamb_entropy=10.);
model = model.prune()
model(dataset['train_input'])
model.plot()
Here is the error:
runfile('C:/Users/26060/Desktop/kan/kan_try.py', wdir='C:/Users/26060/Desktop/kan')
train loss: nan | test loss: nan | reg: nan : 25%|████▌ | 5/20 [00:03<00:11, 1.32it/s]
Traceback (most recent call last):
File D:\miniconda\Lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
exec(code, globals, locals)
File c:\users\26060\desktop\kan\kan_try.py:20
model.train(dataset, opt="LBFGS", steps=20, lamb=0.01, lamb_entropy=10.);
File D:\miniconda\Lib\site-packages\kan\KAN.py:898 in train
self.update_grid_from_samples(dataset['train_input'][train_id].to(device))
File D:\miniconda\Lib\site-packages\kan\KAN.py:244 in update_grid_from_samples
self.act_fun[l].update_grid_from_samples(self.acts[l])
File D:\miniconda\Lib\site-packages\kan\KANLayer.py:218 in update_grid_from_samples
self.coef.data = curve2coef(x_pos, y_eval, self.grid, self.k, device=self.device)
File D:\miniconda\Lib\site-packages\kan\spline.py:137 in curve2coef
coef = torch.linalg.lstsq(mat.to('cpu'), y_eval.unsqueeze(dim=2).to('cpu')).solution[:, :, 0] # sometimes 'cuda' version may diverge
RuntimeError: false INTERNAL ASSERT FAILED at "..\\aten\\src\\ATen\\native\\BatchLinearAlgebra.cpp":1538, please report a bug to PyTorch. torch.linalg.lstsq: (Batch element 0): Argument 6 has illegal value. Most certainly there is a bug in the implementation calling the backend library.
Intel MKL ERROR: Parameter 6 was incorrect on entry to SGELSY.
Intel MKL ERROR: Parameter 6 was incorrect on entry to SGELSY.
Intel MKL ERROR: Parameter 6 was incorrect on entry to SGELSY.
Intel MKL ERROR: Parameter 6 was incorrect on entry to SGELSY.
Intel MKL ERROR: Parameter 6 was incorrect on entry to SGELSY.
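One possible cause, offered as a guess rather than a diagnosis confirmed in this thread: the array x = np.linspace(250, 1250, 1000) is never passed to create_dataset, so the inputs are presumably drawn from that function's default range. The sketch below assumes that default is [-1, 1] and uses plain torch instead of create_dataset. It evaluates the same f on both ranges and counts non-finite labels. The sample helper is hypothetical.

```python
import torch

torch.manual_seed(0)

A = 3.55 * 10**15
n = -0.41
E = 16.6   # J/mol
R = 8.314  # J/(mol*K)

# Same target function as in the script above.
f = lambda x: torch.exp(-E / R / x) * A * x**n

def sample(low, high, num=1000):
    # Hypothetical helper: uniform samples in [low, high), shape (num, 1).
    return low + (high - low) * torch.rand(num, 1)

# Assumed default sampling range of create_dataset: [-1, 1].
y_default = f(sample(-1.0, 1.0))
# Physically meaningful temperature range from the script: [250, 1250].
y_physical = f(sample(250.0, 1250.0))

# A negative base raised to a non-integer power is NaN in torch.
print("non-finite labels on [-1, 1]:     ", (~torch.isfinite(y_default)).sum().item())
print("non-finite labels on [250, 1250]: ", (~torch.isfinite(y_physical)).sum().item())
```

If the first count is non-zero on your setup, passing the intended range to create_dataset (it appears to accept a ranges argument) and rescaling the very large labels may be worth trying before changing anything in lstsq.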
Hi, have you solved this problem?
Yes, I have worked it out!
Hi, I am facing the same issue. Could you please explain how you solved the error?
Dear Everyone,
I am writing to report a bug I encountered while running symbolic regression using KAN. The issue arose when I adjusted the inputs to 1D inputs, 1D output, and 5 hidden neurons. During the computation, I encountered the following error message: false INTERNAL ASSERT FAILED at "..\aten\src\ATen\native\BatchLinearAlgebra.cpp":1538, please report a bug to PyTorch. torch.linalg.lstsq: (Batch element 0): Argument 6 has illegal value. Most certainly there is a bug in the implementation calling the backend library. It seems that there might be an issue with the backend library implementation in PyTorch.