KindXiaoming / pykan

Kolmogorov Arnold Networks
MIT License

Toy example: when the grid size reaches 200, the results are inconsistent with the paper #201

Closed · wdy321 closed this issue 3 months ago

wdy321 commented 5 months ago

This is the code I modified based on Example 1; I only extended the list of grid sizes. When the grid size reaches 200, the loss suddenly becomes larger and starts to jitter. I don't know what went wrong.

import sys
sys.path.append("..")

import numpy as np
import torch
from kan import *

# initialize KAN with G=3
model = KAN(width=[2,1,1], grid=3, k=3)

# create dataset for f(x1, x2) = exp(sin(pi*x1) + x2^2)
f = lambda x: torch.exp(torch.sin(torch.pi*x[:,[0]]) + x[:,[1]]**2)
dataset = create_dataset(f, n_var=2)

# progressively finer grids
grids = np.array([3,5,10,20,50,100,200,500,1000])

train_losses = []
test_losses = []
steps = 200
k = 3

for i in range(grids.shape[0]):
    if i == 0:
        model = KAN(width=[2,1,1], grid=grids[i], k=k)
    else:
        # warm-start the finer-grid model from the previous, coarser one
        model = KAN(width=[2,1,1], grid=grids[i], k=k).initialize_from_another_model(model, dataset['train_input'])
    results = model.train(dataset, opt="LBFGS", steps=steps, stop_grid_update_step=30)
    train_losses += results['train_loss']
    test_losses += results['test_loss']

The training results are as follows:

train loss: 1.42e-02 | test loss: 1.49e-02 | reg: 3.02e+00 : 100%|█| 200/200 [00:25<00:00,  7.71it/s
train loss: 6.42e-03 | test loss: 6.57e-03 | reg: 2.97e+00 : 100%|█| 200/200 [00:21<00:00,  9.12it/s
train loss: 2.91e-04 | test loss: 3.35e-04 | reg: 2.97e+00 : 100%|█| 200/200 [00:20<00:00,  9.57it/s
train loss: 2.21e-05 | test loss: 2.31e-05 | reg: 2.97e+00 : 100%|█| 200/200 [00:19<00:00, 10.15it/s
train loss: 7.55e-06 | test loss: 1.56e-05 | reg: 2.97e+00 : 100%|█| 200/200 [00:22<00:00,  8.82it/s
train loss: 5.01e-06 | test loss: 1.49e-05 | reg: 2.97e+00 : 100%|█| 200/200 [00:19<00:00, 10.48it/s
train loss: 2.67e-02 | test loss: 2.11e-01 | reg: 2.96e+00 : 100%|█| 200/200 [01:21<00:00,  2.46it/s
train loss: 2.53e-02 | test loss: 4.15e-01 | reg: 3.61e+00 : 100%|█| 200/200 [01:32<00:00,  2.17it/s
train loss: 1.89e-01 | test loss: 1.38e+00 | reg: 4.19e+00 : 100%|█| 200/200 [02:11<00:00,  1.52it/s

(Attached image: plot of train/test loss over the grid-refinement run.)
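For intuition on why the test loss can blow up once the grid becomes much finer than the training data, here is a minimal 1D sketch in plain NumPy (my own illustration, not pykan's code): least-squares fitting with a piecewise-linear spline basis on grids of increasing resolution, using a 1D stand-in for the toy target. Once the grid has far more cells than there are training points, many basis functions see no data at all and generalization collapses even though the training fit stays good.

```python
import numpy as np

rng = np.random.default_rng(0)

def hat_basis(x, knots):
    """Piecewise-linear 'hat' basis functions centered on the knots."""
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / h)

f = lambda x: np.exp(np.sin(np.pi * x))  # 1D analogue of the toy target

x_train = rng.uniform(-1, 1, 100)
x_test = rng.uniform(-1, 1, 1000)
y_train, y_test = f(x_train), f(x_test)

for n_grid in [5, 20, 200]:
    knots = np.linspace(-1, 1, n_grid + 1)
    # least-squares fit; rank-deficient when cells outnumber data points
    coef, *_ = np.linalg.lstsq(hat_basis(x_train, knots), y_train, rcond=None)
    pred = hat_basis(x_test, knots) @ coef
    rmse = np.sqrt(np.mean((pred - y_test) ** 2))
    print(f"grid={n_grid:4d}  test RMSE={rmse:.3e}")
```

With 100 training points, refining from 5 to 20 cells reduces the test error, but at 200 cells most cells contain no training point, the minimum-norm solution zeroes out the unseen basis functions, and the test error jumps, qualitatively the same pattern as in the logs above.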

iiisak commented 5 months ago

See Figure 2.3 and Section 2.4 (Toy Example) in the paper

wdy321 commented 5 months ago

@iiisak I want to reproduce that part of the paper, but I get the results shown above. Can you tell me what the problem is?

KindXiaoming commented 5 months ago

Hi, at high precision the results can be quite sensitive to random seeds. When I made that plot, `noise_scale_base=0.0` was the default; the default has since changed to `noise_scale_base=0.1`. Please try whether `model = KAN(width=[2,1,1], grid=3, k=3, noise_scale_base=0.0)` helps. You may also try different random seeds to see how they affect the results, e.g. `model = KAN(width=[2,1,1], grid=3, k=3, noise_scale_base=0.0, seed=42)`. Also, `stop_grid_update_step=50` is the default, while you are using 30. Overall, since changes are landing very fast, I think exact reproducibility is hard, but you should get something similar.