KindXiaoming / pykan

Kolmogorov Arnold Networks

Functional Approximation is poor #156

Open 1ssb opened 4 months ago

1ssb commented 4 months ago

Congratulations on this great piece of work. I have tried simple tests like function approximation, and it turns out that, across a variety of models, KANs perform poorly compared to standard MLPs.

It could be that my model is much simpler and therefore not capable enough, but it would be nice to see these standard comparisons, because that would inform how well KANs can be adapted broadly. Here is my implementation: https://github.com/1ssb/torchkan/blob/main/torchkan.py
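For concreteness, here is a minimal sketch of the kind of head-to-head comparison I mean. The target function, optimizer settings, and step counts are illustrative choices, not taken from the linked repo; it assumes pykan's top-level `KAN` and `create_dataset` and the `KAN.train` API of the pykan version current at the time of this thread (newer releases renamed it `fit`):

```python
import torch
from kan import KAN, create_dataset

# Hypothetical 1-D target to approximate (a Gaussian bump).
f = lambda x: torch.exp(-x[:, [0]] ** 2)

# pykan helper: returns a dict with train/test inputs and labels.
dataset = create_dataset(f, n_var=1, ranges=[-3, 3], train_num=1000)

# Small KAN: width [1, 5, 1], grid 5, cubic splines (k=3).
kan = KAN(width=[1, 5, 1], grid=5, k=3)
kan.train(dataset, opt="LBFGS", steps=50)

# Plain-PyTorch MLP baseline with comparable depth.
mlp = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.SiLU(),
    torch.nn.Linear(64, 64), torch.nn.SiLU(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(
        mlp(dataset["train_input"]), dataset["train_label"]
    )
    loss.backward()
    opt.step()

# Compare held-out RMSE for both models.
with torch.no_grad():
    kan_rmse = (kan(dataset["test_input"]) - dataset["test_label"]).pow(2).mean().sqrt()
    mlp_rmse = (mlp(dataset["test_input"]) - dataset["test_label"]).pow(2).mean().sqrt()
print(f"KAN test RMSE: {kan_rmse:.2e}  MLP test RMSE: {mlp_rmse:.2e}")
```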

1ssb commented 4 months ago

Here is a comparison between the models (same layer configurations; for both I have used the same positional encodings):

[Two W&B charts (11_5_2024): loss curves comparing the two models]

Both of them are trying to approximate the inverse of the Gaussian function.

pop756 commented 4 months ago

Have you tried lowering the grid number to 3 or less? I ran into a similar problem in another issue, and it improved somewhat when I lowered the grid to 2.

1ssb commented 4 months ago

Yes, I did; it does not help.

Also, since we are trying to move towards interpretability, let's take on some additional onus and explain why a change would make any difference. A smaller grid means fewer control points, but would that matter for a smooth function like an inverted Gaussian? Please feel free to correct me.
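For what it's worth, the grid-versus-parameter-count relationship is easy to check empirically. A small sketch (assuming pykan's `KAN` constructor; the [1,5,2,1] width is taken from the configuration discussed below):

```python
from kan import KAN

# Each KAN edge carries roughly grid + k spline coefficients,
# so a smaller grid number means fewer control points per activation.
for g in (2, 7):
    model = KAN(width=[1, 5, 2, 1], grid=g, k=3)
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"grid={g}: {n_params} trainable parameters")
```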

pop756 commented 4 months ago

[two figures illustrating the decomposition described below]

To fit a function such as e^(x^2+1), we can split it into two parts and construct a KAN with two weight layers: x^2 + 1 followed by exp. With a [1,2,1] layer configuration, the KAN can theoretically predict the above function correctly. With a different configuration, e.g. [1,3,1] or [1,3,2,1], the KAN no longer matches this decomposition: extra activation functions get fitted to the input, which can cause overfitting. I thought this phenomenon could be occurring here.
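A sketch of that [1,2,1] case, assuming pykan's API (the grid value, range, and step count here are illustrative):

```python
import torch
from kan import KAN, create_dataset

# e^(x^2 + 1) = exp(g(x)) with g(x) = x^2 + 1: a two-stage composition,
# so a two-layer [1, 2, 1] KAN can in principle represent it exactly.
f = lambda x: torch.exp(x[:, [0]] ** 2 + 1)
dataset = create_dataset(f, n_var=1, ranges=[-3, 3])

model = KAN(width=[1, 2, 1], grid=3, k=3)
model.train(dataset, opt="LBFGS", steps=50)
```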

When I configured the KAN layer as [1,5,2,1] with grid=2, the training loss dropped below 0.00002.

The MLP was configured as [1,64,64,1].
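Concretely, the two models compared above would look something like this (a sketch under the stated configurations; the MLP's activation function is an assumption):

```python
import torch
from kan import KAN

# KAN as described: layers [1, 5, 2, 1], grid = 2.
kan = KAN(width=[1, 5, 2, 1], grid=2, k=3)

# MLP as described: layers [1, 64, 64, 1].
mlp = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)
```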

1ssb commented 4 months ago

Sorry, I should have been clearer. By an inverted Gaussian I mean: if f(x) = N(mu, sigma^2), then y = f^{-1} is what one has access to, not an explicit parameterised model as such, just samples from the inverse, which, depending on the mapping, is an R -> R^n function.
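To make that concrete, here is a sketch of generating such samples (my own construction, not from the linked repo): for a Gaussian density with peak 1/(sigma*sqrt(2*pi)), each density value y in (0, peak] has two preimages mu +/- sigma*sqrt(-2*ln(y*sigma*sqrt(2*pi))), so the inverse is an R -> R^2 mapping.

```python
import math
import torch

mu, sigma = 0.0, 1.0
peak = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # max of the Gaussian pdf

# Sample density values y in (0, peak); each has two preimages under the pdf.
y = torch.rand(1000) * peak * 0.999 + 1e-6
root = sigma * torch.sqrt(-2.0 * torch.log(y * sigma * math.sqrt(2.0 * math.pi)))

# Inverse mapping R -> R^2: both branches stacked per sample.
x = torch.stack([mu - root, mu + root], dim=1)  # shape (1000, 2)
```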

pop756 commented 4 months ago

I'm sorry, I don't have the math knowledge to answer that question, but when I trained the function I presented above, I left out the last 20% of x (i.e. x corresponding to 1.8-3) as the validation set and trained on the remaining 80% of the x data. The results were as follows for grid = 7 and grid = 3. I didn't apply L1 regularization here. I thought it might help with your question.

[two plots: training results with grid = 7 and with grid = 3]
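A sketch of that split, assuming the domain is [-3, 3] (so its last 20% is x in [1.8, 3] as described) and pykan's dataset dict format:

```python
import torch

# Target from the discussion above.
f = lambda x: torch.exp(x ** 2 + 1)

x = torch.linspace(-3, 3, 1000).unsqueeze(1)
split = x[:, 0] < 1.8  # True for the first 80% of the range

# pykan-style dataset dict: train on x < 1.8, validate on x >= 1.8.
dataset = {
    "train_input": x[split],
    "train_label": f(x[split]),
    "test_input": x[~split],
    "test_label": f(x[~split]),
}
```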

1ssb commented 4 months ago

I am fundamentally unsure what you mean by dividing it into two parts or sequences; can you kindly explain?

pop756 commented 4 months ago

[two figures: the training region and the held-out validation region] This is the result of using the first region as training data and the rest as validation data. Training like that, changing only the grid variable and plotting the learning process, gave the grid = 7 and grid = 3 results shown above.

1ssb commented 4 months ago

The errors I got were quite low, but they saturated around 0.003 and didn't go below that, while the MLPs went down to the order of 1e-6.