SynodicMonth / ChebyKAN

Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines.
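For concreteness, here is a minimal sketch of what a Chebyshev-basis KAN layer can look like (an illustration of the idea, not necessarily this repo's exact implementation): each input-output edge learns its own Chebyshev expansion, and the input is squashed with tanh so it lies in the polynomials' natural domain [-1, 1].

```python
import torch
import torch.nn as nn

class ChebyLayerSketch(nn.Module):
    """Illustrative Chebyshev-basis edge layer (a sketch, not this repo's code)."""
    def __init__(self, in_dim, out_dim, degree=4):
        super().__init__()
        self.degree = degree
        # one coefficient per (input feature, output feature, polynomial order)
        self.coeffs = nn.Parameter(torch.empty(in_dim, out_dim, degree + 1))
        nn.init.normal_(self.coeffs, std=1.0 / (in_dim * (degree + 1)))

    def forward(self, x):
        # squash to [-1, 1]: Chebyshev polynomials live on this interval
        x = torch.tanh(x)
        # build T_0..T_degree with the recurrence T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x)
        T = [torch.ones_like(x), x]
        for _ in range(2, self.degree + 1):
            T.append(2 * x * T[-1] - T[-2])
        basis = torch.stack(T[: self.degree + 1], dim=-1)  # (batch, in_dim, degree+1)
        # sum over input features and polynomial orders
        return torch.einsum("bik,iok->bo", basis, self.coeffs)
```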

Tanh and infinite support #2

Closed edmondja closed 6 months ago

edmondja commented 6 months ago

We could avoid using tanh if we used the sine-cosine form of a Fourier series instead, don't you think?
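A hedged sketch of that suggestion (an illustration, not FourierKAN's actual code): because sin and cos are defined and bounded on the whole real line, a Fourier-basis edge layer has infinite support and can skip the tanh squash entirely.

```python
import torch
import torch.nn as nn

class FourierLayerSketch(nn.Module):
    """Illustrative Fourier-basis edge layer; no tanh squash is needed."""
    def __init__(self, in_dim, out_dim, num_freqs=4):
        super().__init__()
        # integer frequencies 1..num_freqs for the sine/cosine basis
        self.register_buffer("freqs", torch.arange(1, num_freqs + 1).float())
        # separate learnable coefficients for the sine and cosine terms of every edge
        self.sin_coeffs = nn.Parameter(torch.randn(in_dim, out_dim, num_freqs) / (in_dim * num_freqs))
        self.cos_coeffs = nn.Parameter(torch.randn(in_dim, out_dim, num_freqs) / (in_dim * num_freqs))

    def forward(self, x):
        # no tanh: sin/cos accept any real input, so the domain is unbounded
        angles = x.unsqueeze(-1) * self.freqs  # (batch, in_dim, num_freqs)
        return (torch.einsum("bik,iok->bo", torch.sin(angles), self.sin_coeffs)
                + torch.einsum("bik,iok->bo", torch.cos(angles), self.cos_coeffs))
```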

SynodicMonth commented 6 months ago

That's true. There's a repo, https://github.com/GistNoesis/FourierKAN, that uses a Fourier series; I was partly inspired by it. But I found it harder to train (I'm not sure why, maybe because it's periodic, or just the wrong hyperparameters). Other orthogonal polynomials (Legendre, Hermite, Laguerre) might also work. Maybe I'll try them later.
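To illustrate the "other orthogonal polynomials" idea (untested, not part of this repo): the only change relative to the Chebyshev version would be swapping the three-term recurrence, e.g. for Legendre polynomials.

```python
import torch

def legendre_basis(x, degree):
    """Stack P_0..P_degree of x (x assumed already squashed to [-1, 1])."""
    P = [torch.ones_like(x), x]
    for k in range(1, degree):
        # Legendre recurrence: (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)
        P.append(((2 * k + 1) * x * P[-1] - k * P[-2]) / (k + 1))
    return torch.stack(P[: degree + 1], dim=-1)  # (..., degree+1)
```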

edmondja commented 6 months ago

Amazing, thanks.

edmondja commented 6 months ago

I can confirm your implementation gives better results than FourierKAN (I used it with UDRL on cartpole).

SynodicMonth commented 6 months ago

I've tested ChebyKAN without LayerNorm and tanh, and it's absolutely not trainable. But with tanh it collapses to an MLP when degree=1. Maybe it's not that KAN works, but that MLP works. I'm quite curious whether KAN can outperform MLP on UDRL tasks.
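A small self-contained check of the degree=1 collapse (illustrative tensor names, not the repo's code): with only T_0(x)=1 and T_1(x)=x in the basis, each edge computes c_0 + c_1*tanh(x), so the layer is a bias plus a linear map of tanh(x), i.e. an ordinary tanh MLP layer.

```python
import torch

torch.manual_seed(0)
batch, in_dim, out_dim = 5, 3, 2
x = torch.randn(batch, in_dim)
c0 = torch.randn(in_dim, out_dim)  # coefficients of T_0(x) = 1
c1 = torch.randn(in_dim, out_dim)  # coefficients of T_1(x) = x

# degree-1 Chebyshev edge layer with a tanh squash in front
t = torch.tanh(x)
cheby_out = torch.einsum("bi,io->bo", torch.ones_like(t), c0) + torch.einsum("bi,io->bo", t, c1)

# the same computation written as a standard MLP layer: Linear(tanh(x)) with bias = sum_i c0[i]
mlp_out = t @ c1 + c0.sum(dim=0)

print(torch.allclose(cheby_out, mlp_out))  # True
```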

cwkx commented 6 months ago

I think this is a fairer test - https://gist.github.com/cwkx/b74c0e759471064e69436be287fe142e - it shows the parameter counts (the SIREN MLP uses fewer and converges faster). Obviously both are sensitive to initialization at this small size.
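For reference, a minimal SIREN-style layer in the spirit of the linked gist (a sketch, not the gist's exact code): a linear map followed by sin(omega_0 * .), with the initialization proposed by Sitzmann et al. (2020).

```python
import math
import torch
import torch.nn as nn

class SineLayerSketch(nn.Module):
    """Illustrative SIREN layer: sin(omega_0 * Wx + b) with SIREN-style init."""
    def __init__(self, in_dim, out_dim, omega_0=30.0, is_first=False):
        super().__init__()
        self.omega_0 = omega_0
        self.linear = nn.Linear(in_dim, out_dim)
        with torch.no_grad():
            if is_first:
                # first layer: weights uniform in [-1/in_dim, 1/in_dim]
                self.linear.weight.uniform_(-1.0 / in_dim, 1.0 / in_dim)
            else:
                # hidden layers: uniform in [-sqrt(6/in_dim)/omega_0, sqrt(6/in_dim)/omega_0]
                bound = math.sqrt(6.0 / in_dim) / omega_0
                self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega_0 * self.linear(x))
```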

edmondja commented 6 months ago

I saw those results, but I don't experience the same thing. For me, ChebyKAN destroys everything. As in the KAN paper, I see an extremely high capacity to fit the data with only a few parameters; the MLP needs more than 10 times as many parameters (and a deeper net) to reach similar performance. I've always had good performance with MLPs compared to classical ML, and I was in the top 10% on Kaggle with similar MLP code 7 years ago, so I have good hopes that I'm testing it reasonably properly.

Besides fitting very easily no matter which hyperparameters I pick, ChebyKAN's validation loss is significantly better (val log loss is about 0.67 for the MLP and roughly 0.55 for ChebyKAN, with UDRL on cartpole). People say it's because KAN is good at estimating simple functions, and cartpole is not real data but a simplistic "game" after all (but then why is the train loss so high with the MLP...).

This doesn't translate into RL performance though: I get similar performance, but I would blame biases of RL algorithms (like primacy bias) rather than the ML algorithm. Also, someone else didn't get better results than an MLP on cartpole with other types of KAN: https://github.com/riiswa/kanrl/issues/1#event-12741810791 (I prefer not to share my code because it is ugly).

SynodicMonth commented 6 months ago

> I think this is a fairer test - https://gist.github.com/cwkx/b74c0e759471064e69436be287fe142e - it shows the parameter counts (the SIREN MLP uses fewer and converges faster). Obviously both are sensitive to initialization at this small size.

That's really mind-blowing. I haven't tested SIREN yet, but it seems even better. I assume SIREN and KAN are more suitable because I use a test function with sin/cos in its formula? (Not sure.) SIREN seems really interesting.

Tbh they're so sensitive to initialization that I've been struggling to compare them properly. My fault.

SynodicMonth commented 6 months ago

That's good news! I'm quite doubtful about KAN's ability now after several test runs on CV/NLP. It seems only capable on simple problems/formulas. It probably won't replace the MLP, but it might be useful in some simple physics scenarios(?)

ChebyKAN is closer to a LAN than to the original KAN (as pointed out in #3). Maybe that's why it's easier to train using Adam.