K-H-Ismail closed this 6 months ago
Great find! I'll be closing my pull request
Thanks! You actually did the whole job!
Just tested it and this change makes ChebyKAN ~11x faster
That's indeed a giant boost! Really appreciate that! I'll keep the unvectorized version in a separate file so it stays easy to read and understand.
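For readers skimming the thread, here is a minimal sketch (assuming a PyTorch setting; not the repo's exact code) of the two formulations being compared. The speed-up comes from replacing the sequential loop over the degree with a single broadcast over all degrees at once.

```python
import torch

def cheby_recurrence(x, degree):
    # Unvectorized reference: T_0 = 1, T_1 = x, T_n = 2*x*T_{n-1} - T_{n-2}
    x = torch.tanh(x)  # squash inputs into [-1, 1]
    T = [torch.ones_like(x), x]
    for _ in range(2, degree + 1):
        T.append(2 * x * T[-1] - T[-2])
    return torch.stack(T[: degree + 1], dim=-1)  # (batch, features, degree + 1)

def cheby_trig(x, degree):
    # Vectorized form: T_n(x) = cos(n * acos(x)) on [-1, 1],
    # computed for all degrees at once by broadcasting over n.
    x = torch.tanh(x)
    n = torch.arange(degree + 1, dtype=x.dtype, device=x.device)
    return torch.cos(n * torch.acos(x).unsqueeze(-1))  # (batch, features, degree + 1)

x = torch.randn(8, 4)
print((cheby_recurrence(x, 5) - cheby_trig(x, 5)).abs().max())  # agree up to float error
```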
Many thanks to you for helping demystify KANs, and many thanks to @iiisak for thinking of the trigonometric formulation. Actually, the formula extends to all x in R that way:

T_n(x) = cos(n * acos(x)) for |x| <= 1
T_n(x) = cosh(n * acosh(x)) for x >= 1
T_n(x) = (-1)^n * cosh(n * acosh(-x)) for x <= -1

or simply this one:

T_n(x) = ((x + sqrt(x^2 - 1))^n + (x - sqrt(x^2 - 1))^n) / 2
We could implement this and get rid of the tanh in the beginning.
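If that route were taken, a hypothetical layer could evaluate T_n directly on unbounded inputs, roughly along these lines (a sketch only, using the piecewise cos/cosh form above; `cheby_all_reals` is an illustrative name, not existing repo code):

```python
import torch

def cheby_all_reals(x, degree):
    # Hypothetical sketch of the extension discussed above, not merged code.
    # T_n(x) = cos(n * acos(x))                 for |x| <= 1
    # T_n(x) = sign(x)^n * cosh(n * acosh(|x|)) for |x| >= 1
    # so no tanh squashing of the inputs is needed.
    n = torch.arange(degree + 1, dtype=x.dtype, device=x.device)
    inside = torch.cos(n * torch.acos(x.clamp(-1, 1)).unsqueeze(-1))
    outside = torch.cosh(n * torch.acosh(x.abs().clamp(min=1)).unsqueeze(-1))
    odd = (n.long() % 2 == 1)
    flip = (x < 0).unsqueeze(-1) & odd  # odd-degree terms change sign for x < 0
    outside = torch.where(flip, -outside, outside)
    return torch.where(x.abs().unsqueeze(-1) <= 1, inside, outside)

x = torch.linspace(-3, 3, 7)
print(cheby_all_reals(x, 3)[:, 3])  # matches T_3(x) = 4*x**3 - 3*x on the whole line
```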
I'm afraid this will add nothing. As far as I understood from your discussion with @JanRocketMan in issue #3, any polynomial interpolation we could use yields a form of LAN or GLU.
Using a B-spline as in the original paper is just using a piece-wise polynomial interpolation that has nicer properties instead.
What do you think about this?
Sorry for the late reply, I've been dealing with a fever.
I definitely agree. We've found some major advantages of B-splines (really, drawbacks of ChebyKAN) in the discussion:
- B-spline is less equivalent to a LAN (though in theory it's expandable)
- B-spline is capable of continual learning (#8)
- B-spline is also more universal (not 100% sure)
In my opinion: yeah, B-spline is also just a nicer interpolation, but it's good enough for some specific problems. I see people have found many ways to accelerate the original KAN, so KAN might be a great tool for those problems. But it's too early to replace every MLP with a KAN; I haven't seen its advantage in larger problems like CV/NLP.
@SynodicMonth Yes indeed, thank you for your insights.
I have read through your discussion and code. I have a question: you use cos(n * acos(x)) to replace the recurrence for T_n(x). We know that trigonometric functions like sin and cos are internally computed by approximation methods such as Taylor expansion, which essentially involve loops and piecewise processing. Why is the acos + cos approach faster?
"B-spilne is capable of continual learning"
I just believe that different approximation methods have different advantages. In fact, for high-frequency trigonometric functions like sin(999x), very few methods can approximate them quickly from raw operations (+, *, ... with no built-in approximation) without defining the function a priori as sin(exp(THETA_learnable) * x).
I read a paper, 'A note on computing with Kolmogorov Superpositions without iterations' (David Sprecher, UCSB, 2021).
'Without iterations', it sounds impressive! The network should be wider.
Hello,
Regarding the cos implementation: you could go for a polynomial Taylor expansion. Given that one can implement it efficiently, there will be no difference in time.
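To illustrate that point: after range reduction, a truncated Taylor series for cos is just a handful of multiply-adds. A rough, self-contained sketch (not how libm actually implements cos, which uses minimax polynomials):

```python
import math
import torch

def cos_taylor(t, terms=10):
    # cos(t) ~= sum_{k < terms} (-1)^k * t^(2k) / (2k)!
    # Range-reduce to [-pi, pi) first so the truncated series stays accurate.
    t = torch.remainder(t + math.pi, 2 * math.pi) - math.pi
    out = torch.zeros_like(t)
    term = torch.ones_like(t)
    for k in range(terms):
        out = out + term
        term = term * (-t * t) / ((2 * k + 1) * (2 * k + 2))
    return out

t = torch.linspace(-10, 10, 1001)
print((cos_taylor(t) - torch.cos(t)).abs().max())  # about 1e-6 in float32
```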
Regarding Sprecher's work: we have been looking at that in detail these last few months. Sprecher's inner functions are fractal and cause some issues for differentiable learning. However, we are investigating some encouraging avenues. Depending on the results, we might share some of our preliminary work in a preprint.
Great! @K-H-Ismail