SynodicMonth / ChebyKAN

Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines.

Vectorization of the for loop on Chebyshev polynomial degrees #7

Closed · K-H-Ismail closed this 6 months ago

K-H-Ismail commented 6 months ago
iiisak commented 6 months ago

Great find! I'll be closing my pull request.

K-H-Ismail commented 6 months ago

Thanks! You actually did the whole job!

iiisak commented 6 months ago

Just tested it, and this change makes ChebyKAN ~11x faster.

SynodicMonth commented 6 months ago

That's indeed a giant boost! Really appreciate it! I'll keep the unvectorized version in another file as a quick, easy-to-follow reference.
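
For readers following along, here is a minimal sketch (not the repository's exact code) of the two formulations being compared: the sequential recurrence T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x), which needs a Python loop over degrees, versus the identity T_n(x) = cos(n arccos x), which evaluates every degree in one broadcasted expression.

```python
import torch

def cheby_features_loop(x: torch.Tensor, degree: int) -> torch.Tensor:
    """Chebyshev features via the recurrence; the loop over degrees is sequential."""
    x = torch.tanh(x)                       # squash inputs into (-1, 1)
    T = [torch.ones_like(x), x]
    for _ in range(2, degree + 1):
        T.append(2 * x * T[-1] - T[-2])     # T_{n+1} depends on T_n and T_{n-1}
    return torch.stack(T[: degree + 1], dim=-1)           # (..., degree + 1)

def cheby_features_vectorized(x: torch.Tensor, degree: int) -> torch.Tensor:
    """Same features via T_n(x) = cos(n * arccos(x)), all degrees at once."""
    x = torch.tanh(x)
    n = torch.arange(degree + 1, device=x.device, dtype=x.dtype)
    return torch.cos(n * torch.acos(x).unsqueeze(-1))     # broadcast over degrees
```

A ChebyKAN-style layer then contracts these features with a learnable coefficient tensor, e.g. `torch.einsum('bid,iod->bo', features, coeffs)` for features of shape (batch, in_dim, degree + 1).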

K-H-Ismail commented 6 months ago

Many thanks to you for helping demystify KANs, and many thanks to @iiisak for thinking of the trigonometric formulation. Actually, the formula extends to all x in R this way: [image] or simply this one: [image]

We could implement this and get rid of the tanh in the beginning.

I'm afraid this will add nothing. As far as I understood from your discussion with @JanRocketMan in issue #3, all the polynomial interpolations that could be used yield a form of LANs or GLUs.

Using a B-spline, as in the original paper, just means using a piecewise polynomial interpolation that has nicer properties.

What do you think about this?
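
For reference (the pasted images did not survive this text export), the standard full-range identity presumably being referred to is T_n(x) = cos(n arccos x) for |x| <= 1, T_n(x) = cosh(n arccosh x) for x >= 1, and T_n(x) = (-1)^n cosh(n arccosh(-x)) for x <= -1; equivalently, for |x| >= 1, T_n(x) = ((x + sqrt(x^2 - 1))^n + (x - sqrt(x^2 - 1))^n) / 2. A hedged sketch of a piecewise implementation that would remove the need for the tanh squashing:

```python
import torch

def cheby_full_range(x: torch.Tensor, degree: int) -> torch.Tensor:
    """T_0..T_degree for arbitrary real x: cos(n*acos x) on [-1, 1],
    cosh(n*acosh|x|) outside, with a (-1)^n sign when x < -1."""
    n = torch.arange(degree + 1, device=x.device, dtype=x.dtype)
    xe = x.unsqueeze(-1)                                         # (..., 1)
    inside = torch.cos(n * torch.acos(xe.clamp(-1.0, 1.0)))      # |x| <= 1 branch
    outside = torch.cosh(n * torch.acosh(xe.abs().clamp(min=1.0)))
    parity = 1.0 - 2.0 * (n % 2)                                 # (+1, -1, +1, ...) = (-1)^n
    sign = torch.where(xe < 0, parity, torch.ones_like(parity))
    return torch.where(xe.abs() <= 1.0, inside, sign * outside)
```

Note that the cosh branch grows quickly for large |x| and degree, which is one practical reason the squashing nonlinearity may still be preferable.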

SynodicMonth commented 6 months ago


Sorry for the late reply, I've been dealing with a fever.

I definitely agree. We've found some major advantages of B-splines (actually, drawbacks of ChebyKAN) in the discussion:

- B-spline is less equivalent to a LAN (though in theory it's expandable)
- B-spline is capable of continual learning (#8)
- B-spline is also more universal (not 100% sure)

In my opinion: yeah, B-spline is also just a nicer interpolation, but it's good enough for some specific problems. I see people have found many ways to accelerate the original KAN, and KAN might be a great tool for those problems. But it's too early to replace every MLP with a KAN; I haven't seen its advantage on larger problems like CV/NLP.
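
On the continual-learning point (#8), the usual intuition is about support: every Chebyshev basis function is nonzero across essentially the whole domain, so updating one coefficient perturbs the fit everywhere, whereas a B-spline basis function is nonzero only on a few knot spans. A small illustrative sketch (using a degree-1 "hat" B-spline for simplicity, not the library's actual basis):

```python
import torch

# Illustrative only: perturb one basis coefficient by 0.1 and look at where the
# function value changes.
grid = torch.linspace(-1, 1, 9)

# Chebyshev basis function T_5: global support, so the change spreads over the
# whole interval.
delta_cheby = 0.1 * torch.cos(5 * torch.acos(grid))

# A degree-1 B-spline ("hat") basis function of width 0.25 centred at 0:
# identically zero outside [-0.25, 0.25], so the change stays local.
delta_hat = 0.1 * torch.clamp(1 - (grid / 0.25).abs(), min=0)

print(delta_cheby)  # affects essentially every grid point
print(delta_hat)    # nonzero only around x = 0
```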

K-H-Ismail commented 6 months ago

@SynodicMonth Yes indeed, thank you for your insights.

yuedajiong commented 3 months ago

I have read through your discussion and code. I have a question: you use cos(n * acos(x)) to replace the iterative recurrence for T_n(x). We know that trigonometric functions like sin and cos are internally computed with approximation methods such as Taylor expansions, which essentially involve loops and piecewise processing. Why is the acos + cos approach faster?
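
One way to see where the speedup comes from: it is not that cos/acos are cheap per element, but that the recurrence has a sequential dependency across degrees (each T_n needs T_{n-1} and T_{n-2}), while cos(n * acos(x)) produces all degrees with a couple of large broadcasted elementwise kernels. A rough timing sketch with hypothetical shapes and degree; absolute numbers will vary by hardware:

```python
import time
import torch

x = torch.empty(4096, 64).uniform_(-0.99, 0.99)   # inputs already squashed into (-1, 1)
degree = 16
n = torch.arange(degree + 1, dtype=x.dtype)

def loop_version():
    T = [torch.ones_like(x), x]
    for _ in range(2, degree + 1):                # each step depends on the previous two
        T.append(2 * x * T[-1] - T[-2])
    return torch.stack(T, dim=-1)

def vectorized_version():
    return torch.cos(n * torch.acos(x).unsqueeze(-1))   # one broadcasted expression

assert torch.allclose(loop_version(), vectorized_version(), atol=1e-4)
for fn in (loop_version, vectorized_version):
    start = time.perf_counter()
    for _ in range(100):
        fn()
    print(fn.__name__, f"{time.perf_counter() - start:.3f}s")
```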

yuedajiong commented 3 months ago

"B-spilne is capable of continual learning"

I just believe that different approximation methods have different advantages. In fact, for high-frequency trigonometric functions like sin(999x), very few methods can approximate them quickly from raw operations (+, ×, ... with no built-in approximation), without defining the target a priori as sin(exp(THETA_learnable) * x).

yuedajiong commented 3 months ago

I read the paper 'A note on computing with Kolmogorov Superpositions without iterations' (David Sprecher, UCSB, 2021).

‘Without iterations’, that sounds impressive! The network should be wider.

K-H-Ismail commented 3 months ago

Hello,

Regarding the cos implementation: you could go for a polynomial Taylor expansion instead. Provided it is implemented efficiently, there will be no difference in runtime.

Regarding Sprecher's work: we have been looking at it in detail over the last few months. Sprecher's inner functions are fractal and cause some issues for differentiable learning. However, we are investigating some encouraging avenues. Depending on the results, we might share some of our preliminary work in a preprint.

yuedajiong commented 2 months ago

Great! @K-H-Ismail