SynodicMonth / ChebyKAN

Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines.
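For readers new to the idea, here is a minimal sketch of such a layer (hypothetical code, not the repository's exact implementation): each input is squashed into [-1, 1], expanded in the Chebyshev basis via the standard recurrence, and the basis values are combined with learnable coefficients.

```python
import torch
import torch.nn as nn

class ChebyKANLayer(nn.Module):
    """Minimal sketch of a Chebyshev-basis KAN layer (illustrative only)."""
    def __init__(self, in_features, out_features, degree=4):
        super().__init__()
        self.degree = degree
        # one learnable coefficient per (input, output, polynomial order) triple
        self.coeffs = nn.Parameter(
            torch.randn(in_features, out_features, degree + 1)
            / (in_features * (degree + 1)) ** 0.5
        )

    def forward(self, x):
        x = torch.tanh(x)  # Chebyshev polynomials are defined on [-1, 1]
        # recurrence: T_0(x) = 1, T_1(x) = x, T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)
        T = [torch.ones_like(x), x]
        for _ in range(2, self.degree + 1):
            T.append(2 * x * T[-1] - T[-2])
        basis = torch.stack(T[: self.degree + 1], dim=-1)  # (batch, in, degree + 1)
        return torch.einsum("bid,iod->bo", basis, self.coeffs)
```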

Check out KAL-Nets #5

Open 1ssb opened 6 months ago

1ssb commented 6 months ago

Would really love your feedback: https://github.com/1ssb/torchkan

Using Legendre Polynomials instead; ~98% on MNIST.
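For reference, the Legendre basis mentioned here can be built with Bonnet's recursion; a minimal sketch, assuming inputs are already normalized to [-1, 1]:

```python
import torch

def legendre_basis(x, degree):
    """Stack P_0(x)..P_degree(x) along a new last dimension (illustrative sketch)."""
    # Bonnet's recursion: (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
    P = [torch.ones_like(x), x]
    for n in range(1, degree):
        P.append(((2 * n + 1) * x * P[-1] - n * P[-2]) / (n + 1))
    return torch.stack(P[: degree + 1], dim=-1)
```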

SynodicMonth commented 6 months ago

Wow, I see the recent 99.5% on MNIST! That's really impressive! I haven't tested the difference between Legendre and Chebyshev (or other polynomials). Also, normalizing x using min-max might be better! Really appreciate it!
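For concreteness, one way to do the min-max normalization mentioned above (a sketch using per-batch statistics, which is an assumption; running statistics would be more stable at inference):

```python
import torch

def minmax_normalize(x, eps=1e-6):
    # rescale each feature to [-1, 1] using batch min/max instead of tanh squashing
    x_min = x.min(dim=0, keepdim=True).values
    x_max = x.max(dim=0, keepdim=True).values
    return 2 * (x - x_min) / (x_max - x_min + eps) - 1
```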

1ssb commented 6 months ago

If you take a look, I am using monomial bases; so does this mean MLPs have actually been doing this all along? It blurs the fine lines between representational equivalences. I am actively working on other topics and will keep updating as I progress.
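The monomial expansion being discussed is just the feature map [1, x, x^2, ..., x^d] per input; any polynomial basis of the same degree spans the same function space, so the choice of basis may matter more for conditioning than for expressivity. A sketch:

```python
import torch

def monomial_basis(x, degree):
    # [1, x, x^2, ..., x^degree] per input feature; spans the same space as
    # Chebyshev or Legendre bases of the same degree, just less well-conditioned
    return torch.stack([x ** n for n in range(degree + 1)], dim=-1)
```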

SynodicMonth commented 6 months ago

I'm also thinking about that. I think monomial bases are the same as those orthogonal polynomials. As mentioned in #3, without the grid, KAN = LAN (learnable-activation network) + custom activation function. I'm not 100% sure if it's equal to an MLP with GLU, but it's similar enough. I've tested ChebyKAN against an equivalent (same-parameter) MLP on MNIST and no obvious advantage is observed. ChebyKAN even performs worse when the degree is high.
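A rough parameter accounting behind a "same-parameter" comparison (the shapes below are assumed, for illustration only): a ChebyKAN layer holds in_features x out_features x (degree + 1) coefficients, so a comparable MLP's hidden width can be solved for.

```python
# assumed, illustrative shapes: 784 -> 10 with a degree-4 Chebyshev basis
in_f, out_f, degree = 784, 10, 4
cheby_params = in_f * out_f * (degree + 1)   # 39,200 coefficients
hidden = cheby_params // (in_f + out_f)      # = 49, width of a matched 2-layer MLP
mlp_params = in_f * hidden + hidden * out_f  # = 38,906, ignoring biases
print(cheby_params, hidden, mlp_params)
```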

1ssb commented 6 months ago

I mean GLU creates a switching effect, which is studied in signal theory to have a very specific effect on transformations... but yes, I agree with the non-gated custom activation; that's practically all I am doing as well. Now the real question is: if that is the case, could we describe them as mathematical kernel operations, like a transform and an inverse transform? Because if that is the case, we can literally start treating networks with a systems approach.
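For readers following along, the GLU in question (Dauphin et al., 2016) gates one linear projection with the sigmoid of another, which is the switching effect mentioned; a minimal sketch:

```python
import torch
import torch.nn as nn

class GLU(nn.Module):
    """Gated Linear Unit: a value path multiplied by a learned sigmoid gate."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.value = nn.Linear(in_features, out_features)
        self.gate = nn.Linear(in_features, out_features)

    def forward(self, x):
        return self.value(x) * torch.sigmoid(self.gate(x))
```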

SynodicMonth commented 6 months ago

(Sorry, I'm not quite familiar with GLU. I might be wrong on that.) I'm not sure if it's capable of that. My poor math knowledge can't find a way to transform that into a kernel operation, but that's a brilliant idea. It makes me feel like it's related to some essence of MLPs.