MLP parameter count scales as O(N^2 L), while KAN scales as O(N^2 L G), where N is the number of neurons in adjacent layers (layer and layer+1), L is the number of layers, and G is the grid size.
And we can scale KAN in grid size: the paper claims the loss scales as G^-4 (which sounds a little too good), but as far as I know this was only tested on small datasets. So, in theory, we can get higher accuracy with a smaller model. I am writing a CUDA implementation, and I see that memory usage can be comparable to an MLP's.
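A rough back-of-the-envelope sketch of where the extra G factor comes from (the helper names `mlp_params`/`kan_params` are hypothetical, and the "(G + k) coefficients per edge" count is an approximation, not the exact bookkeeping of the reference implementation):

```python
def mlp_params(widths):
    """Weights + biases for a plain MLP with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

def kan_params(widths, grid_size, spline_order=3):
    """Same widths, but every edge carries its own spline activation with
    roughly (grid_size + spline_order) learnable coefficients."""
    coeffs_per_edge = grid_size + spline_order
    return sum(n_in * n_out * coeffs_per_edge
               for n_in, n_out in zip(widths, widths[1:]))

if __name__ == "__main__":
    widths = [64, 64, 64, 1]   # toy example
    G = 5
    print("MLP params:", mlp_params(widths))     # ~ O(N^2 L)
    print("KAN params:", kan_params(widths, G))  # ~ O(N^2 L G)
```

The point of the sketch: for the same layer widths, the KAN count is the MLP count multiplied by roughly the number of spline coefficients per edge, which is where the O(N^2 L G) scaling comes from.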
The original implementation also includes roughly G times more parameters. Don't be fooled by the overly complicated code.
I am confused about the principle of KAN. From this implementation, it looks like KAN has more learnable parameters. It seems that KAN's improvement comes from the learnable activation functions, which yield better accuracy. Does KAN have any advantage in computation or memory?