hypnopump closed this issue 1 month ago
Only if you are using B-splines of order 1.
It's equivalent to activating the same hidden state with multiple activation functions and then using a wider linear transformation to shrink it back. This is somewhat like a Gated Linear Unit, but in reverse: the linear transformation comes after the broadening activation.
This note shows the equivalence of KANs to MLPs in the piecewise-linear case. I guess the non-linearity of higher-order splines might help in some cases, but it would be cool to have the MLP as a baseline. Here's the reddit discussion
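A minimal NumPy sketch of the order-1 case discussed above (names and grid are my own, not from the note): a degree-1 B-spline activation is just linear interpolation between knot values, and it can be reproduced exactly by a bias, one linear term, and a ReLU per interior knot, i.e. "broadening" activations followed by a linear combination.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Degree-1 B-spline "activation" on a fixed grid: piecewise-linear
# interpolation of control values at the knots (coefficients = knot values).
knots = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
rng = np.random.default_rng(0)
vals = rng.normal(size=knots.shape)

def spline(x):
    return np.interp(x, knots, vals)

# Equivalent ReLU construction: each interior knot contributes one ReLU
# feature, weighted by the change in slope at that knot.
slopes = np.diff(vals) / np.diff(knots)   # slope of each linear segment
bias = vals[0]
w0 = slopes[0]                            # slope of the first segment
a = np.diff(slopes)                       # slope changes at interior knots

def relu_net(x):
    # Broadening step: one ReLU feature per interior knot;
    # shrinking step: a single linear combination of those features.
    out = bias + w0 * (x - knots[0])
    for t_k, a_k in zip(knots[1:-1], a):
        out = out + a_k * relu(x - t_k)
    return out

x = np.linspace(knots[0], knots[-1], 1001)
print(np.max(np.abs(spline(x) - relu_net(x))))  # ~0 on the grid's domain
```

Inside the grid the two functions agree to machine precision, which is the sense in which an order-1 KAN edge is just a small ReLU MLP; for spline orders above 1 the match is only approximate.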