GistNoesis / FourierKAN

MIT License

Added smooth_initialization option to NaiveFourierKANLayer #4

Closed JeremyIV closed 4 months ago

JeremyIV commented 4 months ago

With the default initialization scheme for fouriercoeffs, all frequencies draw their coefficients from the same distribution. This means that as gridsize becomes large, more and more of the function's power comes from the high frequencies, making the KAN's initial scalar functions very high-frequency. In these high-frequency functions, the output values for nearby inputs are uncorrelated. As a result, the initial KAN function is highly "scrambled" and cannot "unscramble" itself during training.
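A minimal sketch of the idea (the function name and coefficient layout `(2, outdim, inputdim, gridsize)` are illustrative, not the PR's exact code): instead of giving every frequency the same scale, damp the coefficients of frequency f so the initial spectrum decays:

```python
import torch

def init_fourier_coeffs(inputdim, outdim, gridsize, smooth_initialization=False):
    """Illustrative initializer for a Fourier-series KAN layer."""
    # Frequencies 1..gridsize for the cos/sin terms.
    freqs = torch.arange(1, gridsize + 1, dtype=torch.float32)
    if smooth_initialization:
        # Damp high frequencies: coefficient scale ~ 1/f^2, so the
        # initial function is dominated by low-frequency components.
        norm = freqs ** 2
    else:
        # Default: every frequency shares the same scale, so expected
        # power is flat across the spectrum.
        norm = torch.full_like(freqs, gridsize ** 0.5)
    # Shape: (2 for cos/sin, outdim, inputdim, gridsize); norm broadcasts
    # over the last (frequency) axis.
    return torch.randn(2, outdim, inputdim, gridsize) / (inputdim ** 0.5 * norm)
```

With `smooth_initialization=True`, the highest-frequency coefficients are orders of magnitude smaller than the lowest-frequency ones, so nearby inputs produce correlated outputs at initialization.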

For example, here is a KAN with 3 layers, 10 hidden units, and a grid size of 120, trained to encode an image using the coordinate-network paradigm (see, e.g., SIREN).

Target image

image

With the default initialization,

Before training:

image

After training:

image

With smooth initialization,

Before training:

image

After training:

image
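For context, the coordinate-network setup used above can be sketched as follows (the small `model` here is a stand-in for the KAN; any network mapping 2-D coordinates to RGB fits the paradigm):

```python
import torch

def make_coordinate_dataset(image):
    """Turn an (H, W, 3) image tensor into (x, y) -> RGB training pairs."""
    H, W, _ = image.shape
    ys = torch.linspace(-1.0, 1.0, H)
    xs = torch.linspace(-1.0, 1.0, W)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    coords = torch.stack([grid_x, grid_y], dim=-1).reshape(-1, 2)  # (H*W, 2)
    targets = image.reshape(-1, 3)                                 # (H*W, 3)
    return coords, targets

# Usage: fit a coordinate model to reproduce the image pixel by pixel.
image = torch.rand(8, 8, 3)  # stand-in for the target image
coords, targets = make_coordinate_dataset(image)
model = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(),
                            torch.nn.Linear(16, 3))
loss = torch.nn.functional.mse_loss(model(coords), targets)
```

The "scrambling" described above shows up directly in this setting: a high-frequency initial function maps neighboring pixel coordinates to unrelated colors.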

unrealwill commented 4 months ago

Thanks for the Pull Request. A few thoughts/notes for myself:

I am merging the pull request. I'll add a line to the README to explain this new parameter.

One usual way of dealing with the higher-frequency Fourier terms is to add a regularization term that penalizes the higher frequencies in the way you want. The merit of that is that smoothness is enforced as training progresses, not just at initialization.
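Such a penalty can be sketched like this (a hypothetical helper, assuming the `(2, outdim, inputdim, gridsize)` coefficient layout, with the frequency exponent `alpha` controlling how hard high frequencies are punished):

```python
import torch

def high_frequency_penalty(fouriercoeffs, alpha=2.0):
    """L2 penalty on Fourier coefficients, weighted by f^alpha so that
    higher frequencies are penalized more strongly than lower ones."""
    gridsize = fouriercoeffs.shape[-1]
    freqs = torch.arange(1, gridsize + 1, dtype=fouriercoeffs.dtype)
    # freqs broadcasts over the last (frequency) axis of the coefficients.
    return ((freqs ** alpha) * fouriercoeffs ** 2).sum()

# During training: loss = task_loss + lam * high_frequency_penalty(coeffs)
```

With `alpha=0` this reduces to plain L2 weight decay on the coefficients; increasing `alpha` tilts the pressure toward the high-frequency terms.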

One thing worth studying is how well the frequency profile of the noise is preserved or changed during training.

JeremyIV commented 4 months ago

Thanks for merging! Here are some quick sloppy experiments in response to your comments:

image

Regularization

I tried the default initialization with L2 regularization of the Fourier coefficients, weighted by f^alpha, for alpha = 0, 0.5, 1, 1.5, 2, 2.5

image

And here are the power spectra before and after training with alpha=1.5:

image

unrealwill commented 4 months ago

Thanks a lot for doing some experiments.

In the KAN paper, they mention doing their experiments with LBFGS, hinting at a second-order method.

FourierKAN uses cos and sin (C∞ functions), so it can probably benefit from a second-order optimizer that takes advantage of the curvature.

Something like Hessian-free optimization (e.g. https://github.com/fmeirinhos/pytorch-hessianfree — the author warns "Not fully tested, use with caution!") should do the trick, and help distinguish optimization issues from model-expressiveness issues.
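For reference, PyTorch's built-in `torch.optim.LBFGS` already gives a (limited-memory) quasi-Newton method; unlike first-order optimizers it requires a closure that re-evaluates the loss, since it may need several function evaluations per step (the toy model and data here are illustrative):

```python
import torch

torch.manual_seed(0)

# Toy regression problem standing in for a KAN training run.
x = torch.randn(64, 2)
y = (x ** 2).sum(dim=1, keepdim=True)
model = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(),
                            torch.nn.Linear(16, 1))

with torch.no_grad():
    initial = torch.nn.functional.mse_loss(model(x), y)

opt = torch.optim.LBFGS(model.parameters(), lr=0.5, max_iter=20)

def closure():
    # LBFGS may call this several times per step, so zero grads inside.
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss

for _ in range(5):
    final = opt.step(closure)
```

Note that LBFGS as implemented expects full-batch evaluations, which is consistent with the small-scale experiments in the KAN paper.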

Standard neural-network architecture tricks like residual connections and normalization should also help.
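Those tricks can be applied around any layer; a minimal sketch (the wrapped `torch.nn.Linear` is a placeholder for a hypothetical KAN layer):

```python
import torch

class ResidualNormBlock(torch.nn.Module):
    """Wrap a layer with LayerNorm and a skip connection — the standard
    resnet/normalization tricks, applied around an arbitrary inner layer."""
    def __init__(self, layer, dim):
        super().__init__()
        self.layer = layer
        self.norm = torch.nn.LayerNorm(dim)

    def forward(self, x):
        # Pre-norm residual: normalize, transform, then add the skip path.
        return x + self.layer(self.norm(x))

# Usage with a placeholder inner layer of matching width.
block = ResidualNormBlock(torch.nn.Linear(8, 8), dim=8)
out = block(torch.randn(4, 8))
```

The skip connection keeps gradients flowing through deep stacks, and the pre-norm keeps each layer's input in a well-conditioned range — both orthogonal to the initialization fix in this PR.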