Closed: JeremyIV closed this pull request 4 months ago
Thanks for the pull request. A few thoughts/notes for myself:
I am merging the pull request. I'll add a line to the README to explain this new parameter.
One usual way of dealing with the higher-frequency Fourier terms is to add a regularization term that penalizes the high frequencies in the way you want. The merit of that approach is that the function is pushed toward smoothness as training progresses, not just at initialization.
One thing to study is probably how well the frequency profile of the noise is preserved or changed during training.
Thanks for merging! Here are some quick sloppy experiments in response to your comments:
Regularization
I tried the default initialization with L2 regularization of the Fourier coefficients, weighted by f^alpha, for alpha = 0, 0.5, 1, 1.5, 2, 2.5.
And here are the power spectra before and after training with alpha = 1.5:
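For reference, the f^alpha-weighted L2 penalty described above can be sketched in a few lines of numpy. The array layout is illustrative, not the actual FourierKAN internals:

```python
import numpy as np

def spectral_l2_penalty(fouriercoeffs, alpha=1.5):
    """f**alpha-weighted L2 penalty on Fourier coefficients.

    fouriercoeffs: array whose last axis indexes the frequency
    k = 1..gridsize (e.g. shape (2, outdim, gridsize) for the cos and
    sin coefficients). Larger alpha punishes high frequencies more,
    pushing the learned functions toward smoothness during training,
    not just at initialization.
    """
    gridsize = fouriercoeffs.shape[-1]
    freqs = np.arange(1, gridsize + 1, dtype=float)  # f = 1, 2, ..., gridsize
    weights = freqs ** alpha                         # f**alpha
    return np.sum(weights * fouriercoeffs**2)

# Example: white-noise coefficients over 8 frequencies.
rng = np.random.default_rng(0)
coeffs = rng.normal(size=(2, 3, 8))
penalty = spectral_l2_penalty(coeffs, alpha=1.5)
```

In training this would simply be added to the task loss with some weight (`loss = task_loss + lam * penalty`).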
Thanks a lot for doing some experiments.
In the KAN paper, they mention running their experiments with LBFGS, hinting at a second-order method.
FourierKAN uses cos and sin (C∞ functions), so it can probably benefit from a second-order optimizer that takes advantage of the curvature.
Something like Hessian-free optimization (e.g. https://github.com/fmeirinhos/pytorch-hessianfree, with the author's warning "Not fully tested, use with caution!") should do the trick, and help separate optimization issues from model expressiveness.
Standard neural-network architecture tricks like residual connections and normalization should also help.
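As a toy illustration of why quasi-Newton methods do well here (this is not the Hessian-free repo linked above, just scipy's standard L-BFGS-B on a small Fourier least-squares fit), a C∞ Fourier model gives a smooth loss surface that a second-order method solves essentially to machine precision:

```python
import numpy as np
from scipy.optimize import minimize

# Fit a tiny 1-D Fourier series y(x) = sum_k a_k cos(kx) + b_k sin(kx)
# to a smooth target. The loss is quadratic in the coefficients, so a
# quasi-Newton optimizer like L-BFGS exploits the curvature and
# converges in a handful of iterations.
K = 4
x = np.linspace(-np.pi, np.pi, 64)
target = np.sin(2 * x) + 0.5 * np.cos(3 * x)   # exactly representable

def loss(theta):
    a, b = theta[:K], theta[K:]
    k = np.arange(1, K + 1)[:, None]           # shape (K, 1)
    pred = a @ np.cos(k * x) + b @ np.sin(k * x)
    return np.mean((pred - target) ** 2)

res = minimize(loss, np.zeros(2 * K), method="L-BFGS-B")
# res.x recovers b_2 ≈ 1 and a_3 ≈ 0.5, with loss near zero.
```

A first-order optimizer on the same problem would need far more iterations to reach comparable accuracy, which is the point of trying second-order methods before blaming model expressiveness.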
With the default initialization scheme for `fouriercoeffs`, all frequencies draw their coefficients from the same distribution. This means that as `gridsize` becomes large, there is more and more contribution from the high frequencies, making the KAN's initial scalar functions very high-frequency. In these high-frequency functions, the output values for nearby inputs are uncorrelated, so the initial KAN function is highly "scrambled" and cannot "unscramble" itself during training.

For example, here is a KAN with 3 layers, 10 hidden units, and a grid size of 120, trained to encode an image using the coordinate-network paradigm (see e.g. SIREN):
Target image:

With the default initialization:
Before training:
After training:

With smooth initialization:
Before training:
After training:
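The smooth-initialization idea can be sketched as follows. The 1/f^decay scaling and the sqrt(inputdim) normalization are illustrative assumptions; the exact scheme in the merged PR may differ:

```python
import numpy as np

def smooth_init(inputdim, outdim, gridsize, decay=1.0, rng=None):
    """Draw Fourier coefficients whose std decays with frequency.

    The default scheme draws every frequency from the same
    distribution, so the initial scalar functions look like white
    noise as gridsize grows. Scaling the std of frequency k by
    1 / k**decay gives a 1/f-type power spectrum instead, so the
    initial functions are smooth and nearby inputs stay correlated.
    (Illustrative sketch, not necessarily the PR's exact normalization.)
    """
    rng = np.random.default_rng(rng)
    k = np.arange(1, gridsize + 1)
    scale = 1.0 / (k ** decay * np.sqrt(inputdim))  # broadcasts over last axis
    # Axis 0 holds the cos/sin pair; last axis indexes frequency.
    return rng.normal(size=(2, outdim, inputdim, gridsize)) * scale

coeffs = smooth_init(inputdim=10, outdim=10, gridsize=120)
```

With this scaling, growing `gridsize` adds progressively fainter high-frequency terms instead of dominating the initial function.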