Blealtan / efficient-kan

An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).
MIT License

Are the "efficient-kan" and "official-kan" equivalent in terms of algorithms? #18

Closed yuedajiong closed 4 months ago

yuedajiong commented 4 months ago

As the title asks.

Indoxer commented 4 months ago

As far as I know they are almost the same; the only difference is that the official version appears to have an additional bias after each layer. I am also not sure whether the initialization is the same, and the regularization loss is changed because of the optimizations.
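For reference, a minimal sketch of what "an additional bias after each layer" could look like; the wrapper class and parameter names here are illustrative assumptions, not the official pykan code:

```python
import torch
import torch.nn as nn

class KANLayerWithBias(nn.Module):
    """Illustrative wrapper: run any KAN layer, then add a per-feature bias,
    mirroring the extra bias term the official implementation reportedly has."""

    def __init__(self, kan_layer: nn.Module, out_features: int):
        super().__init__()
        self.kan_layer = kan_layer
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.kan_layer(x) + self.bias
```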

yuedajiong commented 4 months ago

@Indoxer Thanks, you are so kind.

WhatMelonGua commented 4 months ago

No, I'm not quite sure. I tried the official tutorial at the following link: Tutorial

*Including the use of the official LBFGS training strategy.

The results showed that after completing all the training in a single run, the model was almost identical to the official one. But if training is conducted in phases, it cannot be fitted perfectly (the model is still effective, just slightly underperforming).

[image: official KAN]
[image: Eff-KAN]

WhatMelonGua commented 4 months ago

I think this is acceptable; after all, the model is very efficient, and some loss is normal. It would be strange if there were no loss at all. While it effectively retains the characteristics of the official model, it also incorporates the training optimizations.

Indoxer commented 4 months ago

@WhatMelonGua, are you sure that you didn't train spline_scaler and base_weights? Also, did you use the same parameters in the LBFGS optimizer (number of steps, etc.)?
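For anyone reproducing this, one way to keep those parameters fixed is to disable their gradients before building the optimizer. The parameter names below are an assumption based on efficient-kan's KANLinear; adjust them if your fork names them differently:

```python
import torch

# Sketch: freeze spline_scaler and base_weight so only the spline coefficients
# are trained. The name matching is an assumption about efficient-kan's layers.
def freeze_scaler_and_base(model: torch.nn.Module) -> None:
    for name, param in model.named_parameters():
        if name.endswith("base_weight") or name.endswith("spline_scaler"):
            param.requires_grad_(False)

# Afterwards, hand only the still-trainable parameters to the optimizer, e.g.
# torch.optim.LBFGS([p for p in model.parameters() if p.requires_grad])
```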

Indoxer commented 4 months ago

[image: spline_scaler not trained, base_weights not trained]
[image: spline_scaler trained, base_weights trained]

(I am using my own modified version, but with the same algorithm as efficient-kan, so I am not sure.)

WhatMelonGua commented 4 months ago

> @WhatMelonGua, are you sure that you didn't train spline_scaler and base_weights? Also, did you use the same parameters in the LBFGS optimizer (number of steps, etc.)?

Oh yes, forgive me for forgetting. There are no such parameters, so for that reg_ variable (I don't know what it is) I simply took the default value of 1 and fixed many errors (perhaps I was fixing it blindly, just making it work). And the result was that the official "LBFGS" cannot be directly migrated here.
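For context, torch.optim.LBFGS needs a closure that re-evaluates the loss, which is the main thing that has to change when porting the official training loop. A minimal sketch, with `model`, `x_train`, and `y_train` as placeholders rather than the official script:

```python
import torch
import torch.nn.functional as F

# Sketch of one way to drive an efficient-kan style model with LBFGS.
def train_lbfgs(model, x_train, y_train, steps=20):
    optimizer = torch.optim.LBFGS(
        model.parameters(), lr=1.0, max_iter=20, line_search_fn="strong_wolfe"
    )

    def closure():
        # LBFGS may re-evaluate the objective, so gradients are recomputed here.
        optimizer.zero_grad()
        loss = F.mse_loss(model(x_train), y_train)
        loss.backward()
        return loss

    for _ in range(steps):
        optimizer.step(closure)
```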

WhatMelonGua commented 4 months ago

> [image: spline_scaler not trained, base_weights not trained]
> [image: spline_scaler trained, base_weights trained]
>
> (I am using my own modified version, but with the same algorithm as efficient-kan, so I am not sure.)

It seems our setups are similar. What a coincidence! 🤗

Indoxer commented 4 months ago

> @WhatMelonGua, are you sure that you didn't train spline_scaler and base_weights? Also, did you use the same parameters in the LBFGS optimizer (number of steps, etc.)?
>
> Oh yes, forgive me for forgetting. There are no such parameters, so for that reg_ variable (I don't know what it is) I simply took the default value of 1 and fixed many errors (perhaps I was fixing it blindly, just making it work). And the result was that the official "LBFGS" cannot be directly migrated here.

reg_ is the regularization loss: loss = train_loss + lamb * reg_. For continual learning, lamb = 0.0, so loss = train_loss.
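In code form, the objective described above is just the weighted sum (a schematic, not the official training script):

```python
import torch

def total_loss(train_loss: torch.Tensor, reg_: torch.Tensor, lamb: float = 0.0) -> torch.Tensor:
    # loss = train_loss + lamb * reg_; with lamb = 0.0 (the continual-learning
    # setting mentioned above) this reduces to the plain training loss.
    return train_loss + lamb * reg_
```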

Indoxer commented 4 months ago

Here are my results and code, so you can compare

Blealtan commented 4 months ago

AFAIK the only difference is that the "efficient" regularization loss is different from the official one. But I'm not sure if the parallel associativity will introduce numerical error that's large enough to break some important features.
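For readers comparing the two losses, a rough sketch of the weight-based surrogate (an L1 term on the spline coefficients plus an entropy term), as opposed to the official regularization computed on activations over the input samples; the exact scaling and epsilon handling in the real efficient-kan code may differ:

```python
import torch

def efficient_regularization_loss(spline_weight: torch.Tensor,
                                  regularize_activation: float = 1.0,
                                  regularize_entropy: float = 1.0) -> torch.Tensor:
    # Surrogate L1: mean absolute spline coefficient per edge, used instead of
    # the official L1 over activations evaluated on the input samples.
    l1_fake = spline_weight.abs().mean(-1)
    regularization_loss_activation = l1_fake.sum()
    # Entropy of the normalized edge magnitudes, encouraging sparsity.
    p = l1_fake / (regularization_loss_activation + 1e-8)
    regularization_loss_entropy = -torch.sum(p * torch.log(p + 1e-8))
    return (regularize_activation * regularization_loss_activation
            + regularize_entropy * regularization_loss_entropy)
```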

Blealtan commented 4 months ago

Just found that I missed the bias term after each layer. Will update that soon.

I scanned over this long thread a few days ago and totally missed the comment by @Indoxer, lol.