KindXiaoming / pykan

Kolmogorov-Arnold Networks

Misleading Results for Knot Theory Invariants Data #249

Open RussWolfinger opened 1 month ago

RussWolfinger commented 1 month ago

Thanks for the great work on KAN and for sparking such interest! I haven't seen anyone report this issue, but I may have missed it:

The MLP training method from the Knot Theory Invariant Colab notebook has a silly early-stopping rule that prematurely halts training at ~80% test accuracy (on a 12.5% single-holdout test set) for the signature target, and this is the result reported, and supposedly improved upon, by the KAN paper in Table 3. By letting the training run for 20 full epochs, the MLP achieves ~96% test-set accuracy. A minimal sketch of what I mean follows below.
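For concreteness, here is roughly what "letting the training go" looks like: a plain 20-epoch loop with no early-stopping rule. The layer widths, optimizer settings, and the `train_loader`/`test_x`/`test_y` names are illustrative assumptions rather than the notebook's exact setup; the only fixed parts are the 17 input features and 14 signature classes matching the data.

```python
import torch
import torch.nn as nn

# Illustrative MLP for the knot signature task: 17 invariant features in,
# 14 signature classes out. Widths and learning rate are assumptions,
# not the Colab notebook's exact values.
model = nn.Sequential(
    nn.Linear(17, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 14),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):  # run all 20 epochs; no early-stopping rule
    model.train()
    for x, y in train_loader:  # hypothetical DataLoader over the 87.5% training split
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

# Single evaluation on the 12.5% holdout test set. test_x/test_y are assumed
# tensors; test_y holds integer class indices.
model.eval()
with torch.no_grad():
    acc = (model(test_x).argmax(dim=1) == test_y).float().mean()
print(f"test accuracy: {acc.item():.4f}")
```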

For comparison, the best I have been able to achieve so far with a KAN in 20 epochs is ~95% test-set accuracy, using a [17, 8, 8, 14] model (sketched after this paragraph).
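And the corresponding KAN run, as a sketch. The `grid`/`k` values are assumptions, and the exact training entry point varies across pykan versions (`model.train(...)` in earlier releases, `model.fit(...)` in more recent ones); the dataset dict keys below are the ones pykan's examples use, and `train_x`/`train_y`/`test_x`/`test_y` are hypothetical tensors holding the same split as the MLP run.

```python
import torch
from kan import KAN

# The [17, 8, 8, 14] KAN mentioned above; grid and k are assumed values.
model = KAN(width=[17, 8, 8, 14], grid=3, k=3)

# pykan expects a dataset dict with these keys. Labels are integer class
# indices, as required by CrossEntropyLoss.
dataset = {
    'train_input': train_x, 'train_label': train_y,
    'test_input':  test_x,  'test_label':  test_y,
}

def test_acc():
    preds = model(dataset['test_input']).argmax(dim=1)
    return (preds == dataset['test_label']).float().mean()

# 20 full-batch LBFGS steps, roughly comparable to the MLP's 20 epochs.
model.fit(dataset, opt="LBFGS", steps=20,
          loss_fn=torch.nn.CrossEntropyLoss(),
          metrics=(test_acc,))
print(f"test accuracy: {test_acc().item():.4f}")
```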

I am seeking practical, compelling, real-world datasets where KAN outperforms MLP under some form of honest validation (a single holdout or k-fold). If anyone is aware of such examples, I would be much obliged if you could point me to them.

BTW, I found this while testing a KAN implementation in C++ using LibTorch, which I am planning to include as part of the Torch Deep Learning add-in for JMP Pro.

KindXiaoming commented 1 month ago

Thank you for reporting this! I think for the knot example there is a clear trade-off between accuracy (larger model, trained longer) and interpretability (smaller model, trained less). Your great work seems to be pushing the accuracy front, while other people, including us and the original DeepMind paper, care about the interpretability front. I do think it's valuable to pursue a more detailed study like yours, to make hidden assumptions/goals more explicit; these hidden goals bias our choice of hyperparameters.