wkqian06 opened 5 months ago
Hi, a few observations look suspicious:
(1) The training loss remaining the same across different grid sizes is a bit suspicious.
(2) Some activation functions look quite oscillatory in the final plot.
I don't really have any good advice off the top of my head, but it might be worth trying other models (e.g., MLPs) to benchmark the complexity of your dataset.
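For a concrete baseline, a minimal MLP in PyTorch might look like the sketch below; the 4-dimensional input and 1600 samples match the setup described later in this thread, while the hidden sizes and training settings are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Stand-ins for the real data: 1600 observations with 4 features
# (shapes taken from the report later in this thread).
X = torch.randn(1600, 4)
y = torch.randn(1600, 1)

# Small MLP baseline; depth and width here are arbitrary choices.
mlp = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(mlp(X), y)
    loss.backward()
    opt.step()

print("final MLP train MSE:", loss.item())
```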
Hi, thanks for sharing your thoughts. I'll look into benchmarks for this dataset.
I just tried an MLP on the dataset; its MSE loss was much smaller than KAN's, which is acceptable. What's weird is that the KAN training process seemed to fail on my dataset, especially when comparing the estimations against the labels.
Interesting! I just realized that you set lamb=0.1, which might be too high. Please try what happens with lamb=0.0.
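For reference, lamb is just an argument to the training call. A minimal sketch, assuming pykan's usual dataset dict format and the newer fit API (older pykan versions call this model.train instead); the network width is an assumption:

```python
import torch
from kan import KAN

# Hypothetical dataset in pykan's expected dict format.
dataset = {
    'train_input': torch.randn(1600, 4),
    'train_label': torch.randn(1600, 1),
    'test_input':  torch.randn(400, 4),
    'test_label':  torch.randn(400, 1),
}

model = KAN(width=[4, 2, 1], grid=5, k=3, seed=0)  # width assumed

# lamb controls the overall regularization strength;
# lamb=0.0 disables the sparsity penalty entirely.
model.fit(dataset, opt="LBFGS", steps=20, lamb=0.0)
```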
Still, it doesn't work. As shown at the beginning, changing the arguments did not improve the training process; it seems the whole training process failed. I don't know why the predicted labels always tend to converge as the true labels increase. The following is the plot under the lamb=0.0 setting, while for the MLP the plot is a nice diagonal line.
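For anyone reproducing this, the diagnostic is just a predicted-vs-true scatter with a diagonal reference; a sketch, continuing from the model and dataset dict above:

```python
import matplotlib.pyplot as plt

# Predictions vs. true labels; a well-trained model hugs the diagonal.
preds = model(dataset['test_input']).detach().squeeze(-1)
true = dataset['test_label'].squeeze(-1)

plt.scatter(true, preds, s=5, alpha=0.5)
lims = [true.min().item(), true.max().item()]
plt.plot(lims, lims, 'k--', label='y = x')
plt.xlabel('true label')
plt.ylabel('predicted label')
plt.legend()
plt.show()
```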
Very interesting. A quick observation is that the true labels have a very wide range. This could possibly cause KAN to fail, because KANs use uniform grids by default. To test this hypothesis, you could try dropping the examples with small labels and see if this helps.
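One way to run that test, continuing from the dataset dict above (the cutoff value is an arbitrary assumption):

```python
# Drop training examples with small labels so the remaining
# label range is much narrower; 20.0 is an arbitrary cutoff.
cutoff = 20.0
mask = dataset['train_label'].squeeze(-1) > cutoff
subset = {
    'train_input': dataset['train_input'][mask],
    'train_label': dataset['train_label'][mask],
    'test_input':  dataset['test_input'],
    'test_label':  dataset['test_label'],
}
```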
You could also try KAN(..., grid_eps=0.02, ...), which switches KANs to adaptive grids based on the sample distribution.
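In pykan, grid_eps blends a uniform grid with one placed at the sample quantiles when the grid is updated, so values near 0 let the grid follow the data distribution. A sketch, reusing the dataset dict from above (width again assumed):

```python
from kan import KAN

# When the grid is updated, pykan mixes the two grid types roughly as
#   grid = grid_eps * uniform_grid + (1 - grid_eps) * adaptive_grid,
# so grid_eps=1.0 is fully uniform and small values are mostly adaptive.
model = KAN(width=[4, 2, 1], grid=5, k=3, grid_eps=0.02, seed=0)
model.fit(dataset, opt="LBFGS", steps=20, lamb=0.0)
```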
Hi! Were you able to make it work? I'm currently facing a similar issue, and grid_eps didn't work either.
Sorry for the late update: grid_eps did not work in my case. Interestingly, although the original and the normalized datasets did not work, the min-max scaled dataset did seem to work to some extent. I'm confused as to why min-max scaling was the best strategy here, especially when the adaptive grids did not help.
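For comparison, the min-max variant can be set up with scikit-learn; a sketch, reusing the dataset dict from above (scaling the labels as well as the inputs is an assumption based on the description):

```python
import torch
from sklearn.preprocessing import MinMaxScaler

# Fit the scalers on the training split only, then apply to both splits.
x_scaler, y_scaler = MinMaxScaler(), MinMaxScaler()
train_x = x_scaler.fit_transform(dataset['train_input'].numpy())
train_y = y_scaler.fit_transform(dataset['train_label'].numpy())
test_x  = x_scaler.transform(dataset['test_input'].numpy())
test_y  = y_scaler.transform(dataset['test_label'].numpy())

scaled = {key: torch.tensor(arr, dtype=torch.float32)
          for key, arr in [('train_input', train_x), ('train_label', train_y),
                           ('test_input', test_x), ('test_label', test_y)]}
```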
That does sound interesting! However, in my case neither MinMaxScaler nor MaxAbsScaler worked. Did you do any other preprocessing apart from MinMax? It is strange, because in many other cases KAN did seem to predict as well as, if not better than, an MLP, but in some instances it just doesn't improve much (the training loss not changing being the issue).
No, I only did MinMax scaling. A weighted loss works for this imbalanced dataset and reduces the MAE. But even though the final plots looked better, I still would not call it a success, because the patterns where the observations were lower than 20 were still strange.
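A weighted loss for an imbalanced regression target could look like the sketch below; the weighting rule is an illustrative assumption, and passing it via loss_fn assumes your installed pykan version's fit supports that argument (continuing from the model and scaled dict above):

```python
import torch

def weighted_mse(pred, target):
    # Up-weight the rarer large-label examples; weighting proportional
    # to label magnitude is just one illustrative choice.
    w = 1.0 + target.abs()
    return (w * (pred - target) ** 2).mean()

# loss_fn is an assumption to verify against your pykan version.
model.fit(scaled, opt="Adam", lr=0.01, steps=20, lamb=0.0, loss_fn=weighted_mse)
```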
I wonder why the training process stalls so quickly in my case (the training loss not changing), ending up with bad performance.
I trained the following model using inputs of shape [1600, 4] obtained from observations, but there was no change in the loss (seed=0, lamb=0.1, lr=0.1). Normalization, changing the optimizer, and different argument settings did not help either, e.g.:
seed=1253, lamb=1, lr=1
seed=1253, lamb=1, lr=0.01
seed=1253, lamb=0, lr=0.01
seed=1253, lamb=0, lr=0.01, width=[4,2,2,1]
More information may be needed; below is model.plot() for the run with 'Adam', lr=0.01, lamb=0.1. More iterations did not improve the loss, so I set it to 20. I'm not sure whether this is due to the model settings, noise in the observations, or something else, or whether this is simply the best loss the model can reach.
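Putting the reported settings together, the run would look roughly like this; only width=[4,2,2,1] is named above as a variant, so the base width here is an assumption, and the data are stand-ins:

```python
import torch
from kan import KAN

torch.manual_seed(0)
X = torch.randn(1600, 4)   # stand-in for the [1600, 4] observations
y = torch.randn(1600, 1)   # stand-in for the labels
dataset = {'train_input': X[:1200], 'train_label': y[:1200],
           'test_input':  X[1200:], 'test_label':  y[1200:]}

# Reported setting: 'Adam', lr=0.01, lamb=0.1, 20 steps.
model = KAN(width=[4, 2, 1], grid=5, k=3, seed=0)  # base width assumed
model.fit(dataset, opt="Adam", lr=0.01, steps=20, lamb=0.1)
model.plot()  # inspect the learned activation functions
```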