KindXiaoming / pykan

Kolmogorov Arnold Networks

Will pruning happen automatically? #450

Open lyenthu opened 2 months ago

lyenthu commented 2 months ago

Hello!

I'm designing a model for time series prediction, and I haven't run any explicit pruning commands. After several epochs I called `model.fc_kan.plot(beta=100); plt.show()` and observed the results shown in the figure. In the first layer there are usually three to four nodes without any edge connections. Is this because the KAN network is identifying the importance of features, or is it automatically performing pruning? (I haven't used `kan.train()` for training.)

Another question: in the figure, each edge's plot appears to be either exactly or very close to y = x or y = -x. (Screenshots attached.)

KindXiaoming commented 2 months ago

Sometimes pruning happens automatically if it helps lower the loss.

A simple example: https://github.com/KindXiaoming/pykan/blob/master/tutorials/Interp/Interp_4_feature_attribution.ipynb. I used lamb=0.001 there, but you could try lamb=0., and I guess pruning may still happen (not as cleanly as with lamb=0.001); at least the fourth input neuron should become somewhat disconnected because that feature is irrelevant.
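For concreteness, here is a minimal sketch of that setup (not the exact tutorial code: the toy function, widths, and step count are placeholders, and older pykan versions use `model.train(...)` instead of `model.fit(...)`):

```python
import matplotlib.pyplot as plt
import torch
from kan import KAN, create_dataset

# Toy target where the 4th input is irrelevant
# (placeholder function, not the one from the tutorial)
f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2) + x[:, [2]]
dataset = create_dataset(f, n_var=4)

model = KAN(width=[4, 5, 1], grid=5, k=3, seed=0)

# lamb is the sparsity-regularization strength; try lamb=0. to compare
model.fit(dataset, opt="LBFGS", steps=50, lamb=0.001)

model = model.prune()   # drop neurons/edges below the importance threshold
model.plot(beta=100)
plt.show()
```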

Regarding the linear activations: did you call model.update_grid() every few steps during training? If not, it's possible that the activations drift out of the default range [-1, 1] where the splines are active, so those edges behave like SiLU (approximately linear for x > 0).
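A minimal sketch of what that looks like in a custom training loop (everything except `model.fc_kan` is a placeholder for your own model and data; depending on the pykan version the refresh call is `update_grid(x)` or `update_grid_from_samples(x)`, and it should receive the inputs that actually reach the KAN layer):

```python
import torch

# `model` is your own module with a pykan head at model.fc_kan (placeholder names)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()

for step, (x, y) in enumerate(train_loader):      # placeholder DataLoader
    features = model.encoder(x)                   # placeholder: whatever feeds model.fc_kan
    if step % 20 == 0:                            # refresh the spline grids every few steps
        with torch.no_grad():
            model.fc_kan.update_grid_from_samples(features)
    pred = model.fc_kan(features)
    loss = loss_fn(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```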

lyenthu commented 1 month ago

@KindXiaoming Thank you very much for your insightful responses; your explanations and suggestions have been very helpful and have already produced positive results in my work. I still have some questions, though. I designed a combined LSTM+KAN model for my time series task, and it has produced promising results. My aim is to use the KAN network to conduct an interpretability study for this task, but I am not sure how to start the analysis. Below is the proxy model designed during the feature selection process.

As shown in the diagram, I input 20 features:

1. In the first layer I applied sparsification. Based on the diagram, can it be inferred that most of the features are independent of each other? In other words, can nodes without connections be interpreted as unimportant features?
2. The first feature on the left of the first layer has no further connections. Does this imply that the feature does not contribute to the model?
3. Do the lighter-colored connections indicate lower weights?
4. In the second and third layers, the original features have already been transformed and no longer correspond to the initial feature values. How should the structure be interpreted in this case? How can the significance of features be traced through the transformation?

I hope to hear your thoughts. My description might not be entirely clear, but overall I would like to understand how to fully leverage the KAN network for an interpretability study of my task.
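As a starting point for questions (1)-(3), I am considering reading the importance scores numerically instead of judging them from the plot colors, roughly like this (this assumes the attribution API used in the feature-attribution tutorial linked above, i.e. `attribute()` and `feature_score`; the exact names may differ across pykan versions):

```python
import torch

# `model` stands for the trained KAN head (model.fc_kan in my case).
model.attribute()               # compute attribution scores for nodes (assumed API)
scores = model.feature_score    # one importance score per input feature (assumed attribute)

# Inputs with near-zero scores should match the disconnected nodes in the plot;
# rank the 20 inputs by attributed importance.
ranking = torch.argsort(scores.detach(), descending=True)
print(ranking)
print(scores.detach())
```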