Blealtan / efficient-kan

An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).
MIT License
4.11k stars 361 forks

Can we still plot and prune the model in efficient-kan? #29

Open yhaddbb opened 6 months ago

yhaddbb commented 6 months ago

I notice that most image classification tasks are based on efficient-kan instead of the original kan. I want to know whether it is possible to plot and prune efficient-kan just like the examples in the original kan.

Blealtan commented 6 months ago

Plotting is definitely possible but not planned yet. The parameters in spline_weight are the coefficients of the splines; for now, dump them out and you will be able to plot the grid on your own.

For pruning, I still need to validate whether my "efficient" way of doing sparsification regularization works as expected. If not, the trained network might be badly redundant, unlike the cases in the original paper.
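As a concrete sketch of what "plot the grid on your own" could look like: assuming grid_size=5 and spline_order=3 (so each edge stores grid_size + spline_order = 8 coefficients), with random stand-in values in place of a dumped spline_weight[i][j]:

```python
import numpy as np
from scipy.interpolate import BSpline

grid_size, spline_order = 5, 3
n_coeffs = grid_size + spline_order  # 8 coefficients per edge

# Clamped knot vector; BSpline requires len(knots) == n_coeffs + spline_order + 1.
knots = np.concatenate((
    [-1.0] * spline_order,
    np.linspace(-1, 1, grid_size + 1),
    [1.0] * spline_order,
))

coeffs = np.random.randn(n_coeffs)  # stand-in for a dumped spline_weight[i][j]
spline = BSpline(knots, coeffs, spline_order)

t = np.linspace(-1, 1, 100)
values = spline(t)  # curve values to plot against t
print(values.shape)  # (100,)
```

This only sketches the spline part of an edge; the full edge activation in efficient-kan also includes the base activation term.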

yhaddbb commented 6 months ago

Thank you for your reply

477810383 commented 6 months ago

Specifically, how should we use the parameter `spline_weight` to draw the shape of the activation function on each edge?

FakeEnd commented 6 months ago

I wrote some code to visualize the activation function on each edge, but it doesn't seem right. If somebody knows how to fix it, you are welcome to chat. 😄

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from scipy.interpolate import BSpline

def visualize_kan(weights):
    # B-spline parameters; must match the trained model's configuration
    grid_size = 5
    spline_order = 3

    # clamped knot vector; BSpline requires len(knot_vector) == len(weights) + spline_order + 1,
    # and efficient-kan stores grid_size + spline_order coefficients per edge
    knot_vector = np.concatenate(
        ([-1] * spline_order, np.linspace(-1, 1, grid_size + 1), [1] * spline_order)
    )

    # parameter range to evaluate over
    t = np.linspace(-1, 1, 100)

    # build the B-spline from the dumped coefficients
    spline = BSpline(knot_vector, weights, spline_order)

    # evaluate the B-spline curve
    spline_values = spline(t)

    # add the SiLU base activation
    silu = nn.SiLU()
    bias = silu(torch.tensor(t))

    spline_values = spline_values + bias.numpy()

    # plot the curve together with its control points
    plt.figure(figsize=(8, 6))
    plt.plot(t, spline_values, label='B-spline curve')
    plt.scatter(np.linspace(-1, 1, len(weights)), weights, color='red', label='Control points')
    plt.title('B-spline Curve')
    plt.xlabel('t')
    plt.ylabel('Value')
    plt.legend()
    plt.grid(True)
    plt.show()

for layer in kan_model.layers:
    out_features, in_features, _ = layer.spline_weight.shape
    for i in range(out_features):
        for j in range(in_features):
            visualize_kan(layer.spline_weight[i][j].detach().numpy())

link24tech commented 6 months ago

mark

huangst21 commented 6 months ago

Hi, I used your code and it seems to work pretty well. Can you say what is incorrect?

phrasenmaeher commented 6 months ago

I think the issue is that we only visualize the spline coefficients, but in the original implementation they visualize the activation function based on the pre- and post-activations (see also here). Can we access these two activation types in your code?
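Those pre- and post-activations can be captured with standard PyTorch forward hooks, which need nothing specific to efficient-kan. A minimal sketch (the nn.Sequential here is a toy stand-in; in practice you would register the hooks on kan_model.layers):

```python
import torch
import torch.nn as nn

# Toy stand-in for a layer stack; replace with kan_model.layers in practice.
model = nn.Sequential(nn.Linear(2, 5), nn.SiLU(), nn.Linear(5, 1))

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        # inputs is a tuple of the module's positional inputs
        captured[name] = (inputs[0].detach(), output.detach())  # (pre, post)
    return hook

handles = [m.register_forward_hook(make_hook(f"layer_{i}"))
           for i, m in enumerate(model)]

x = torch.randn(16, 2)
model(x)  # hooks fire during the forward pass

for h in handles:
    h.remove()

pre, post = captured["layer_0"]
print(pre.shape, post.shape)  # torch.Size([16, 2]) torch.Size([16, 5])
```

Plotting post against pre per node would then mirror what pykan's plot() does from its cached activations.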

huangst21 commented 6 months ago

After a careful reading of pykan's code, I realized that EfficientKAN is perhaps difficult to visualize the way native pykan is. To speed things up, EfficientKAN weights the results of all B-spline basis functions across all nodes and produces the output directly. This means that the spline_weight[i][j] in your code does not represent the spline coefficients of the [i][j]-th node the way native pykan does, so you cannot plot the spline function directly from spline_weight[i][j]. I think EfficientKAN perhaps trades some of the model's interpretability for efficiency. If you know how to visualize EfficientKAN, please let me know.

FakeEnd commented 5 months ago

Yes, you are right. I also found that spline_weight[i][j] in EfficientKAN does not represent the spline coefficients of the [i][j]-th node the way native pykan does. For now, I use a trick to visualize the activation function: I directly compute the output of the KAN over a range of inputs.

t = torch.arange(-2, 2, 0.01).cuda()
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
plt.plot(t.detach().cpu().numpy(), net.kan_layer(t.unsqueeze(1)).detach().cpu().numpy(), label="KAN layer", color="red")
plt.ylabel("Output")

If you have multiple layers, you can directly choose which layer you want to visualize:

kan_model = KAN([2, 5, 1], base_activation=nn.Identity)
# define layer
layer = 0
# define which input
input_node = 0
# define hidden node
hidden_node = 5
t = torch.arange(-2, 2, 0.01).cuda()
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
plt.plot(t.detach().cpu().numpy(), kan_model.layers[layer](t.unsqueeze(1))[hidden_node][input_node].detach().cpu().numpy(), label=f"KAN layer {layer} {input_node} {hidden_node}", color="red")
plt.ylabel("Output")

yyugogogo commented 5 months ago

mark

huangst21 commented 5 months ago

I'm not sure that's correct. When using KAN, what we usually want to visualize is the activation function on the edge from an input node to an output node, whereas this approach seems to visualize the function after all the activation functions are combined: `plt.plot(t.detach().cpu().numpy(), net.kan_layer(t.unsqueeze(1)).detach().cpu().numpy(), label="KAN layer", color="red")`. Also, this line of code doesn't look right: `kan_model.layers[layer](t.unsqueeze(1))[hidden_node][input_node].detach().cpu().numpy()`; the layer output doesn't seem to be indexable as `[hidden_node][input_node]`.

jniimi commented 2 months ago

Assuming the model has hidden layers [3, 1] and the number of input variables is nvars = 5, the relationship from a specific input column from_node_i to the target node target_node_j in the first hidden layer may be represented like this. I'm not too confident about this solution, but I hope it helps a bit.

import torch
import matplotlib.pyplot as plt

# model is your trained efficient-kan KAN instance
nvars = 5
number_of_layer = 0
from_node_i = 2
target_node_j = 1

# sweep the chosen input over [-1, 1) while holding all other inputs at zero
arr = torch.arange(-1, 1, 0.01)
N = arr.shape[0]
X = torch.zeros(N, nvars)
X[:, from_node_i] = arr

# pass through the chosen layer and keep only the target output node
Y = model.layers[number_of_layer](X)
Y = Y[:, target_node_j]

plt.plot(arr.detach().numpy(), Y.detach().numpy())
plt.show()
