KindXiaoming / pykan

Kolmogorov Arnold Networks
MIT License

FYI: convolutional layer with KAN #145

Closed StarostinV closed 1 week ago

StarostinV commented 2 months ago

https://github.com/StarostinV/convkan

A basic yet efficient (thanks to efficient-kan) implementation of the convolution operation with KAN, built around F.unfold.
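For readers skimming the thread, here is a minimal, hypothetical sketch (class and parameter names are mine, not the convkan repo's actual code) of how a convolution can be assembled around F.unfold with a pluggable per-patch function, e.g. a KAN layer from efficient-kan:

```python
# Sketch only: unfold the input into sliding patches, apply a learnable
# function to each flattened patch, and reshape back into a feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnfoldConv(nn.Module):
    def __init__(self, patch_fn, kernel_size, stride=1, padding=0):
        super().__init__()
        # patch_fn maps (N, in_channels * k * k) -> (N, out_channels);
        # a KAN layer would be plugged in here, while a plain nn.Linear
        # recovers an ordinary convolution.
        self.patch_fn = patch_fn
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding

    def forward(self, x):
        b, _, h, w = x.shape
        # patches: (B, C*k*k, L), one column per sliding window position
        patches = F.unfold(x, self.kernel_size, stride=self.stride, padding=self.padding)
        n_pos = patches.shape[-1]
        patches = patches.transpose(1, 2).reshape(b * n_pos, -1)
        out = self.patch_fn(patches)                      # (B*L, out_channels)
        out = out.reshape(b, n_pos, -1).transpose(1, 2)   # (B, out_channels, L)
        h_out = (h + 2 * self.padding - self.kernel_size) // self.stride + 1
        w_out = (w + 2 * self.padding - self.kernel_size) // self.stride + 1
        return out.reshape(b, -1, h_out, w_out)
```

The actual implementations linked in this thread add groups, padding_mode, and the other Conv2d conveniences on top of this basic pattern.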

paulestano commented 2 months ago

Are you really sure you coded all that on your own my friend? 😉 [attached screenshot: IMG_3922]

StarostinV commented 2 months ago

I am pretty sure. Please share the link so that we can compare the implementations. Everybody would benefit from that!

EDIT: I found it, looks good! My implementation supports grouped convolutions and is tested, but otherwise it is very similar.

paulestano commented 2 months ago

Be my guest mate https://github.com/paulestano/LeKan

StarostinV commented 2 months ago

Be my guest mate https://github.com/paulestano/LeKan

I was not aware that one could use unfold as a module. However, your implementation lacks support for padding_mode and groups, and it has not been thoroughly tested. In contrast, my implementation serves as a direct replacement for Conv2d. Sharing the code for the benefit of everyone is more productive than making accusations of theft. Frankly, it's an obvious idea to implement convolution with KAN. The main reason I did it was my surprise that nobody else had done so, especially given that others have argued it's impossible in another open issue on the package. Cheers!

paulestano commented 2 months ago

The main reason I did it was my surprise that nobody else had done so, especially given that others have argued it's impossible in another open issue on the package

except I had done it 4 days ago and shared it in the said issue (https://github.com/KindXiaoming/pykan/issues/9#issuecomment-2097866072) 2 days ago… I even described it in words very similar to those in your issue... How surprising is that?

paulestano commented 2 months ago

On a more scientific note, I can't wait for you to share convincing results on CIFAR. Unless that thing is obvious as well 😉

StarostinV commented 2 months ago

The main reason I did it was my surprise that nobody else had done so, especially given that others have argued it's impossible in another open issue on the package

except I had done it 4 days ago and shared it in the said issue (#9 (comment)) 2 days ago… I even described it in words very similar to those in your issue... How surprising is that?

I meant this comment. As you can see, it was made two days ago, and they also didn't know about your code. Seriously, if you think your GitHub account and your comment are that visible, I don't know what to say. For instance, there are dozens of independent implementations of efficient-kan - are you gonna accuse them of stealing ideas, too? I am trying to be polite, but this is just nonsense.

paulestano commented 2 months ago

Concurrent work happens, but the phrasing as well as the timeline are unfortunate here. Everyone will make up their own mind…

hesamsheikh commented 2 months ago

Could you guys please explain what you mean by implementing a "conv layer in KAN"? KAN is the equivalent of a dense layer, while a conv layer is an operation defined in mathematics. How can you implement an operation in KAN, and why would you do it?

It seems more plausible to replace the classification dense layers with KAN, but the feature extraction?

minh-nguyenhoang commented 2 months ago

@hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer; that's why, if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.
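To make that claim concrete, here is a small standalone check (plain PyTorch, not taken from either repo): a Conv2d forward pass is reproduced by unfolding the input into patches and multiplying by the flattened kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 3, 8, 8)                       # (B, C_in, H, W)
conv = nn.Conv2d(3, 5, kernel_size=3, padding=1, bias=False)

patches = F.unfold(x, kernel_size=3, padding=1)   # (B, C_in*3*3, H*W)
weight = conv.weight.reshape(5, -1)               # (C_out, C_in*3*3)
out = (weight @ patches).reshape(2, 5, 8, 8)      # one matrix multiply per patch

assert torch.allclose(out, conv(x), atol=1e-5)    # same result as the conv
```

Replacing that single matrix multiplication with a KAN layer acting on the same patch matrix is exactly the substitution being discussed.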

hesamsheikh commented 2 months ago

@hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer; that's why, if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.

But there's a reason we're not substituting Conv layers with Linear completely. KANs aren't local or spatial in nature, so what makes them suitable to be placed instead of Conv?

minh-nguyenhoang commented 2 months ago

@hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer; that's why, if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.

But there's a reason we're not substituting Conv layers with Linear completely. KANs aren't local or spatial in nature, so what makes them suitable to be placed instead of Conv?

@hesamsheikh I'm not saying they are the same; a Linear layer isn't local or spatial either. I'm just saying that we can implement the convolution operation using a matrix multiplication (with some modifications to the input, of course). That's what encourages us to think of a way to incorporate KAN into convolution.

KAN just rephrases the way we compute the next layer's features: instead of taking a weighted sum of the input features and then applying some activation, we take a weighted sum of b-spline functions, which is a better way to interpret how a NN works. Then, instead of taking the whole input space as the potential contributors (as a Linear layer does), we just look at a neighborhood of features and "judge" every neighborhood in the same way; what we get should then be similar to a convolution layer.
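As an illustration of that idea, here is a toy, hypothetical stand-in for such a "KAN kernel" (Gaussian radial basis functions instead of the paper's B-splines; names and details are my own, not code from any repo linked here): each input of a flattened patch gets its own learnable univariate function, and their outputs are summed per output channel.

```python
import torch
import torch.nn as nn

class ToyKANKernel(nn.Module):
    """Sum of learnable univariate functions, one per (output, input) edge."""
    def __init__(self, in_features, out_features, num_basis=8):
        super().__init__()
        # Fixed grid of basis centers on [-1, 1]; only the coefficients are learned.
        self.register_buffer("centers", torch.linspace(-1, 1, num_basis))
        self.coeff = nn.Parameter(0.1 * torch.randn(out_features, in_features, num_basis))

    def forward(self, x):                           # x: (N, in_features)
        # Evaluate every basis function at every input value: (N, in, num_basis)
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) ** 2) / 0.1)
        # phi_{oi}(x_i) = sum_b coeff[o, i, b] * basis_b(x_i); then sum over inputs i
        return torch.einsum("nib,oib->no", basis, self.coeff)
```

Applied to the unfolded patches from a sketch like the one earlier in the thread, the same function bank is shared across all window positions, which is what keeps the operation local and convolution-like.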

StarostinV commented 2 months ago

@hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer; that's why, if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.

But there's a reason we're not substituting Conv layers with Linear completely. KANs aren't local or spatial in nature, so what makes them suitable to be placed instead of Conv?

There might be some confusion here: convolutional operations are of course not limited to the typical implementation involving an affine transformation followed by an activation function. One can choose any trainable function as a kernel to perform convolution, and KANs are no exception. When KAN is used as the convolutional kernel, the operation is still "local". On the other hand, there is nothing inherently local about the standard kernels used in Conv layers either. So it is not that we do something strange just because we can express convolution as matrix multiplication; we simply use KAN as the convolutional kernel.

hesamsheikh commented 2 months ago

@hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer; that's why, if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.

But there's a reason we're not substituting Conv layers with Linear completely. KANs aren't local or spatial in nature, so what makes them suitable to be placed instead of Conv?

There might be some confusion here: convolutional operations are of course not limited to the typical implementation involving an affine transformation followed by an activation function. One can choose any trainable function as a kernel to perform convolution, and KANs are no exception. When KAN is used as the convolutional kernel, the operation is still "local". On the other hand, there is nothing inherently local about the standard kernels used in Conv layers either. So it is not that we do something strange just because we can express convolution as matrix multiplication; we simply use KAN as the convolutional kernel.

So, to make sure I'm getting this right: you're using KAN as the kernel function of the convolution?

minh-nguyenhoang commented 2 months ago

@hesamsheikh Basically, you can rephrase a convolution operator as a simple matrix multiplication, as in a Linear layer; that's why, if we can replace the normal Linear layer with KAN, we can do the same for the convolution layer.

But there's a reason we're not substituting Conv layers with Linear completely. KANs aren't local or spatial in nature, so what makes them suitable to be placed instead of Conv?

There might be some confusion here: convolutional operations are of course not limited to the typical implementation involving an affine transformation followed by an activation function. One can choose any trainable function as a kernel to perform convolution, and KANs are no exception. When KAN is used as the convolutional kernel, the operation is still "local". On the other hand, there is nothing inherently local about the standard kernels used in Conv layers either. So it is not that we do something strange just because we can express convolution as matrix multiplication; we simply use KAN as the convolutional kernel.

So, to make sure I'm getting this right: you're using KAN as the kernel function of the convolution?

Yep that's the whole idea.

XiangboGaoBarry commented 2 months ago

Hi, here I implement ConvKAN with different activation formulations and their corresponding inference times, evaluated on the CIFAR-10 dataset: https://github.com/XiangboGaoBarry/ConvKAN-Zoo

StarostinV commented 2 months ago

Hi, here I implement ConvKAN with different activation formulations and their corresponding inference times, evaluated on the CIFAR-10 dataset: https://github.com/XiangboGaoBarry/ConvKAN-Zoo

You can also make a pull request to add your repo to this collection of KANs https://github.com/mintisan/awesome-kan