CYYJL opened this issue 1 month ago
Great work :) I have many questions about it and will try to experiment with it as well!
Hi, as to why KAN[64,16,10] is used instead of a linear layer: my idea is to combine ResNet and KAN to test whether it can run in an ordinary environment, how fast it trains, and whether it can handle classification, and perhaps even segmentation and detection. In my code above, the Adam optimizer is used. With an MLP head, one epoch takes 3s; with KAN[64,16,10], one epoch takes 30s; and with KAN[512,... 64,10], one epoch takes 30min. In my opinion, the current KAN network may be hard to apply to larger image tasks, such as 224x224 or bigger, or inside a Transformer where dim is 768: there the KAN network becomes difficult to train and the training time can be extremely long. Also, thank you very much for your suggestion to use the LBFGS optimizer; I am not familiar with it yet and will need more in-depth study before I can answer your second question (a minimal usage sketch is below). Best! CYYJL
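For reference, LBFGS in PyTorch differs from Adam in that `step()` takes a closure that re-evaluates the loss. A minimal sketch (the model and data here are placeholders, not my actual task):

```python
import torch
import torch.nn as nn

# Placeholder model and data, just to make the sketch self-contained.
model = nn.Linear(64, 10)
inputs, targets = torch.randn(32, 64), torch.randint(0, 10, (32,))

optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

def closure():
    # LBFGS may evaluate the objective several times per step,
    # so the loss and gradients are recomputed inside the closure.
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    return loss

optimizer.step(closure)
```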
Thank you very much for your work. I followed your approach and modified my ResNet network by directly replacing the linear layers in the final fully connected block with KAN layers. However, I found that the accuracy decreased instead. I wonder if you have experienced the same issue.
My original FcBlock (used at the end of the ResNet backbone to produce the output):
```python
import torch.nn as nn

class FcBlock(nn.Module):
    def __init__(self, in_channel, out_channel, in_dim):
        super(FcBlock, self).__init__()
        self.in_channel = in_channel
        self.out_channel = out_channel
        self.prep_channel = 128
        self.fc_dim = 512
        self.in_dim = in_dim
        # prep layer
        self.prep1 = nn.Conv1d(
            self.in_channel, self.prep_channel, kernel_size=1, bias=False
        )
        self.bn1 = nn.BatchNorm1d(self.prep_channel)
        # fc layers
        self.fc1 = nn.Linear(self.prep_channel * self.in_dim, self.fc_dim)
        self.fc2 = nn.Linear(self.fc_dim, self.fc_dim)
        self.fc3 = nn.Linear(self.fc_dim, self.out_channel)
        self.relu = nn.ReLU(True)
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.prep1(x)
        x = self.bn1(x)
        # flatten (batch, prep_channel, in_dim) -> (batch, prep_channel * in_dim)
        x = self.fc1(x.view(x.size(0), -1))
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc3(x)
        return x
```
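For reference, the block takes a 3-D input of shape (batch, in_channel, in_dim), since prep1 is a Conv1d. A quick shape check with illustrative sizes:

```python
import torch

# Illustrative sizes: batch 32, 64 backbone channels, length 8.
block = FcBlock(in_channel=64, out_channel=10, in_dim=8)
x = torch.randn(32, 64, 8)
print(block(x).shape)  # torch.Size([32, 10])
```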
The modified version (KanBlock), which caused the decrease in accuracy:
```python
import torch.nn as nn
# Import path assumed; adjust to whichever KAN implementation you use,
# e.g. from efficient_kan import KAN
from efficient_kan import KAN

class KanBlock(nn.Module):
    def __init__(self, in_channel, out_channel, in_dim):
        super(KanBlock, self).__init__()
        self.in_channel = in_channel
        self.out_channel = out_channel
        self.prep_channel = 128
        self.fc_dim = 512
        self.in_dim = in_dim
        # prep layer
        self.prep1 = nn.Conv1d(
            self.in_channel, self.prep_channel, kernel_size=1, bias=False
        )
        self.bn1 = nn.BatchNorm1d(self.prep_channel)
        # kan layers (each squeezes through a 16-unit hidden layer)
        self.kan1 = KAN([self.prep_channel * self.in_dim, 16, self.fc_dim])
        self.kan2 = KAN([self.fc_dim, 16, self.fc_dim])
        self.kan3 = KAN([self.fc_dim, 16, self.out_channel])
        self.relu = nn.ReLU(True)
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.prep1(x)
        x = self.bn1(x)
        x = self.kan1(x.view(x.size(0), -1))
        x = self.relu(x)
        x = self.dropout(x)
        x = self.kan2(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.kan3(x)
        return x
```
Issue: After replacing the linear layers in the FcBlock with KAN layers in the KanBlock, I observed a decrease in accuracy. I am not sure why this is happening. Have you encountered similar issues? Any suggestions or insights would be greatly appreciated.
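One variation I am considering (untested) is giving the KAN head the same widths as the original linear layers instead of squeezing everything through 16 hidden units, and dropping the interleaved ReLU/Dropout, since KAN layers already apply learnable nonlinear functions on their edges. This would cost more training time, as noted above:

```python
import torch.nn as nn
# KAN as above (import path depends on the implementation used).

class KanBlockV2(nn.Module):
    """Untested variant: full-width KAN head, no interleaved ReLU/Dropout."""
    def __init__(self, in_channel, out_channel, in_dim):
        super().__init__()
        self.prep1 = nn.Conv1d(in_channel, 128, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm1d(128)
        # One multi-layer KAN whose widths mirror fc1/fc2/fc3 above.
        self.kan = KAN([128 * in_dim, 512, 512, out_channel])

    def forward(self, x):
        x = self.bn1(self.prep1(x))
        return self.kan(x.view(x.size(0), -1))
```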
Thank you very much for your help.
Thank you very much for your work. I've combined ResNet with KAN, using ResNet for feature extraction and replacing the linear layer with KAN for classification. Along the way, I've noticed some characteristics of the KAN network that I'd like to discuss with everyone.
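The core of the change is just swapping the classifier head. In outline (a sketch with illustrative widths, assuming a torchvision ResNet and a KAN implementation that takes a list of layer widths, as above):

```python
import torch
from torchvision.models import resnet18
# KAN as above (import path depends on the implementation used).

def build_resnet_kan(num_classes=10):
    # ResNet handles feature extraction; its final linear classifier
    # is replaced by a small KAN head.
    model = resnet18(weights=None)
    model.fc = KAN([model.fc.in_features, 16, num_classes])
    return model

model = build_resnet_kan()
print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```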
Below is the code I used for the test described above.