Ariande1 / MS-ResNet

Advancing Spiking Neural Networks towards Deep Residual Learning

Sorry, I can't use the structure shown in the paper to get the same result on the CIFAR-10 dataset #3

Closed by hugebuck 2 months ago

Ariande1 commented 2 months ago

Could you describe your issue in more detail? Is there a specific structure that hasn’t achieved the accuracy reported in the paper?

hugebuck commented 2 months ago

Sure. I chose the structures shown in Table II; I believe they are lightweight models derived from the standard ResNet. Since they aren't included in the GitHub code repository, I reproduced them by reducing the number of channels in your standard ResNet model. I also want to confirm one thing about the CIFAR-10 experiments: in Table VIII, you use only RandomCrop and normalization for the Table II models. However, my experimental results differ from yours by approximately 5 percentage points. Thank you for answering my questions.

Ariande1 commented 2 months ago

Oh, I see — the hyperparameter table you are referring to comes from the version we uploaded to arXiv. Strictly following the hyperparameters in that version also yields an accuracy of 85-86% for SNN-ResNet20 in my own reproduction.

The transformations used for Table II are RandomCrop, RandomHorizontalFlip, and normalization; this has been corrected in our TNNLS version. Additionally, I recommend setting the weight decay to 1e-4. This should lead to more satisfactory results. Apologies for any inconvenience caused.
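In torchvision terms, that training pipeline looks roughly like this (the padding size and normalization statistics below are common CIFAR-10 defaults, not necessarily the exact values used in the repo):

```python
import torchvision.transforms as T

# Commonly used CIFAR-10 channel statistics (assumed here, check the repo config)
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD = (0.2470, 0.2435, 0.2616)

train_transform = T.Compose([
    T.RandomCrop(32, padding=4),   # random crop after zero-padding
    T.RandomHorizontalFlip(),      # the flip missing from the arXiv table
    T.ToTensor(),
    T.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])

test_transform = T.Compose([
    T.ToTensor(),
    T.Normalize(CIFAR10_MEAN, CIFAR10_STD),
])
```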

hugebuck commented 2 months ago

Thank you for the explanation. However, I am still unable to achieve 85% accuracy using the previous hyperparameters. Could you please provide the model parameters for ResNet20 that you used, or share the relevant code?

Ariande1 commented 2 months ago

Sure. Here are the relevant training code and pre-trained weights for ResNet20, which reaches 88.38% accuracy: link

hugebuck commented 2 months ago

Great! Thank you very much for your assistance. My experiment now achieves comparable accuracy. I have one more question regarding the Batch Normalization layer settings: what is the reason for initializing BatchNorm3d2 to 0.2*thresh?

Ariande1 commented 2 months ago

Initializing this affine parameter close to zero gives the model a clean, branch-less starting point (only the shortcut is active) and thus faster convergence. You may refer to Section V.B of our paper for more details.
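Concretely, a minimal PyTorch sketch of that initialization is below. The block structure is simplified (no spiking neurons or downsampling) and only illustrates the BN weight setting; `thresh` stands in for the firing threshold used in the repo.

```python
import torch.nn as nn

thresh = 0.5  # placeholder firing threshold; the actual value comes from the repo config


class ResidualBlockSketch(nn.Module):
    """Residual block whose last BN starts with a small affine scale."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        # Set the affine scale (gamma) of the last BN in the branch to 0.2 * thresh,
        # so the residual branch contributes almost nothing at the start of training
        # and the block initially behaves close to an identity (shortcut-only) mapping.
        nn.init.constant_(self.bn2.weight, 0.2 * thresh)

    def forward(self, x):
        out = self.bn1(self.conv1(x))
        out = self.bn2(self.conv2(out))
        return out + x  # shortcut dominates early in training
```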