bic-L / Masked-Spiking-Transformer

[ICCV-23] Masked Spiking Transformer

CIFAR10 training scripts #1

Closed · godatta closed this issue 11 months ago

godatta commented 11 months ago

Hi,

Thanks for the great work! Can you please share the training scripts for CIFAR10/100?

I tried to train it with your code and the hyperparameters mentioned in the supplementary material of the paper; however, I am only getting ~93% accuracy, while your paper reports >98%.

bic-L commented 11 months ago

Thank you for your interest in our work and for taking the time to implement it!

For CIFAR-10/CIFAR-100, we use the pretrained ImageNet weights provided in [Swin-Tiny-IN1K](https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_tiny_patch4_window7_224.pth) as initialization. We obtain the corresponding ANN weights by training on CIFAR-10 and CIFAR-100 for 100 additional epochs and then apply them for masked training.
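
In case it helps, a minimal sketch of that initialization step, assuming a timm-style Swin-Tiny whose parameter names line up with the official checkpoint (our actual model class and key names may differ):

```python
# Minimal sketch (assumptions: timm's swin_tiny matches the official
# checkpoint layout; the repo's actual model class may differ).
import torch
import timm

model = timm.create_model("swin_tiny_patch4_window7_224", pretrained=False)
ckpt = torch.load("swin_tiny_patch4_window7_224.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # official Swin ckpts nest under "model"

# Drop the 1000-way ImageNet head, then load the backbone weights.
state_dict = {k: v for k, v in state_dict.items() if not k.startswith("head.")}
model.load_state_dict(state_dict, strict=False)

# Replace the classifier with a 10-way head for CIFAR-10, then fine-tune
# for the ~100 additional epochs described above.
model.reset_classifier(num_classes=10)
```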

All the checkpoints will be released in a couple of days, and we will update the README and training scripts accordingly. I'll let you know when they are ready.

godatta commented 11 months ago

Thanks for your kind reply.

I was wondering how we can use the pretrained weights from Swin-Tiny, given that your architecture uses 1D batch norm while Swin-Tiny uses layer norm. Looking forward to the code release.


bic-L commented 11 months ago

You're right, there are some slight differences between the MST and Swin-Tiny models. In short, we first obtain the pretrained weights for Swin-Tiny on CIFAR-10/CIFAR-100. Then we modify the model architecture and skip initializing the LN parameters to get the Swin-Tiny(BN) weights on CIFAR-10/100, which is the ANN baseline in our experimental comparison.
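
A hedged sketch of that weight transfer: copy every matching tensor from the CIFAR-finetuned Swin-Tiny (LayerNorm) checkpoint into the BN variant, leaving the BatchNorm layers at their default initialization. The file name, the `model_bn` variable, and the key filter below are illustrative assumptions; the actual parameter names may differ:

```python
# Sketch only: transfer weights from the LN checkpoint into the BN model,
# skipping norm parameters so BatchNorm1d keeps its default initialization.
import torch

src = torch.load("swin_tiny_cifar10.pth", map_location="cpu")
src = src.get("model", src)

dst = model_bn.state_dict()  # model_bn: the Swin-Tiny(BN) ANN, defined elsewhere
transferred = {
    k: v for k, v in src.items()
    if "norm" not in k                       # skip LN params (no BN counterpart)
    and k in dst and v.shape == dst[k].shape
}
dst.update(transferred)
model_bn.load_state_dict(dst)
print(f"copied {len(transferred)} tensors; norm layers left at defaults")
```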

All the materials required for reproducing the results will be uploaded soon. Sorry for the inconvenience. Thanks!

bic-L commented 11 months ago

Hi, we have uploaded the checkpoints and made the necessary updates to the code. I hope this helps. :)

godatta commented 11 months ago

Hi,

Thanks for uploading the code and checkpoints. When I use your CIFAR-10 checkpoint, I get close to 98% accuracy.

However, I want to train the model on CIFAR-10 for my own purposes. Can you please give some guidance on that? When I use the pre-trained Swin-Tiny model and don't initialize the BN parameters, I get only ~95.4% accuracy on CIFAR-10.

In the current repo, I don't see how we can train on CIFAR-10 from the Swin-Tiny initialization. Can you please add that, or let me know if I missed it?
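
For context, this is the kind of generic fine-tuning loop I am running; the data pipeline and hyperparameters below are my own assumptions, not the paper's recipe:

```python
# Generic CIFAR-10 fine-tuning loop (assumed hyperparameters, not the
# paper's settings). `model` is the Swin-Tiny initialized as discussed above.
import torch
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"
transform = T.Compose([
    T.Resize(224),                       # Swin-Tiny expects 224x224 inputs
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10("data", train=True, download=True,
                                         transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = model.to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
loss_fn = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(100):                 # 100 extra epochs, as in this thread
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
```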

bic-L commented 11 months ago

There is some accuracy drop when converting the original ANN model to the ANN adapted for SNN conversion, since the architecture is modified (e.g., LN replaced with BN) and some parameters are re-initialized.

Feel free to contact me by email about the pretraining issue (yfang870@connect.hkust-gz.edu.cn).