bic-L / Masked-Spiking-Transformer

[ICCV-23] Masked Spiking Transformer

my implementation is not very effective #4

Closed lxhwyj closed 7 months ago

lxhwyj commented 10 months ago

Dear bic-L,

First of all, thank you for your work and for open-sourcing the code. I recently read the related paper, "Masked Spiking Transformer", and tried your code, hoping to reproduce similar results. However, I have run into some difficulties: there is a gap between the performance I get and the performance described in the paper. I trained the ANN on Cifar100 with your code, but my ANN model only reaches about 73% accuracy.

Could you provide some guidance on hyperparameter tuning or other key implementation details? Any advice or resources you could share to help me improve my experiment would be much appreciated. Thank you again for your work and time. I look forward to your reply!

bic-L commented 10 months ago

Hi, thank you for your interest in our work. Could you please share your training script and log? There may be something wrong with the settings or parameters.

You can train Cifar100 directly with our code by changing the dataset parameter; the valid dataset names can be found in data/build.py. The hyperparameters we used to train the model are provided in the main paper and the supplementary material.

By the way, for Cifar10/Cifar100 we use Swin-Tiny pre-trained weights as the initialization, which you can find in the recently closed issue. Hope this helps.
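For reference, here is a minimal sketch of what using the Swin-Tiny ImageNet weights as an initialization could look like; the checkpoint file name, the "model" key, and the `load_swin_tiny_init` helper are assumptions for illustration, not the repo's actual loading code (check the training script and the linked issue for the exact procedure):

```python
# Minimal illustrative sketch (not the repo's actual code): initialize a model
# from ImageNet-pretrained Swin-Tiny weights before fine-tuning on Cifar100.
# The checkpoint path and the "model" key are assumptions based on common
# Swin checkpoint layouts.
import torch

def load_swin_tiny_init(model, ckpt_path="swin_tiny_patch4_window7_224.pth"):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = ckpt.get("model", ckpt)  # Swin checkpoints often nest weights under "model"
    # Drop the ImageNet-1k classification head; Cifar100 has a different number of classes.
    state_dict = {k: v for k, v in state_dict.items() if not k.startswith("head.")}
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
    return model
```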

lxhwyj commented 10 months ago

Thank you for your reply. I did not load the pre-trained model when training Cifar100, which may be why my accuracy is only around 73%. Following your advice, I am now using the Swin-Tiny pre-trained weights as the initialization and retraining the ANN model, which may take some time given my hardware.

bic-L commented 10 months ago

Hi, actually there should not be such a large accuracy gap from the results in our paper even when training from scratch. It would be great if you could share the training logs; that way we can find the cause together.

The batch size can really impact the final performance. To clarify, the batch size in the supplement is per GPU, and we trained on 8 RTX 3090 GPUs, for a total batch size of 512 on Cifar10/100 and ImageNet. Hope this helps :).
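As a rough sanity check (a generic sketch assuming standard data-parallel training, where the effective batch is the per-GPU batch times the number of GPUs; the per-GPU value of 64 is inferred from the numbers above, not quoted from the supplement):

```python
# Rough sanity check, assuming standard data parallelism:
# effective batch = per-GPU batch x number of GPUs.
per_gpu_batch = 64   # inferred: 512 total / 8 GPUs; check the supplement for the exact value
num_gpus = 8         # 8x RTX 3090, as stated above
effective_batch = per_gpu_batch * num_gpus
print(effective_batch)  # 512, the total batch size used for Cifar10/100 and ImageNet
```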

lxhwyj commented 10 months ago

Hello, the attached log_rank0.txt records my run log and config.json records my parameters. Both files are from the run where I did not load the pre-trained model.

bic-L commented 10 months ago

Thanks. I think the problem is caused by an inconsistent batch size: with a total batch size of 512, an epoch should have 79 minibatches, not 781.
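One way to verify the effective batch size on your side is to compare the minibatch count in the training log with the length of the data loader. A generic PyTorch sketch (not code from this repo; the data path and transform are placeholders):

```python
# Generic PyTorch sketch (not the repo's code): the minibatch count per epoch
# in the training log should match len(train_loader) for the effective (total)
# batch size actually in use.
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

train_set = torchvision.datasets.CIFAR100(
    root="./data", train=True, download=True, transform=transforms.ToTensor()
)
effective_batch = 512  # total batch size across all GPUs; adjust to your setup
train_loader = DataLoader(train_set, batch_size=effective_batch, shuffle=True, drop_last=True)
print(len(train_loader))  # minibatches per epoch at this effective batch size
```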

We will update the readme to make it clear.

lxhwyj commented 10 months ago

I currently have only two 4090 GPUs, and what I sent you is the result of using a single card. I have since tested with both cards at the same time, and my total batch size was only 128, with 390 minibatches per epoch. If the problem is indeed caused by an inconsistent batch size, it should be relatively easy to fix; I will adjust the settings and re-test. Thank you!