ehuynh1106 / TinyImageNet-Transformers

Transformers trained on Tiny ImageNet
Apache License 2.0
47 stars · 11 forks

Cannot reproduce the reported result #3

Closed · ShihaoShao-GH closed this issue 2 years ago

ShihaoShao-GH commented 2 years ago

Hi, thanks for your amazing work!

I want to reproduce the Swin-L result you reported (i.e., 91.35%). However, whether I train with the command `python main.py --train --model swin` or use your provided pretrained weights directly, I cannot get the reported result: training gives 90.4% and your pretrained weights give 90.5%. Could you double-check the reproduction?

ehuynh1106 commented 2 years ago

Hi Louie Shao,

For training, did you train for the full 30 epochs?

And for the pretrained Swin-L model, you would want to use this command to evaluate it on the validation set: `python main.py --evaluate https://github.com/ehuynh1106/TinyImageNet-Transformers/releases/download/weights/swin_large_384.pth --model swin`

or, if you downloaded the model, you can specify the path to the file instead of a URL.
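
If you want to rule out a corrupted download, something like the following should list a few parameter names (a rough sketch, assuming the release file loads as a standard torch checkpoint; the exact layout may differ):

```python
import torch

# Fetch the released weights and peek at the first few entries.
# Assumption: the .pth file is a torch-serialized checkpoint/state_dict.
url = ("https://github.com/ehuynh1106/TinyImageNet-Transformers/"
       "releases/download/weights/swin_large_384.pth")
ckpt = torch.hub.load_state_dict_from_url(url, map_location="cpu")

for name, value in list(ckpt.items())[:5]:
    print(name, getattr(value, "shape", type(value)))
```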

ShihaoShao-GH commented 2 years ago

> Hi Louie Shao,
>
> For training, did you train for the full 30 epochs?

Yes, I trained for the full 30 epochs and kept all the settings the same.

> And for the pretrained Swin-L model, you would want to use this command to evaluate it on the validation set: `python main.py --evaluate https://github.com/ehuynh1106/TinyImageNet-Transformers/releases/download/weights/swin_large_384.pth --model swin`
>
> or, if you downloaded the model, you can specify the path to the file instead of a URL.

I checked the download link; nothing is wrong there.

But I only got: 2022-09-01 04:41:15,320 [INFO] Top 1 Validation Accuracy: 90.5 Top 5 Validation Accuracy: 97.86

ehuynh1106 commented 2 years ago

> I checked the download link; nothing is wrong there.
>
> But I only got: 2022-09-01 04:41:15,320 [INFO] Top 1 Validation Accuracy: 90.5 Top 5 Validation Accuracy: 97.86

For evaluating the pre-trained weights I provide, I cloned a fresh copy of the repo, evaluated the model, and got the expected result. Is your batch size 32? When it comes to evaluating, the only thing that can differ (besides the weights) is the batch size. Batch size shouldn't affect the model because Swin uses LayerNorm, not BatchNorm, but maybe there is some unintended behavior there.
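
Here's a tiny standalone check (not this repo's code) showing that LayerNorm's output for a given sample does not depend on the batch size:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
ln = nn.LayerNorm(8)
x = torch.randn(32, 8)

full = ln(x)        # first sample evaluated inside a batch of 32
single = ln(x[:1])  # same sample evaluated alone

# True: LayerNorm normalizes per sample, so batch size doesn't change the result.
print(torch.allclose(full[:1], single, atol=1e-6))
```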

> Yes, I trained for the full 30 epochs and kept all the settings the same.

For training, the best model is not the one after the 30th epoch; it is around the 25th to 27th epoch that the model reaches 91.35% accuracy. That being said, the model should be able to get at least 91.0% accuracy.
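
In case the gap comes down to which epoch gets kept, the idea is simply to checkpoint on the best validation accuracy rather than the final epoch, along the lines of this rough helper (a sketch, not necessarily the exact code in main.py):

```python
import torch

def save_if_best(model, val_acc, best_acc, path="best_swin_large_384.pth"):
    # Hypothetical helper: keep only the weights with the highest
    # validation accuracy seen so far across the 30 epochs.
    if val_acc > best_acc:
        torch.save(model.state_dict(), path)
        return val_acc
    return best_acc
```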

I will re-run the training to check the results; it will take me 1-2 days. The only thing I can think of in the meantime is to double-check your training settings:

(screenshot of the training settings)
ehuynh1106 commented 2 years ago

Hi Louie,

An update: I can't reproduce the results either.

My intuition says that something changed in the dependencies of this repo. When I first trained the models, I did so on timm version 0.6.1, and that version is no longer available.

With the current versions specified in requirements.txt, I only achieved 91.21%. I'm trying different versions of timm; so far the best result I've gotten is 91.31% (timm==0.5.4).
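
If anyone else hits this, printing the exact versions in the training environment helps rule out dependency drift:

```python
# Quick check of the library versions actually installed in the environment.
import timm
import torch

print("timm:", timm.__version__)
print("torch:", torch.__version__)
```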

ehuynh1106 commented 2 years ago

Final update: I can't reproduce the results either. I tried cloning the repo's first commit, just in case I had accidentally changed some part of the training procedure, but that didn't work either. All my attempts reach ~91.2% accuracy. Maybe it was an especially lucky seed; I'm not sure what is causing the difference. Sorry I was unable to resolve the issue.

That being said, the uploaded weights do achieve a 91.35% accuracy.
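
For anyone who wants to chase the remaining ~0.1-0.2% gap, fixing the usual sources of run-to-run randomness in PyTorch is a reasonable first step (a generic sketch, not necessarily what main.py already does):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    # Seed the common RNGs and make cuDNN deterministic; this reduces,
    # but does not fully eliminate, run-to-run variation on GPU.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```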