JieShibo / PETL-ViT

[ICCV 2023 & AAAI 2023] Binary Adapters & FacT, [Tech report] Convpass

CIFAR-100 experimental results query #16

Closed: lloo099 closed this issue 1 year ago

lloo099 commented 1 year ago

I'm curious why your CIFAR-100 results are lower than those reported in the original article. As you explain in the paper, you used a ViT model and fine-tuned it on the downstream dataset. AdaptFormer in your paper reaches an accuracy of 73.8 on CIFAR-100, whereas the original article reports that AdaptFormer can reach up to 83.52 at h=1. Why is your training setup so much worse? Thanks

JieShibo commented 1 year ago

The CIFAR-100 task in the VTAB-1k benchmark only uses 1,000 training images, while the AdaptFormer paper reports results on the full CIFAR-100 dataset with 50,000 training images. We also provide results on full CIFAR-100 in Table 5 and dataset details in Table 9.
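
For reference, here is a minimal sketch (not our actual data pipeline; the real VTAB-1k splits come from fixed index files released with the benchmark) that illustrates the size gap between the two settings:

```python
# Full CIFAR-100 has 50,000 training images; VTAB-1k trains on only 1,000.
import torch
from torch.utils.data import Subset
from torchvision import datasets

full_train = datasets.CIFAR100(root="./data", train=True, download=True)
print(len(full_train))  # 50000 images in the full training split

# Mimic the VTAB-1k budget by drawing 1,000 indices (VTAB-1k itself uses
# 800 train + 200 val examples chosen by fixed split files, not at random).
gen = torch.Generator().manual_seed(0)
indices = torch.randperm(len(full_train), generator=gen)[:1000].tolist()
vtab1k_like_train = Subset(full_train, indices)
print(len(vtab1k_like_train))  # 1000 images, the VTAB-1k training budget
```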

lloo099 commented 1 year ago

Thanks for the prompt reply. This is an interesting topic. I also see that the full CIFAR-100 result in Table 5 reaches 93.95. I previously replicated the AdaptFormer training for 100 epochs with batch size 128, and the result was only around 85.8. What could be the reason for this gap?

lloo099 commented 1 year ago

In Table 9, you mention 60,000 training images and 10,000 test images, but according to the original source there are 50,000 training images and 10,000 test images. The CIFAR-100 dataset consists of 60,000 images divided into 100 classes; each class contains 600 images, split into 500 for training and 100 for testing. https://www.cs.toronto.edu/~kriz/cifar.html
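
For example, a quick check with torchvision (assuming a standard install) reproduces these counts:

```python
# Official CIFAR-100 split: 50,000 training and 10,000 test images,
# i.e. 500 train and 100 test images per class across the 100 classes.
from collections import Counter
from torchvision import datasets

train_set = datasets.CIFAR100(root="./data", train=True, download=True)
test_set = datasets.CIFAR100(root="./data", train=False, download=True)
print(len(train_set), len(test_set))  # 50000 10000

train_counts = Counter(train_set.targets)
test_counts = Counter(test_set.targets)
print(len(train_counts), min(train_counts.values()), max(train_counts.values()))  # 100 500 500
print(len(test_counts), min(test_counts.values()), max(test_counts.values()))     # 100 100 100
```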

JieShibo commented 1 year ago

According to the original AdaptFormer article, the result of fine-tuning with a supervised pre-trained ViT-B on full CIFAR-100 can be up to 91.86 (Table 6 in https://arxiv.org/pdf/2205.13535.pdf).

'60,000 training images' is a typo. It should be 50,000. Thank you for pointing this out.

lloo099 commented 1 year ago

I see, thanks for your answer!