Closed sungbinson closed 1 year ago
Hi.
The results of full fine-tuning, linear probing, and VPT are borrowed from the VPT paper and can be reproduced with this repo.
I guess there are a few main reasons for the performance gap.
Thanks!!! I have one question about the optimal hyper-parameters. I know VTAB is a benchmark for domain adaptation. In optimal hyper-parameters.csv, the CIFAR-100 batch size is 2,048 (linear), but the VTAB training set only has 1,000 training samples. Is that possible? I think they used the full training set (CIFAR-100 has 50,000 training images).
Hi @sungbinson, we set 2048 for all linear probing experiments, since the same sweep also covers other, larger datasets. In the case of VTAB-1k, setting 2048 means we use the entire training split as one batch (800 samples during validation, 1,000 during final training). Let me know if you have other questions, or you can raise an issue in the VPT repo as @JieShibo suggested.
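To see why a batch size larger than the dataset is harmless, here is a minimal sketch of how a standard data loader (e.g. PyTorch's `DataLoader` with `drop_last=False`) splits a dataset into batches; `batch_sizes` is a hypothetical helper, not a function from this repo:

```python
import math

def batch_sizes(n_samples: int, batch_size: int) -> list:
    """Sizes of the batches a loader yields per epoch; the last batch
    may be smaller, and a batch_size >= n_samples gives one full batch."""
    n_batches = math.ceil(n_samples / batch_size)
    return [min(batch_size, n_samples - i * batch_size) for i in range(n_batches)]

# VTAB-1k splits: the whole split fits in a single batch of 2048.
print(batch_sizes(800, 2048))    # [800]  -> one full batch during validation
print(batch_sizes(1000, 2048))   # [1000] -> one full batch during training
# CIFAR-100 full training set: 50,000 samples -> 25 batches of up to 2048.
print(len(batch_sizes(50_000, 2048)))  # 25
```

So on VTAB-1k the batch size of 2,048 effectively turns linear probing into full-batch training, while on larger datasets it behaves as ordinary mini-batch training.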
I'm now working on reproducing the full fine-tuning and linear probing accuracy. However, unlike the paper, both full fine-tuning and linear probing give performance below 50%. Can you tell me why?