JieShibo / PETL-ViT

[ICCV 2023] Binary Adapters, [AAAI 2023] FacT, [Tech report] Convpass
MIT License

The difference between the experimental results on VTAB-1K in your two papers. #14

Closed shachoi closed 10 months ago

shachoi commented 10 months ago

First of all, thank you for your great contributions to PETL for ViT. Your work and code are all very helpful.

The experimental results on VTAB-1K differ between your two papers, FacT and Binary Adapters. The discrepancy seems to come from whether the images are normalized with the ImageNet mean and standard deviation. With unnormalized images (as in the Binary Adapters paper), performance is better than with normalized ones (as in the FacT paper). https://github.com/JieShibo/PETL-ViT/blob/026e4f12cfbe1e46bfb4d3ed5ba09f2d9f83e91e/binary_adapter/main.py#L78

Is that correct? I'm asking just to clarify.

Best,

JieShibo commented 10 months ago

Yes. VPT and SSF use unnormalized images, whereas NOAH uses normalized images. In the FacT paper, we reuse the baseline results (including Adapter and LoRA) reported by NOAH, so we also use normalized inputs for FacT. In the Binary Adapters paper, by contrast, we implement the baselines in VPT's setting, i.e., with unnormalized inputs.
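
For readers comparing the two settings, here is a minimal sketch of what the difference amounts to at the pixel level. It is not taken from the repository: the function name `preprocess_pixel` is hypothetical, and it assumes that "unnormalized" means pixel values are only rescaled to [0, 1], while "normalized" additionally applies the commonly used ImageNet per-channel mean and standard deviation.

```python
# Commonly used ImageNet channel statistics (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def preprocess_pixel(rgb, normalize):
    """Map one 8-bit RGB pixel to model input values (hypothetical helper).

    normalize=False: only rescale to [0, 1] (assumed "unnormalized" setting).
    normalize=True: also subtract the ImageNet mean and divide by the std.
    """
    scaled = [c / 255.0 for c in rgb]
    if not normalize:
        return scaled
    return [(c - m) / s for c, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


pixel = (255, 128, 0)
print("unnormalized:", preprocess_pixel(pixel, normalize=False))
print("normalized:  ", preprocess_pixel(pixel, normalize=True))
```

The two settings feed the backbone inputs on different scales, so results obtained under one setting are not directly comparable with baselines reported under the other.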

shachoi commented 10 months ago

OK, got it! Thank you for your response.