Thanks for your interest in our work. The designs of VPT [1] and AdaptFormer are both inspired by recent advances in parameter-efficient tuning in NLP. VPT focuses on prompt tuning and directly adopts the Adapter from AdapterFusion [2] as a baseline. In contrast, we focus on the Adapter structure itself and evaluate AdaptFormer on both image and video tasks.
Compared with the vanilla Adapter in VPT [1], our AdaptFormer:
(i) uses a scaling factor s to balance the task-agnostic features (generated by the original frozen branch) and the task-specific features (generated by the tunable bottleneck branch). We evaluate AdaptFormer with multiple values of s; the results are summarized in Table 2c, and a detailed discussion is provided in the main text.
(ii) further compresses the middle dimension of the AdaptMLP module. We aim to strike a trade-off between model capacity (i.e., potential) and adaptation efficiency. The middle dimension largely determines the adapter's parameter count: a higher dimension brings more parameters at the cost of efficiency and storage. As shown below, we evaluated several middle dimensions and found that 64 (a reduction rate of 12) achieves the best balance of accuracy, lightweight storage, and efficiency (a minimal code sketch of this branch is given after the table).
Middle Dim | #Params (M), SSv2 | Top-1 Acc (%), SSv2 | #Params (M), NUS-WIDE | mAP (%), NUS-WIDE |
---|---|---|---|---|
1 | 0.16 | 50.03 | 0.09 | 57.51 |
4 | 0.22 | 54.70 | 0.15 | 58.14 |
16 | 0.44 | 57.62 | 0.37 | 59.00 |
32 | 0.73 | 58.27 | 0.66 | 59.09 |
64 | 1.32 | 59.02 | 1.25 | 59.07 |
128 | 2.51 | 58.95 | 2.43 | 59.49 |
256 | 4.87 | 58.87 | 4.79 | 59.62 |
512 | 9.59 | 58.98 | 9.51 | 59.82 |
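For readers comparing with the VPT adapter baseline, here is a minimal sketch (not our official code) of how the two design points above fit together: a bottleneck branch with a small middle dimension, scaled by s and added in parallel to the frozen MLP block. Names such as `AdaptMLP`, `mid_dim`, and `scale` are illustrative assumptions.

```python
import torch.nn as nn

class AdaptMLP(nn.Module):
    """Illustrative sketch of a parallel bottleneck adapter branch.

    The down/up projections are the only tunable parts; `scale` plays the
    role of the scaling factor s that balances the frozen (task-agnostic)
    and tunable (task-specific) features.
    """

    def __init__(self, embed_dim=768, mid_dim=64, scale=0.1):
        super().__init__()
        self.down = nn.Linear(embed_dim, mid_dim)   # e.g. 768 -> 64 (reduction rate ~12)
        self.act = nn.ReLU()
        self.up = nn.Linear(mid_dim, embed_dim)     # 64 -> 768
        self.scale = scale                          # scaling factor s

    def forward(self, x, frozen_mlp_out):
        # task-specific features from the tunable bottleneck branch
        task_specific = self.up(self.act(self.down(x))) * self.scale
        # combined with task-agnostic features from the frozen branch, plus residual
        return frozen_mlp_out + task_specific + x
```

With embed_dim = 768 and mid_dim = 64, each block adds roughly 2 × 768 × 64 ≈ 0.1M tunable parameters, so a 12-block ViT-B backbone ends up with on the order of 1.2M new parameters, roughly consistent with the 1.32 / 1.25 entries in the table above (assuming #Params are reported in millions).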
We have added these discussions in our camera-ready version. Please stay tuned.
[1] Jia, Menglin, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, and Ser-Nam Lim. "Visual prompt tuning." ECCV 2022.
[2] Pfeiffer, Jonas, Kamath, Aishwarya, Rücklé, Andreas, Cho, Kyunghyun, and Gurevych, Iryna. "AdapterFusion: Non-destructive task composition for transfer learning." EACL 2021.
Thanks for your reply.
Thanks for sharing the nice work. In my opinion, AdaptFormer is the same as the baseline "Adapter" used in VPT [46]. Am I misunderstanding?