
Mona

The official implementation of "5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks".

Table of Contents

- Introduction
- Main Results
- Getting Started
- Citation
- Acknowledgement

Introduction

Pre-training followed by fine-tuning can improve transfer efficiency and performance on visual tasks. Recent delta-tuning methods provide more options for visual classification tasks. Despite their success, existing visual delta-tuning methods fail to exceed the upper bound of full fine-tuning on challenging tasks such as instance segmentation and semantic segmentation. To find a competitive alternative to full fine-tuning, we propose Multi-cognitive Visual Adapter (Mona) tuning, a novel adapter-based tuning method.

[Figure: overview of Mona]
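For readers new to adapter-based tuning, the sketch below illustrates the general idea behind Mona: the pre-trained backbone stays frozen, while small bottleneck modules with multi-scale depth-wise convolutions are inserted and trained instead. This is a simplified illustration rather than the module shipped in this repository; the class name `MultiScaleAdapter`, the bottleneck width, and the kernel sizes are our assumptions, so please refer to the code here for the actual Mona design.

```python
import torch.nn as nn


class MultiScaleAdapter(nn.Module):
    """Simplified bottleneck adapter with multi-scale depth-wise convolutions.

    Illustrative only -- see the Mona code in this repository for the real module.
    """

    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.down = nn.Linear(dim, bottleneck)      # project tokens to a small bottleneck
        self.dwconvs = nn.ModuleList([              # depth-wise convs at several scales
            nn.Conv2d(bottleneck, bottleneck, k, padding=k // 2, groups=bottleneck)
            for k in (3, 5, 7)
        ])
        self.up = nn.Linear(bottleneck, dim)        # project back to the backbone dimension
        self.act = nn.GELU()

    def forward(self, x, hw):
        # x: (B, N, C) token sequence from a Swin block; hw: spatial shape (H, W) with N == H * W
        shortcut = x
        x = self.act(self.down(self.norm(x)))
        B, N, C = x.shape
        H, W = hw
        feat = x.transpose(1, 2).reshape(B, C, H, W)
        feat = sum(conv(feat) for conv in self.dwconvs) / len(self.dwconvs)
        x = x + feat.flatten(2).transpose(1, 2)     # fuse multi-scale features back into tokens
        return shortcut + self.up(self.act(x))      # residual connection around the adapter
```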

Mona achieves strong performance on COCO object detection (53.4 box AP and 46.0 mask AP on test-dev with Swin-Base) and ADE20K semantic segmentation (51.36 mIoU on val with Swin-Large).

Main Results

The proposed Mona outperforms full fine-tuning on representative visual tasks, raising the upper bound of previous delta-tuning methods. The results demonstrate that the adapter-tuning paradigm can replace full fine-tuning and achieve better performance on most visual tasks; full fine-tuning may no longer be the only preferred solution for transfer learning in the future.
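The "5% > 100%" framing refers to updating only a small fraction of the parameters (the adapters, plus the task head) while the pre-trained backbone stays frozen. Below is a minimal sketch of that setup, assuming adapter parameters can be identified by a keyword in their names; this is an illustration, not the training code used in this repository.

```python
import torch.nn as nn


def freeze_all_but_adapters(model: nn.Module, keyword: str = "adapter") -> None:
    """Freeze every parameter except the adapter ones and report the trainable ratio.

    Assumes adapter parameters contain `keyword` in their names; in practice the
    task head (detector / decoder) is usually left trainable as well.
    """
    trainable, total = 0, 0
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    print(f"trainable parameters: {trainable} / {total} ({trainable / total:.1%})")
```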

[Figure: performance comparison]

Moreover, Mona converges faster than the other tested delta-tuning methods.

[Figure: convergence comparison]

Getting Started

Object Detection & Instance Segmentation

Installation

Please refer to Swin-Transformer-Object-Detection for environment setup and dataset preparation.

Training Mona

After organizing the dataset, modify the config file to match your environment.
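The exact fields to edit depend on your setup, but in an mmdetection-style config the typical changes are the dataset root, the per-GPU batch size, the pre-trained backbone checkpoint, and the work directory. A hypothetical sketch follows; the paths and values are placeholders, not the ones shipped with this repository, so match them to the structure of the config you actually use.

```python
# Hypothetical edits to a config such as
# mona_configs/swin-b_coco/cascade_mask_swin_base_3x_coco_sample_1_bs_16_mona.py
data_root = '/path/to/coco/'                      # point to your COCO directory
data = dict(
    samples_per_gpu=2,                            # images per GPU
    workers_per_gpu=2,                            # dataloader workers per GPU
    train=dict(
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
    ),
)
# pre-trained Swin weights used to initialize the frozen backbone
model = dict(
    backbone=dict(
        init_cfg=dict(type='Pretrained',
                      checkpoint='/path/to/swin_base_patch4_window7_224_22k.pth')))
work_dir = './work_dirs/mona_swin_base_coco'      # where logs and checkpoints are saved
```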

Please execute the following commands from the project root.
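In each command, replace `Your GPUs` with the number of GPUs to use for distributed training (e.g., 8).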

COCO

bash Swin-Transformer-Object-Detection/tools/dist_train.sh Swin-Transformer-Object-Detection/mona_configs/swin-b_coco/cascade_mask_swin_base_3x_coco_sample_1_bs_16_mona.py `Your GPUs`

VOC

bash Swin-Transformer-Object-Detection/tools/dist_train.sh Swin-Transformer-Object-Detection/mona_configs/swin-l_voc/voc_retinanet_swin_large_1x_mona.py `Your GPUs`

Semantic Segmentation

Installation

Please refer to Swin-Transformer-Semantic-Segmentation for environment setup and dataset preparation.

Training Mona

Follow the guidance in Object Detection & Instance Segmentation to check your config file.

Please execute the following command from the project root.

ADE20K

bash Swin-Transformer-Semantic-Segmentation/tools/dist_train.sh Swin-Transformer-Semantic-Segmentation/mona_configs/swin-l_ade20k/ade20k_upernet_swin_large_160k_mona.py `Your GPUs`

Classification

Installation

Please refer to Swin-Transformer-Classification for environment setup.

Training Mona

Follow the guidance in Object Detection & Instance Segmentation to check your config file.

Please execute the following commands from the project root.

Oxford Flower

bash Swin-Transformer-Classification/tools/dist_train.sh Swin-Transformer-Classification/mona_configs/swin-l_oxford-flower/swin-large_4xb8_oxford_flower_mona.py `Your GPUs`

Oxford Pet

bash Swin-Transformer-Classification/tools/dist_train.sh Swin-Transformer-Classification/mona_configs/swin-l_oxford-flower/swin-large_4xb8_oxford_pet_mona.py `Your GPUs`

Oxford VOC

bash Swin-Transformer-Classification/tools/dist_train.sh Swin-Transformer-Classification/mona_configs/swin-l_oxford-flower/swin-large_4xb8_voc_mona.py `Your GPUs`

Citation

If our work is helpful for your research, please cite:


@misc{yin20245100breakingperformanceshackles,
      title={5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks}, 
      author={Dongshuo Yin and Leiyi Hu and Bin Li and Youqun Zhang and Xue Yang},
      year={2024},
      eprint={2408.08345},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.08345}, 
}

Acknowledgement

We are grateful to many wonderful open-source repositories, including (but not limited to) Swin-Transformer-Object-Detection, Swin-Transformer-Semantic-Segmentation, and Swin-Transformer-Classification.