arXiv link: https://arxiv.org/abs/2205.00305
To be published in Findings of NAACL 2022
Authors: Chin-Lun Fu*, Zih-Ching Chen*, Yun-Ru Lee, Hung-yi Lee
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed. AdapterBias adds a token-dependent shift to the hidden output of transformer layers to adapt to downstream tasks with only a vector and a linear layer.
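The mechanism can be pictured with a short PyTorch sketch. The module below is our own minimal illustration of the idea, not the repository's actual implementation (names such as AdapterBias, v, and l_alpha are assumptions made for this sketch): a trainable vector v holds the task-specific shift, a linear layer produces one weight per token, and the resulting token-dependent shift is added to the transformer layer's hidden output.

import torch
import torch.nn as nn

class AdapterBias(nn.Module):
    # Minimal sketch: a task-specific shift vector v and a linear layer
    # that weights the shift per token (names are illustrative only).
    def __init__(self, hidden_dim):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(hidden_dim))        # shared task-specific shift vector
        self.l_alpha = nn.Linear(hidden_dim, 1, bias=False)   # produces per-token weight alpha

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_dim)
        alpha = self.l_alpha(hidden_states)    # (batch, seq_len, 1)
        return hidden_states + alpha * self.v  # token-dependent shift added to each token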
We use the GLUE benchmark as our dataset. You can download all of the datasets from the GLUE website.
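If you only want to inspect a task quickly, the GLUE tasks can also be loaded through the Hugging Face datasets library. This is just an alternative to the manual download and is not necessarily the directory layout that exp.py expects:

from datasets import load_dataset

# Load one GLUE task (CoLA here) and look at a training example.
cola = load_dataset("glue", "cola")
print(cola["train"][0])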
cd src
python exp.py \
--adapter True \
--GLUE_path <your_GLUE_path> \
--output_path <output_path> \
--model <model name> \
--task <the task you want to run> \
--epoch 100 \
--lr 0.0001 \
--max_len 512 \
--batch_size 32 \
-s or --seed specifies the random seed.
-g or --GLUE_path specifies the path of your GLUE dataset.
-o or --output_path specifies the path of the saved model and the saved prediction file.
-m or --model specifies the pre-trained language model (PLM) used in training: bert-base, bert-large, roberta-base, roberta-large.
-t or --task specifies the downstream task: cola, mnli, qnli, qqp, mrpc, rte, sst, sts.
-a or --adapter specifies whether to add our AdapterBias to the PLM.
--share_alpha specifies whether to share the same alpha in AdapterBias across all transformer layers.
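As a concrete example, a run of the RTE task with roberta-base and a fixed seed might look like the following (the GLUE and output paths are placeholders, and the hyperparameter values are only illustrative):

cd src
python exp.py \
--adapter True \
--GLUE_path /path/to/GLUE \
--output_path ./outputs \
--model roberta-base \
--task rte \
--seed 42 \
--epoch 100 \
--lr 0.0001 \
--max_len 512 \
--batch_size 32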
After you run the training, the prediction file is generated automatically under the path given by --output_path. After running all nine tasks of the GLUE benchmark, you can submit the prediction files to the GLUE website.