arXiv link: https://arxiv.org/abs/2205.00305
To be published in Findings of NAACL 2022
Authors: Chin-Lun Fu*, Zih-Ching Chen*, Yun-Ru Lee, Hung-yi Lee
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed. AdapterBias adds a token-dependent shift to the hidden output of transformer layers to adapt to downstream tasks with only a vector and a linear layer.
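The mechanism can be pictured with a short PyTorch sketch. The module below is our own minimal illustration of the idea, not the repository's actual implementation (names such as AdapterBias, v, and l_alpha are assumptions made for this sketch): a trainable vector v holds the task-specific shift, a linear layer produces one weight per token, and the resulting token-dependent shift is added to the transformer layer's hidden output.

import torch
import torch.nn as nn

class AdapterBias(nn.Module):
    # Minimal sketch: a task-specific shift vector v and a linear layer
    # that weights the shift per token (names are illustrative only).
    def __init__(self, hidden_dim):
        super().__init__()
        self.v = nn.Parameter(torch.zeros(hidden_dim))        # shared task-specific shift vector
        self.l_alpha = nn.Linear(hidden_dim, 1, bias=False)   # produces per-token weight alpha

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_dim)
        alpha = self.l_alpha(hidden_states)    # (batch, seq_len, 1)
        return hidden_states + alpha * self.v  # token-dependent shift added to each token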
We use the GLUE benchmark as our dataset. You can download all of the datasets from the GLUE website.
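If you only want to inspect a task quickly, the GLUE tasks can also be loaded through the Hugging Face datasets library. This is just an alternative to the manual download and is not necessarily the directory layout that exp.py expects:

from datasets import load_dataset

# Load one GLUE task (CoLA here) and look at a training example.
cola = load_dataset("glue", "cola")
print(cola["train"][0])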
cd src
python exp.py \
--adapter True \
--GLUE_path <your_GLUE_path> \
--output_path <output_path> \
--model <model name> \
--task <the task you want to run> \
--epoch 100 \
--lr 0.0001 \
--max_len 512 \
--batch_size 32 \
-s or --seed specifies the random seed.
-g or --GLUE_path specifies the path of your GLUE dataset.
-o or --output_path specifies the path of the saved model and the saved prediction file.
-m or --model specifies the pre-trained language model (PLM) used in training: bert-base, bert-large, roberta-base, roberta-large.
-t or --task specifies the downstream task: cola, mnli, qnli, qqp, mrpc, rte, sst, sts.
-a or --adapter specifies whether to add our AdapterBias to the PLM.
--share_alpha specifies whether to share the same alpha in AdapterBias across all transformer layers.
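As a concrete example, a run of the RTE task with roberta-base and a fixed seed might look like the following (the GLUE and output paths are placeholders, and the hyperparameter values are only illustrative):

cd src
python exp.py \
--adapter True \
--GLUE_path /path/to/GLUE \
--output_path ./outputs \
--model roberta-base \
--task rte \
--seed 42 \
--epoch 100 \
--lr 0.0001 \
--max_len 512 \
--batch_size 32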
After you run the training, the prediction file is generated automatically under the path given by --output_path. After running all nine tasks of the GLUE benchmark, you can submit the prediction files to the GLUE website.