adapter-hub / adapters

A Unified Library for Parameter-Efficient and Modular Transfer Learning
https://docs.adapterhub.ml

No difference in speedup between configs #424

Closed: pugantsov closed this issue 1 year ago

pugantsov commented 2 years ago


I don't seem to be achieving much speedup with adapters so far, and I'm unsure what I'm doing wrong. I upgraded to 3.1.0 and tried the IA3Config, which trains only a fraction of the parameters that the PfeifferConfig does. To my surprise, an epoch on 25,000 samples still takes roughly 10 minutes, about the same as with the PfeifferConfig.

For my model, I use a custom BERT head with an additional layer and some modifications (just things like mean pooling, nothing particularly intensive), and I follow the Colab notebook in setting up the following:

import torch
import transformers

# Select a device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the base model without the default pooling layer
# (pooling is handled in the custom head).
bert = transformers.BertModel.from_pretrained("bert-base-cased", add_pooling_layer=False)

# Freeze the full base model.
for p in bert.parameters():
    p.requires_grad = False

# Custom head: an extra layer plus mean pooling on top of BERT.
model = BertClassificationHead(bert=bert, pooling="mean")

# Add an (IA)^3 adapter and make it the only trainable part of the encoder.
config = transformers.adapters.IA3Config()
model.bert.add_adapter("ia3_adapter", config=config)
model.bert.train_adapter("ia3_adapter")
model = model.to(device)
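
For reference, a quick way to confirm that only the adapter weights remain trainable after train_adapter is a plain PyTorch parameter count (not specific to this library):

# Sanity check: with train_adapter active, only the (IA)^3 weights should require grad.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")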

Am I misunderstanding the parameter-efficiency aspect of adapters in general, or am I implementing something incorrectly?

calpt commented 2 years ago

The code you provided looks correct. In general, the number of trainable parameters of a method is not necessarily reflected in its training time: during the backward pass, gradients still have to be propagated down to the first module that requires updates, even if many modules in between are frozen. Larger speedups can therefore be obtained by leaving out adapter modules in the earlier model layers. There's some analysis of this in this paper: https://aclanthology.org/2021.emnlp-main.626.pdf.
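
As a rough sketch of that suggestion (assuming adapter-transformers 3.x, where bottleneck configs such as PfeifferConfig accept a leave_out list of layer indices; the adapter name here is made up for illustration):

import transformers

bert = transformers.BertModel.from_pretrained("bert-base-cased", add_pooling_layer=False)

# Skip adapters in the first 6 of BERT-base's 12 layers, so the backward pass
# only needs to reach layer 6 rather than layer 0.
config = transformers.adapters.PfeifferConfig(leave_out=list(range(6)))
bert.add_adapter("upper_layers_only", config=config)
bert.train_adapter("upper_layers_only")

The trade-off is task performance: the cited paper finds that adapters in the upper layers contribute most, so dropping the lower-layer modules often costs little accuracy while shortening the backward pass.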

adapter-hub-bert commented 1 year ago

This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.

adapter-hub-bert commented 1 year ago

This issue was closed because it was stale for 14 days without any activity.