Closed macabdul9 closed 1 year ago
Hey @macabdul9, while the number of trainable parameters is much lower when training adapters compared to fine-tuning the full model, the training samples still have to be passed through the full model on each training run. Training is faster, however, because we don't need to compute gradients for all parameters during the backward pass, which yields the speedup you observed. You can find a lot more analysis of training/inference time and efficiency in this paper: https://aclanthology.org/2021.emnlp-main.626.pdf
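To illustrate the mechanism, here is a minimal PyTorch sketch (not the actual adapter-transformers internals): a bottleneck adapter sits next to a frozen base layer, so activations still flow through everything in the forward pass, but only the adapter's few parameters require gradients. The `BottleneckAdapter` class and the sizes below are illustrative assumptions, not the library's implementation.

```python
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Down-project -> nonlinearity -> up-project, with a residual connection."""

    def __init__(self, hidden_size: int, bottleneck_size: int):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.ReLU()

    def forward(self, x):
        # The residual ensures the adapter starts close to an identity map.
        return x + self.up(self.act(self.down(x)))


hidden, bottleneck = 768, 48
base_layer = nn.Linear(hidden, hidden)  # stands in for a frozen BERT sublayer
adapter = BottleneckAdapter(hidden, bottleneck)

# Freeze the base model: its weights get no gradients in the backward pass,
# but inputs still pass through it in the forward pass.
for p in base_layer.parameters():
    p.requires_grad = False


def count_trainable(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


print(count_trainable(base_layer))  # 0
print(count_trainable(adapter))     # 74544, a small fraction of the base model
```

This is why training is faster per epoch but not proportionally to the parameter reduction: the forward pass is unchanged, and only the backward pass and optimizer step shrink.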
This issue has been automatically marked as stale because it has been without activity for 90 days. This issue will be closed in 14 days unless you comment or remove the stale label.
This issue was closed because it was stale for 14 days without any activity.
Environment info
adapter-transformers version: 3.0.1
Details: Fine-tuning a BERT model on the IMDB dataset takes ~20 min/epoch, while BERT fine-tuning with an adapter takes ~12 min/epoch. The first case has 109M trainable parameters, while BERT+adapter has fewer than 2M trainable parameters.