Hey, from a short investigation of this, I believe these observations might be due to the capacity/configuration of the adapters rather than an issue in the implementation: looking at the parameter counts in `adapter_summary()`, the LoRA adapter has many more parameters/much more capacity than the ReFT config, so the ReFT config might be too limited to adequately learn the task. To get better performance, it might help to increase the ReFT capacity, e.g. via `r` (the rank of the intervention) or `prefix_positions`/`suffix_positions` (e.g. `LoReftConfig(r=32, prefix_positions=10)`). Alternatively, using a larger base model (e.g. roberta-large) might help.
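For illustration, a higher-capacity ReFT setup could look roughly like this (a sketch only; the base model, adapter name, and label count are placeholders for your BoolQ setup):

```python
import adapters
from adapters import LoReftConfig
from transformers import AutoModelForSequenceClassification

# Placeholder checkpoint; swap in whatever base model you are actually using.
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
adapters.init(model)  # enable adapter support on the plain Transformers model

# Higher rank and more prefix positions give the intervention more capacity
# than the default LoReft settings.
reft_config = LoReftConfig(r=32, prefix_positions=10)
model.add_adapter("boolq_reft", config=reft_config)
model.train_adapter("boolq_reft")  # freeze the base model, train only the ReFT parameters

# Compare the trainable parameter count against your LoRA run.
print(model.adapter_summary())
```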
As an additional check, you might try switching the task: on tasks from the GLUE benchmark, our ReFT implementation did get solid results, see the table here: https://github.com/adapter-hub/adapters/pull/705. You might check if you can reproduce those in your setup (data from here).
(Side notes: ideally, always use `AdapterTrainer` (`from adapters import AdapterTrainer`) for training. Also, increasing the learning rate to e.g. 1e-4 is usually beneficial.)
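Put together, a minimal training loop with `AdapterTrainer` and a larger learning rate might look like this (again just a sketch; `train_dataset`/`eval_dataset` are assumed to be your already tokenized BoolQ splits, and the output path is a placeholder):

```python
from adapters import AdapterTrainer
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./reft-boolq",          # placeholder output directory
    learning_rate=1e-4,                 # higher than the Trainer default of 5e-5
    num_train_epochs=5,
    per_device_train_batch_size=16,
    eval_strategy="epoch",
    logging_steps=50,
)

trainer = AdapterTrainer(
    model=model,                        # the model with the active ReFT adapter from above
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```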
Thank you for the informative response!
Environment info
adapters version: latest (the output below is from Colab)
transformers version: 4.43.4
Information
Model I am using (Bert, XLNet ...): roberta (not sure if applicable for any model)
Language I am using the model on (English, Chinese ...): English
Adapter setup I am using (if any):
The problem arises when using:
The task I am working on is: Question Answering on the boolq dataset, i.e. binary classification (true/false) given a question/passage pair.
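My preprocessing follows the standard BoolQ setup, roughly along these lines (simplified sketch, not my exact script; the tokenizer checkpoint and sequence length are illustrative):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
dataset = load_dataset("boolq")  # columns: question, passage, answer (bool)

def preprocess(batch):
    # Encode question/passage pairs and turn the boolean answer into a 0/1 label.
    encoded = tokenizer(
        batch["question"],
        batch["passage"],
        truncation=True,
        max_length=256,
        padding="max_length",
    )
    encoded["labels"] = [int(a) for a in batch["answer"]]
    return encoded

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset["train"].column_names)
train_dataset = tokenized["train"]
eval_dataset = tokenized["validation"]
```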
To reproduce
The training loss does not decrease by much after training for 5 epochs.
Training logs:
I tried using the same code to train with LoRA, and the training loss did decrease after 5 epochs.
Training logs:
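For reference, the LoRA run only swapped the adapter config, something like the following (default `LoRAConfig` shown for illustration):

```python
from adapters import LoRAConfig

# Same training script as above; only the adapter config and name change.
model.add_adapter("boolq_lora", config=LoRAConfig())
model.train_adapter("boolq_lora")
```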
I am just wondering if I am doing something incorrect in my script? Any feedback would be appreciated.
Thanks!