Hey, I'm trying to train an adapter for a Seq2Seq task with language adapters. Since most of the language adapters on the Hub are pretrained for BERT or RoBERTa, I cannot use e.g. BART for the task adapter. I set up an EncoderDecoder model with bert-base-multilingual-cased as the base, but even with very little training data the training loss of the adapter training stagnates at a high level (~4) and the model does not predict anything meaningful. When fully fine-tuning with the same training settings, the training loss quickly decreases to around 0. Setups I tried:
[x] Training a task adapter with bart-base - works
[x] Full fine-tuning an EncoderDecoder model based on bert-base-multilingual-cased using the Hugging Face Trainer - works
[ ] Training a task adapter with an EncoderDecoder model based on bert-base-multilingual-cased - the model repeatedly predicts the same word; the training loss stagnates at a high level.
When training an adapter using BART, a prediction head is added. With the EncoderDecoder model this seems to be missing: the saved adapter does not contain a head_config.json like the BART-trained adapter does.
What do I need to change to train this task adapter with an EncoderDecoder model?
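For reference, in the working BART case the prediction head that ends up in head_config.json is typically registered on the adapter model class. A minimal sketch of that flow, assuming AutoAdapterModel and a placeholder adapter/head name:

```python
from transformers import AutoAdapterModel, AutoTokenizer

# Working BART reference (sketch); "summarization" is a placeholder adapter/head name.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoAdapterModel.from_pretrained("facebook/bart-base")

model.add_adapter("summarization")
model.add_seq2seq_lm_head("summarization")   # prediction head -> head_config.json on save
model.train_adapter("summarization")         # freeze the base model, train only the adapter

# ... training ...
model.save_adapter("./bart-summarization", "summarization")
```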
Environment info
adapter-transformers version: 3.2.1

Details
Base model setup
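The base model is warm-started from two mBERT checkpoints, roughly as follows (a sketch; the tokenizer class and the special-token/config settings are assumptions):

```python
from transformers import BertTokenizerFast, EncoderDecoderModel

# Warm-start an encoder-decoder model from two mBERT checkpoints (sketch).
tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-cased", "bert-base-multilingual-cased"
)

# BERT defines no dedicated decoder-start/eos tokens, so [CLS]/[SEP]/[PAD] are reused
# (assumption; needed for generation with an EncoderDecoderModel).
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.eos_token_id = tokenizer.sep_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.vocab_size = model.config.encoder.vocab_size
```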
Adapter setup
I tried to add a task adapter using multiple methods:
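For illustration, one way is to add and activate the adapter on the combined EncoderDecoderModel from the base-model sketch above, so it is registered in both sub-models (a sketch; the adapter name and the Pfeiffer config are placeholders):

```python
from transformers.adapters import PfeifferConfig

# Variant 1 (sketch): add the task adapter on the combined EncoderDecoderModel;
# "seq2seq_task" is a placeholder name.
model.add_adapter("seq2seq_task", config=PfeifferConfig())
model.train_adapter("seq2seq_task")        # freezes the base model and activates the adapter
model.set_active_adapters("seq2seq_task")
```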
or
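adding the adapter to the encoder and decoder sub-models separately (again only a sketch, continuing from variant 1 and assuming add_adapter/train_adapter are available on the sub-models):

```python
# Variant 2 (sketch): register the adapter on encoder and decoder separately.
model.encoder.add_adapter("seq2seq_task", config=PfeifferConfig())
model.decoder.add_adapter("seq2seq_task", config=PfeifferConfig())
model.encoder.train_adapter("seq2seq_task")
model.decoder.train_adapter("seq2seq_task")
```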
Training setup
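A typical training setup for this looks roughly like the following (a sketch; the hyperparameters and dataset variables are placeholders, and Seq2SeqAdapterTrainer is assumed for the adapter runs, with the plain Seq2SeqTrainer for full fine-tuning):

```python
from transformers import Seq2SeqTrainingArguments, DataCollatorForSeq2Seq
from transformers.adapters import Seq2SeqAdapterTrainer

# Sketch: hyperparameters and train_dataset are placeholders.
training_args = Seq2SeqTrainingArguments(
    output_dir="./output",
    learning_rate=1e-4,
    num_train_epochs=10,
    per_device_train_batch_size=8,
    predict_with_generate=True,
)

trainer = Seq2SeqAdapterTrainer(   # Seq2SeqTrainer for the full fine-tuning runs
    model=model,
    args=training_args,
    train_dataset=train_dataset,   # placeholder: tokenized seq2seq dataset
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()

# Save the trained adapter; this is where head_config.json would be expected.
model.save_adapter("./encoder-decoder-task-adapter", "seq2seq_task")
```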
EncoderDecoder adapter_config.json:
Bart adapter_config.json: