adapter-hub / adapters

A Unified Library for Parameter-Efficient and Modular Transfer Learning
https://docs.adapterhub.ml
Apache License 2.0

No improvement in training loss using ReFT Methods #739

Open julian-fong opened 2 weeks ago

julian-fong commented 2 weeks ago

Environment info

Information

Model I am using (Bert, XLNet ...): roberta (not sure whether this also applies to other models)

Language I am using the model on (English, Chinese ...): English

Adapter setup I am using (if any):

The problem arises when using:

The task I am working on is:

Question answering on the BoolQ dataset: binary true/false classification given a question/passage pair.

To reproduce

The training loss barely decreases after training for 5 epochs:

from datasets import load_dataset, DatasetDict

boolq = DatasetDict()

boolq["train"] = load_dataset("google/boolq", split = "train")
boolq["val"] = load_dataset("google/boolq", split="validation")

model_name_or_path = "roberta-base"

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

def preprocess_function(examples):
    return tokenizer(examples['passage'], examples['question'], truncation=True, padding='max_length')

tokenized_datasets = boolq.map(preprocess_function, batched=True)

tokenized_datasets = tokenized_datasets.remove_columns(["question","passage"])
tokenized_datasets = tokenized_datasets.rename_column("answer","label")

from transformers import default_data_collator

data_collator = default_data_collator

from adapters import AutoAdapterModel, LoReftConfig
model = AutoAdapterModel.from_pretrained(model_name_or_path)

config = LoReftConfig()
model.add_adapter("loreft_adapter", config=config)
model.add_classification_head("loreft_adapter", num_labels=2)
model.train_adapter("loreft_adapter")

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir='./results',
    eval_strategy='epoch',
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['val'],
    tokenizer=tokenizer,
)

trainer.train()

Training logs: (screenshot; the training loss stays roughly flat across the 5 epochs)

I tried the same code with LoRA, and the training loss did decrease after 5 epochs:

from datasets import load_dataset, DatasetDict

boolq = DatasetDict()

boolq["train"] = load_dataset("google/boolq", split = "train")
boolq["val"] = load_dataset("google/boolq", split="validation")

model_name_or_path = "roberta-base"

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

def preprocess_function(examples):
    return tokenizer(examples['passage'], examples['question'], truncation=True, padding='max_length')

tokenized_datasets = boolq.map(preprocess_function, batched=True)

tokenized_datasets = tokenized_datasets.remove_columns(["question","passage"])
tokenized_datasets = tokenized_datasets.rename_column("answer","label")

from transformers import default_data_collator

data_collator = default_data_collator

from adapters import AutoAdapterModel, LoRAConfig
model = AutoAdapterModel.from_pretrained(model_name_or_path)

config = LoRAConfig(
    selfattn_lora=True, intermediate_lora=True, output_lora=True,
    attn_matrices=["q", "k", "v"],
    alpha=16, r=64, dropout=0.1
)

model.add_adapter("assistant_adapter", config=config)
model.add_classification_head("assistant_adapter", num_labels=2)
model.train_adapter("assistant_adapter")

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir='./results',
    eval_strategy='epoch',
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['val'],
    tokenizer=tokenizer,
)

trainer.train()

Training logs: (screenshot; the training loss decreases over the 5 epochs)

I am just wondering whether I am doing something incorrect in my script. Any feedback would be appreciated.

Thanks!

calpt commented 1 week ago

Hey, from a short investigation of this, I believe these observations might be due to the capacity/configuration of the adapters rather than an issue in the implementation:

Looking at the parameter count in adapter_summary(), the LoRA adapter has many more parameters (and hence capacity) than the ReFT config, so the capacity of the ReFT config might be too limited to adequately learn the task. To get better performance, it might help to increase the ReFT capacity, e.g. via r (the rank) or prefix_positions/suffix_positions (e.g. LoReftConfig(r=32, prefix_positions=10)), as sketched below. Alternatively, using a larger base model (e.g. roberta-large) might help.
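
For illustration, a rough sketch of what a higher-capacity ReFT setup could look like, reusing the roberta-base setup from the script above (r=32 and prefix_positions=10 are just the example values mentioned here, not tuned hyperparameters):

from adapters import AutoAdapterModel, LoReftConfig

model = AutoAdapterModel.from_pretrained("roberta-base")

# Larger rank and more prefix positions than the default LoReftConfig()
config = LoReftConfig(r=32, prefix_positions=10)
model.add_adapter("loreft_adapter", config=config)
model.add_classification_head("loreft_adapter", num_labels=2)
model.train_adapter("loreft_adapter")

# Compare the trainable parameter counts against the LoRA setup
print(model.adapter_summary())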

As an additional check, you might try switching the task: on tasks from the GLUE benchmark, our ReFT implementation did get solid results; see the table here: https://github.com/adapter-hub/adapters/pull/705. You might check whether you can reproduce those results in your setup (data from here).
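
For example, switching the script over to a GLUE task could look roughly like this (RTE is picked here only because it is also a sentence-pair classification task; this is a sketch, not the exact setup used for the table in the PR):

from datasets import load_dataset, DatasetDict
from transformers import AutoTokenizer

rte = DatasetDict()
rte["train"] = load_dataset("glue", "rte", split="train")
rte["val"] = load_dataset("glue", "rte", split="validation")

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def preprocess_function(examples):
    # RTE is a sentence-pair task, so the tokenizer call mirrors the BoolQ script
    return tokenizer(examples["sentence1"], examples["sentence2"], truncation=True, padding="max_length")

tokenized_datasets = rte.map(preprocess_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(["sentence1", "sentence2", "idx"])
# GLUE datasets already ship an integer "label" column, so no renaming is needed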

(Side notes: ideally, always use AdapterTrainer (from adapters import AdapterTrainer) for training, as sketched below. Also, increasing the learning rate to e.g. 1e-4 is usually beneficial.)
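
Applied to the script above, these side notes would look roughly like this (model, data_collator, tokenized_datasets and tokenizer as defined there):

from adapters import AdapterTrainer
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    learning_rate=1e-4,  # higher learning rate, as suggested above
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
)

# AdapterTrainer takes care of adapter-specific details (e.g. only saving/restoring
# the adapter weights during checkpointing), so it is preferred over the plain Trainer
trainer = AdapterTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["val"],
    tokenizer=tokenizer,
)

trainer.train()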

julian-fong commented 1 week ago

Thank you for the informative response!