SeanLee97 / AnglE

Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard
https://arxiv.org/abs/2309.12871
MIT License

How can I fine-tune a saved adapter model? #41

Closed · linpeisensh closed this 4 months ago

linpeisensh commented 5 months ago

Here is my train_lora.py:

from datasets import load_dataset
from angle_emb import AnglE, AngleDataTokenizer

# 1. load dataset
# `text1`, `text2`, and `label` are three required columns.
def get_ds(path):
    ds = xxx
    return ds

# 2. transform data
rt = '../data/dataset/v02/'
data_files = {xxx}
ds = load_dataset(rt)
ds = ds.map(lambda obj: {"text1": str(obj["s1"]), "text2": str(obj['s2']), "label": obj['label']})
ds = ds.select_columns(["text1", "text2", "label"])

# 3. load pretrained model
# model_path = '../UAE-Large-V1' # base model for the first fine-tuning run
model_path = '../sts-b/2/ll10e1/best-checkpoint/' # base model for the second fine-tuning run
angle = AnglE.from_pretrained(model_path, max_length=50, pooling_strategy='cls', apply_lora=True, load_kbit=4, train_mode=True).cuda() 

# 4. tokenize data
train_ds = ds['train'].shuffle().map(AngleDataTokenizer(angle.tokenizer, angle.max_length), num_proc=8)
valid_ds = ds['validation'].map(AngleDataTokenizer(angle.tokenizer, angle.max_length), num_proc=8)

batch_size = 32
save_steps = len(train_ds) // batch_size
lrb = 10
epoch = 5
output_dir = f'../sts-b/7/ll{lrb}e{epoch}'

print('save_steps:', save_steps, output_dir)
# 5. fit
angle.fit(
    train_ds=train_ds,
    valid_ds=valid_ds,
    output_dir=output_dir,
    batch_size=batch_size,
    epochs=epoch,
    learning_rate=lrb * (10 ** -5),
    save_steps=save_steps,
    eval_steps=1000,
    warmup_steps=0,
    gradient_accumulation_steps=4,
    loss_kwargs={
        'w1': 1.0,
        'w2': 35,
        'w3': 1.0,
        'cosine_tau': 20,
        'ibn_tau': 20,
        'angle_tau': 1.0
    },
    fp16=True,
    logging_steps=100
)

When I run this code to fine-tune starting from the first fine-tuned model, this error occurs:

INFO:AnglE:lora_config={'task_type': <TaskType.FEATURE_EXTRACTION: 'FEATURE_EXTRACTION'>, 'r': 32, 'lora_alpha': 32, 'lora_dropout': 0.1}
INFO:AnglE:lora target modules=['base_layer', 'default']
INFO:peft.tuners.tuners_utils:Already found a peft_config attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
Traceback (most recent call last):
  File "/mnt/bd/mlx-bytedrive-1378-622c9164/llm/uae/train_lora.py", line 22, in <module>
    angle = AnglE.from_pretrained(model_path, max_length=50, pooling_strategy='cls', apply_lora=True, load_kbit=4, train_mode=True).cuda() #
  File "/mnt/bd/mlx-bytedrive-1378-622c9164/llm/venv/lib/python3.9/site-packages/angle_emb/angle.py", line 847, in from_pretrained
    angle = AnglE(model_name_or_path,
  File "/mnt/bd/mlx-bytedrive-1378-622c9164/llm/venv/lib/python3.9/site-packages/angle_emb/angle.py", line 772, in __init__
    model = get_peft_model(model, peft_config)
  File "/mnt/bd/mlx-bytedrive-1378-622c9164/llm/venv/lib/python3.9/site-packages/peft/mapping.py", line 133, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
  File "/mnt/bd/mlx-bytedrive-1378-622c9164/llm/venv/lib/python3.9/site-packages/peft/peft_model.py", line 1835, in __init__
    super().__init__(model, peft_config, adapter_name)
  File "/mnt/bd/mlx-bytedrive-1378-622c9164/llm/venv/lib/python3.9/site-packages/peft/peft_model.py", line 125, in __init__
    self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
  File "/mnt/bd/mlx-bytedrive-1378-622c9164/llm/venv/lib/python3.9/site-packages/peft/tuners/lora/model.py", line 111, in __init__
    super().__init__(model, config, adapter_name)
  File "/mnt/bd/mlx-bytedrive-1378-622c9164/llm/venv/lib/python3.9/site-packages/peft/tuners/tuners_utils.py", line 90, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "/mnt/bd/mlx-bytedrive-1378-622c9164/llm/venv/lib/python3.9/site-packages/peft/tuners/tuners_utils.py", line 247, in inject_adapter
    self._create_and_replace(peft_config, adapter_name, target, target_name, parent, optional_kwargs)
  File "/mnt/bd/mlx-bytedrive-1378-622c9164/llm/venv/lib/python3.9/site-packages/peft/tuners/lora/model.py", line 202, in _create_and_replace
    new_module = self._create_new_module(lora_config, adapter_name, target, kwargs)
  File "/mnt/bd/mlx-bytedrive-1378-622c9164/llm/venv/lib/python3.9/site-packages/peft/tuners/lora/model.py", line 355, in _create_new_module
    raise ValueError(
ValueError: Target module Dropout(p=0.1, inplace=False) is not supported. Currently, only the following modules are supported: torch.nn.Linear, torch.nn.Embedding, torch.nn.Conv2d, transformers.pytorch_utils.Conv1D.

If I instead load the model with

angle = AnglE.from_pretrained(model_path, max_length=50, pooling_strategy='cls', train_mode=True).cuda()

then this error occurs:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

What should I do to fine-tune my already fine-tuned adapter (PEFT) model a second time? Thanks!

SeanLee97 commented 5 months ago

@linpeisensh hi, to continue fine-tuning a model, you can specify pretrained_model_path or pretrained_lora_path:

1) For non-LoRA: AnglE.from_pretrained(backbone_name_or_path, pretrained_model_path='your_model', ...)
2) For LoRA: AnglE.from_pretrained(backbone_name_or_path, pretrained_lora_path='your_model', ...)

It is recommended to follow train_cli.py to set up your custom training.
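For reference, here is a minimal sketch of option 2) applied to the script above. The backbone and adapter paths are assumptions taken from the original post, and the remaining flags simply mirror the user's call; adjust them for your own setup.

from angle_emb import AnglE

backbone = '../UAE-Large-V1'                         # backbone used for the first fine-tuning run (assumed)
adapter_path = '../sts-b/2/ll10e1/best-checkpoint/'  # adapter saved by that run (assumed)

# Load the original backbone and attach the previously trained LoRA adapter,
# rather than calling from_pretrained on the adapter checkpoint itself.
angle = AnglE.from_pretrained(
    backbone,
    pretrained_lora_path=adapter_path,
    max_length=50,
    pooling_strategy='cls',
    apply_lora=True,
    load_kbit=4,
    train_mode=True,
).cuda()

# angle.fit(...) can then be called exactly as in the original script.

Loading the backbone first avoids re-wrapping an already wrapped PEFT model, which appears to be what triggers the Dropout target-module error above (the log already warns about multiple adapters).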