I'm not one of the authors, but I ran into the same issue. From what I understand, yes, you do have to correct the mapping of names from the Hugging Face checkpoint to the OLMo names.
To do it correctly, you can follow the conversion script that maps OLMo weights to Hugging Face weights - https://github.com/allenai/OLMo/blob/26392798cbc4d9ac3898bd2949e77042220bf3f8/scripts/convert_olmo_to_hf_new.py#L105
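As a rough illustration, inverting that script for one transformer block might look like the sketch below. This is a minimal sketch, not the conversion script itself: it assumes the OLMo checkpoint fuses q/k/v into `att_proj` and the two MLP input projections into `ff_proj`, and the concatenation order must match whatever chunk order the linked script uses, so verify shapes against your checkpoint before relying on it.

import torch

def hf_block_to_olmo(hf_state: dict, layer: int) -> dict:
    """Map one HF-format OLMo block back to (assumed) native OLMo names."""
    hf = f"model.layers.{layer}."
    olmo = f"transformer.blocks.{layer}."
    out = {}
    # Fused attention input projection: q, k, v concatenated along the output dim
    # (order assumed; check how the conversion script chunks `att_proj`).
    out[olmo + "att_proj.weight"] = torch.cat(
        [hf_state[hf + f"self_attn.{p}.weight"] for p in ("q_proj", "k_proj", "v_proj")],
        dim=0,
    )
    out[olmo + "attn_out.weight"] = hf_state[hf + "self_attn.o_proj.weight"]
    # Fused feed-forward input projection: gate/up concatenated (order assumed).
    out[olmo + "ff_proj.weight"] = torch.cat(
        [hf_state[hf + f"mlp.{p}.weight"] for p in ("gate_proj", "up_proj")],
        dim=0,
    )
    out[olmo + "ff_out.weight"] = hf_state[hf + "mlp.down_proj.weight"]
    return out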
This looks right, @itay1itzhak (I haven't done it myself). It will be better supported once we merge #151.
Hi, with the new HF OLMo integration and the merging of #151, this should now work much better out of the box! I was able to successfully run training with the following script:
export CUDA_VISIBLE_DEVICES=0
MODEL_SIZE=7B
NUM_GPUS=1
BATCH_SIZE_PER_GPU=1
TOTAL_BATCH_SIZE=128
GRADIENT_ACC_STEPS=$(($TOTAL_BATCH_SIZE/$NUM_GPUS/$BATCH_SIZE_PER_GPU))
# You can also set --gradient_checkpointing or use `stage3_offloading_accelerate.conf` to save memory,
# but it will trade off speed.
accelerate launch \
--mixed_precision bf16 \
--num_machines 1 \
--num_processes $NUM_GPUS \
--use_deepspeed \
--deepspeed_config_file ds_configs/stage3_no_offloading_accelerate.conf \
open_instruct/finetune.py \
--model_name_or_path allenai/OLMo-1.7-7B-hf \
--use_flash_attn \
--tokenizer_name allenai/OLMo-1.7-7B-hf \
--use_lora \
--add_bos \
--dataset_name allenai/tulu-v2-sft-mixture \
--max_seq_length 2048 \
--preprocessing_num_workers 128 \
--per_device_train_batch_size $BATCH_SIZE_PER_GPU \
--gradient_accumulation_steps $GRADIENT_ACC_STEPS \
--learning_rate 2e-5 \
--lr_scheduler_type linear \
--warmup_ratio 0.03 \
--weight_decay 0. \
--num_train_epochs 2 \
--output_dir tmp \
--with_tracking \
--report_to tensorboard \
--logging_steps 1
I tried to LoRA-finetune olmo-7b-instruct using finetune_lora_with_accelerate.sh, and it reports the information below.
It seems that the OLMo model doesn't have the default projection layers {'q_proj', 'k_proj', 'v_proj', 'gate_proj', 'down_proj', 'up_proj', 'o_proj'}. Should I replace them with {"att_proj", "ff_proj"}, or have I made an error somewhere?
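For reference, a minimal sketch of pointing PEFT at OLMo-style module names instead of the Llama-style defaults might look like this. The model id, the trust_remote_code requirement, and the target module names (beyond the att_proj/ff_proj mentioned above) are assumptions; confirm the actual names with model.named_modules() before training.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Assumption: the non-HF-native OLMo checkpoint needs trust_remote_code (hf_olmo installed).
model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B-Instruct", trust_remote_code=True
)

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    # OLMo fuses q/k/v into att_proj and the MLP input projections into ff_proj,
    # so the Llama-style defaults (q_proj, k_proj, ...) do not exist on this model.
    # attn_out / ff_out are assumed names for the output projections.
    target_modules=["att_proj", "attn_out", "ff_proj", "ff_out"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()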