I'm not one of the authors, but I ran into the same issue. From what I understand, yes, you do have to correct the mapping of names from the Hugging Face checkpoint to the OLMo names.
To do it correctly, you can follow the conversion script that maps OLMo weights to Hugging Face weights - https://github.com/allenai/OLMo/blob/26392798cbc4d9ac3898bd2949e77042220bf3f8/scripts/convert_olmo_to_hf_new.py#L105
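As a rough illustration, inverting that script for one transformer block might look like the sketch below. This is a minimal sketch, not the conversion script itself: it assumes the OLMo checkpoint fuses q/k/v into `att_proj` and the two MLP input projections into `ff_proj`, and the concatenation order must match whatever chunk order the linked script uses, so verify shapes against your checkpoint before relying on it.

import torch

def hf_block_to_olmo(hf_state: dict, layer: int) -> dict:
    """Map one HF-format OLMo block back to (assumed) native OLMo names."""
    hf = f"model.layers.{layer}."
    olmo = f"transformer.blocks.{layer}."
    out = {}
    # Fused attention input projection: q, k, v concatenated along the output dim
    # (order assumed; check how the conversion script chunks `att_proj`).
    out[olmo + "att_proj.weight"] = torch.cat(
        [hf_state[hf + f"self_attn.{p}.weight"] for p in ("q_proj", "k_proj", "v_proj")],
        dim=0,
    )
    out[olmo + "attn_out.weight"] = hf_state[hf + "self_attn.o_proj.weight"]
    # Fused feed-forward input projection: gate/up concatenated (order assumed).
    out[olmo + "ff_proj.weight"] = torch.cat(
        [hf_state[hf + f"mlp.{p}.weight"] for p in ("gate_proj", "up_proj")],
        dim=0,
    )
    out[olmo + "ff_out.weight"] = hf_state[hf + "mlp.down_proj.weight"]
    return out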
This looks right, @itay1itzhak (I haven't done it myself). It will be better supported once we merge #151.
Hi, with the new HF OLMo integration and the merging of #151, this should now work much better out of the box! I was able to successfully run training with the following script:
export CUDA_VISIBLE_DEVICES=0
MODEL_SIZE=7B
NUM_GPUS=1
BATCH_SIZE_PER_GPU=1
TOTAL_BATCH_SIZE=128
GRADIENT_ACC_STEPS=$(($TOTAL_BATCH_SIZE/$NUM_GPUS/$BATCH_SIZE_PER_GPU))
# You can also set --gradient_checkpointing or use `stage3_offloading_accelerate.conf` to save memory,
# but it will trade off speed.
accelerate launch \
--mixed_precision bf16 \
--num_machines 1 \
--num_processes $NUM_GPUS \
--use_deepspeed \
--deepspeed_config_file ds_configs/stage3_no_offloading_accelerate.conf \
open_instruct/finetune.py \
--model_name_or_path allenai/OLMo-1.7-7B-hf \
--use_flash_attn \
--tokenizer_name allenai/OLMo-1.7-7B-hf \
--use_lora \
--add_bos \
--dataset_name allenai/tulu-v2-sft-mixture \
--max_seq_length 2048 \
--preprocessing_num_workers 128 \
--per_device_train_batch_size $BATCH_SIZE_PER_GPU \
--gradient_accumulation_steps $GRADIENT_ACC_STEPS \
--learning_rate 2e-5 \
--lr_scheduler_type linear \
--warmup_ratio 0.03 \
--weight_decay 0. \
--num_train_epochs 2 \
--output_dir tmp \
--with_tracking \
--report_to tensorboard \
--logging_steps 1
I tried to LoRA-finetune olmo-7b-instruct using finetune_lora_with_accelerate.sh, and it reports the information below.
It seems that the OLMo model doesn't have the default projection layers {'q_proj', 'k_proj', 'v_proj', 'gate_proj', 'down_proj', 'up_proj', 'o_proj'}. Should I replace them with {"att_proj", "ff_proj"}, or have I made an error somewhere?
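For reference, a minimal sketch of pointing PEFT at OLMo-style module names instead of the Llama-style defaults might look like this. The model id, the trust_remote_code requirement, and the target module names (beyond the att_proj/ff_proj mentioned above) are assumptions; confirm the actual names with model.named_modules() before training.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Assumption: the non-HF-native OLMo checkpoint needs trust_remote_code (hf_olmo installed).
model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMo-7B-Instruct", trust_remote_code=True
)

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    task_type="CAUSAL_LM",
    # OLMo fuses q/k/v into att_proj and the MLP input projections into ff_proj,
    # so the Llama-style defaults (q_proj, k_proj, ...) do not exist on this model.
    # attn_out / ff_out are assumed names for the output projections.
    target_modules=["att_proj", "attn_out", "ff_proj", "ff_out"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()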