TinyLLaVA / TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models
Apache License 2.0

Able to merge 1.5B model, but unable to run eval #50

Open tanveer-sayyed opened 5 months ago

tanveer-sayyed commented 5 months ago

As per the instructions, we were able to merge the base model and finetuned model. But on running eval we get this error:


But we do not encounter the above error when we directly run the unmerged model. Why? Is this the right way?

Training script:

deepspeed tinyllava/train/train.py \
    --deepspeed ./scripts/tiny_llava/zero3.json \
    --lora_enable True --lora_r 32 --lora_alpha 64 \
    --model_name_or_path bczhou/TinyLLaVA-1.5B \
    --version phi \
    --data_path $DATA_PATH \
    --image_folder $IMAGE_PATH \
    --vision_tower bczhou/TinyLLaVA-1.5B-SigLIP \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length False \
    --fp16 True \
    --output_dir $OUTPUT_DIR \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length 3072 \
    --gradient_checkpointing True \
    --dataloader_num_workers 15 \
    --lazy_preprocess True \
    --report_to wandb

baichuanzhou commented 5 months ago

The load_pretrained_model method depends heavily on the name of your model's output_dir. What did you name it? It appears that load_pretrained_model recognized your model as TinyLLaVA-3.1B instead of TinyLLaVA-1.5B.

Also, set conv_mode to v1 when training TinyLLaVA-1.5B.
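To illustrate why the directory name matters, here is a minimal sketch of name-based backbone detection. It is not the actual load_pretrained_model code: the function guess_backbone, its matching rules, and the example paths are all hypothetical, and they only assume that the loader looks for substrings such as "phi" in the checkpoint name.

```python
import os


def guess_backbone(model_path: str) -> str:
    """Hypothetical stand-in for name-based dispatch (NOT the real loader).

    Assumption: the backbone is inferred from substrings of the last
    path component, and "phi" is checked before the model size.
    """
    name = os.path.basename(os.path.normpath(model_path)).lower()
    if "phi" in name or "3.1b" in name:
        return "phi"        # would be treated as TinyLLaVA-3.1B
    if "1.5b" in name:
        return "tinyllama"  # would be treated as TinyLLaVA-1.5B
    return "unknown"


# A stray "phi" in the directory name wins over the "1.5B" tag.
print(guess_backbone("checkpoints/tiny-llava-1.5B-phi-lora"))
print(guess_backbone("checkpoints/tiny-llava-base-TinyLLaVA-1.5B-v1-finetune-lora"))
```

Under these assumed rules, a 1.5B checkpoint whose directory name contains "phi" would be loaded with the wrong backbone, which matches the symptom described in this issue.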

tanveer-sayyed commented 5 months ago


Also, I assumed conv_mode should be used only during inference. Okay, will re-train by setting it to v1 and post the results here.

Lastly, just for info, my packages:

tanveer-sayyed commented 5 months ago

After adding conv_mode


tanveer-sayyed commented 5 months ago

I guess it's due to phi that 3.1B is getting loaded, as per this line.

baichuanzhou commented 5 months ago

The 1.5B model uses TinyLlama as its backbone. Why did you include phi in your model name?

tanveer-sayyed commented 5 months ago

Yes, my bad. Honestly, it was ignorance on my part.

So I re-trained using this script:

deepspeed tinyllava/train/train.py \
    --deepspeed ./scripts/tiny_llava/zero3.json \
    --lora_enable True --lora_r 32 --lora_alpha 64 \
    --model_name_or_path bczhou/TinyLLaVA-1.5B \
    --version v1 \
    --data_path $DATA_PATH \
    --image_folder $IMAGE_PATH \
    --vision_tower bczhou/TinyLLaVA-1.5B-SigLIP \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length False \
    --fp16 True \
    --output_dir $OUTPUT_DIR \
    --num_train_epochs 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length 3072 \
    --gradient_checkpointing True \
    --dataloader_num_workers 15 \
    --lazy_preprocess True \
    --report_to wandb

And then merged using:

python scripts/merge_lora_weights.py \
--model-path /home/xxx/TinyLLaVABench/checkpoints/tiny-llava-base-TinyLLaVA-1.5B-v1-finetune-lora-al-0419 \
--model-base bczhou/TinyLLaVA-1.5B \
--save-model-path /home/xxx/TinyLLaVABench/checkpoints/tiny-llava-base-TinyLLaVA-1.5B-v1-finetune-lora-al-0419-merged

But while running the eval (run_tiny_llava.py), I encountered a series of errors...

... all of which were resolved by copy-pasting files from the finetuned model to the merged model. Is this approach incorrect?
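For reference, the workaround described above can be sketched as a small helper that copies only the files the merged directory lacks, so nothing written by the merge script is overwritten. The function name copy_missing_files and the directory arguments are hypothetical; which files are actually missing after a merge is not stated in this thread, and whether this is the recommended fix remains an open question here.

```python
import shutil
from pathlib import Path


def copy_missing_files(finetuned_dir, merged_dir):
    """Copy top-level files present in finetuned_dir but absent from merged_dir.

    Existing files in merged_dir (e.g. the merged weights) are never
    overwritten. Returns the names of the files that were copied.
    """
    src = Path(finetuned_dir)
    dst = Path(merged_dir)
    copied = []
    for item in sorted(src.iterdir()):
        target = dst / item.name
        # Skip subdirectories and anything the merge step already produced.
        if item.is_file() and not target.exists():
            shutil.copy2(item, target)
            copied.append(item.name)
    return copied
```

Printing the returned list shows exactly which files the merge step did not produce, which may help pinpoint what the eval script expects to find.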