tanveer-sayyed opened this issue 5 months ago
The `load_pretrained_model` method depends heavily on the name of your model's `output_dir`. What did you name it? It appears that `load_pretrained_model` recognized your model as TinyLLaVA-3.1B instead of TinyLLaVA-1.5B.
Also, set `conv_mode` to `v1` when training TinyLLaVA-1.5B.
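Both pieces of advice above can be checked before launching a run. This is a minimal sketch, assuming (as described above) that the loader infers the model variant from substrings of the checkpoint directory name, so that `phi` in the name selects the 3.1B path. The function name `preflight_tinyllava_1_5b` is hypothetical and not part of TinyLLaVABench.

```bash
#!/usr/bin/env bash
# Hypothetical pre-launch check for fine-tuning TinyLLaVA-1.5B (TinyLlama backbone).
# Assumption: the loader picks the model variant from substrings of the
# checkpoint directory name, so "phi" there may select the 3.1B path.
preflight_tinyllava_1_5b() {
  local output_dir="$1" version="$2" name status=0
  # Lower-case the directory name so "Phi" or "PHI" is caught as well.
  name=$(basename "$output_dir" | tr '[:upper:]' '[:lower:]')
  if [[ "$name" == *phi* ]]; then
    echo "error: '$name' contains 'phi'; it may be loaded as TinyLLaVA-3.1B" >&2
    status=1
  fi
  # TinyLLaVA-1.5B should be trained with the v1 conversation template.
  if [[ "$version" != "v1" ]]; then
    echo "error: --version is '$version'; use 'v1' for TinyLLaVA-1.5B" >&2
    status=1
  fi
  return "$status"
}
```

Usage: `preflight_tinyllava_1_5b "$OUTPUT_DIR" v1 && deepspeed tinyllava/train/train.py ...`. Note that a plain substring check also flags unrelated words that happen to contain `phi`, which is the same limitation a name-based loader would have.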
```bash
OUTPUT_DIR=/home/xxx/TinyLLaVABench/checkpoints/tiny-llava-base-TinyLLaVA-1.5B-finetune--phi-lora-al-new
```
Also, I assumed `conv_mode` was only used during inference. Okay, I will re-train with it set to `v1` and post the results here.
Lastly, just for info, my packages:
tokenizers==0.15.1
torch==2.0.1
transformers==4.37.2
After adding `conv_mode`:
I guess 3.1B is getting loaded because of `phi` in the name, as per this line.
The 1.5B model uses TinyLLaMA as its backbone. Why did you include `phi` in your model name?
Yes, my bad. Honestly, that was an oversight on my part.
So I re-trained using this script:
```bash
OUTPUT_DIR=/home/xxx/TinyLLaVABench/checkpoints/tiny-llava-base-TinyLLaVA-1.5B-v1-finetune-lora-al-0419

deepspeed tinyllava/train/train.py \
    --deepspeed ./scripts/tiny_llava/zero3.json \
    --lora_enable True --lora_r 32 --lora_alpha 64 \
    --model_name_or_path bczhou/TinyLLaVA-1.5B \
    --version v1 \
    --data_path $DATA_PATH \
    --image_folder $IMAGE_PATH \
    --vision_tower bczhou/TinyLLaVA-1.5B-SigLIP \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length False \
    --fp16 True \
    --output_dir $OUTPUT_DIR \
    --num_train_epochs 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length 3072 \
    --gradient_checkpointing True \
    --dataloader_num_workers 15 \
    --lazy_preprocess True \
    --report_to wandb
```
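The script relies on `DATA_PATH` and `IMAGE_PATH`, which are not defined in the snippet above. A minimal sketch of a guard that stops before DeepSpeed starts if either variable is empty or points at a missing path. `require_path` is a hypothetical helper, and it assumes `DATA_PATH` names a file and `IMAGE_PATH` a directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail fast if a required variable is unset or empty,
# or if the path it names is not an existing file/directory.
require_path() {
  local var_name="$1" kind="$2"
  # Indirect expansion: the value of the variable whose name is in $var_name.
  local value="${!var_name}"
  if [[ -z "$value" ]]; then
    echo "error: $var_name is not set" >&2
    return 1
  fi
  if [[ "$kind" == "file" && ! -f "$value" ]] || [[ "$kind" == "dir" && ! -d "$value" ]]; then
    echo "error: $var_name=$value is not an existing $kind" >&2
    return 1
  fi
}
```

Usage, placed before the `deepspeed` call: `require_path DATA_PATH file && require_path IMAGE_PATH dir || exit 1`.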
And then merged using:
```bash
python scripts/merge_lora_weights.py \
    --model-path /home/xxx/TinyLLaVABench/checkpoints/tiny-llava-base-TinyLLaVA-1.5B-v1-finetune-lora-al-0419 \
    --model-base bczhou/TinyLLaVA-1.5B \
    --save-model-path /home/xxx/TinyLLaVABench/checkpoints/tiny-llava-base-TinyLLaVA-1.5B-v1-finetune-lora-al-0419-merged
```
But while running the evaluation script (`run_tiny_llava.py`), I encountered a series of errors...
...all of which were resolved by copying files from the fine-tuned model into the merged model. Is this approach incorrect?
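To see exactly which files had to be copied, one option is to compare the top-level contents of the two directories. A minimal sketch using `comm`; `missing_in_merged` is a hypothetical helper, and it only lists file names. It does not tell you which of those files the evaluation script actually needs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print top-level entries that exist in the fine-tuned
# (LoRA) checkpoint directory but not in the merged model directory.
missing_in_merged() {
  local lora_dir="$1" merged_dir="$2"
  # comm needs both inputs sorted with the same collation, hence LC_ALL=C;
  # -23 keeps only lines unique to the first listing.
  LC_ALL=C comm -23 \
    <(ls -A "$lora_dir" | LC_ALL=C sort) \
    <(ls -A "$merged_dir" | LC_ALL=C sort)
}
```

Usage: `missing_in_merged <fine-tuned checkpoint dir> <merged dir>`, with the two paths from the merge command above.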
Following the instructions, we were able to merge the base model and the fine-tuned model. But when running eval, we get this error:
But we do not encounter this error when we run the unmerged model directly. Why? Is this the right approach?
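One hedged guess at why the merged and unmerged checkpoints behave differently: upstream LLaVA's `load_pretrained_model`, on which TinyLLaVABench's loader is assumed to be based, also branches on whether the model name contains `lora` and whether a model base is passed. If TinyLLaVABench keeps that check, a merged directory whose name still contains `lora` could be routed down the LoRA loading path. The helper `warn_lora_name` below is hypothetical and not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical check. Assumption: like upstream LLaVA's load_pretrained_model,
# the loader treats a model whose name contains "lora" as a LoRA checkpoint
# when a model base is also supplied.
warn_lora_name() {
  local model_path="$1" model_base="$2" name
  name=$(basename "$model_path" | tr '[:upper:]' '[:lower:]')
  if [[ "$name" == *lora* && -n "$model_base" ]]; then
    echo "warning: '$name' contains 'lora' and a model base is set;" \
         "a merged model may be loaded as a LoRA adapter" >&2
    return 1
  fi
  return 0
}
```

If the assumption holds, either omitting the model base for the merged model or saving it under a name without `lora` would avoid that branch.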
Training script:

```bash
deepspeed tinyllava/train/train.py \
    --deepspeed ./scripts/tiny_llava/zero3.json \
    --lora_enable True --lora_r 32 --lora_alpha 64 \
    --model_name_or_path bczhou/TinyLLaVA-1.5B \
    --version phi \
    --data_path $DATA_PATH \
    --image_folder $IMAGE_PATH \
    --vision_tower bczhou/TinyLLaVA-1.5B-SigLIP \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length False \
    --fp16 True \
    --output_dir $OUTPUT_DIR \
    --num_train_epochs 3 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 2 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --model_max_length 3072 \
    --gradient_checkpointing True \
    --dataloader_num_workers 15 \
    --lazy_preprocess True \
    --report_to wandb
```
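When comparing this run with the re-trained one, note that both use the same batch settings, so the effective (global) batch size is per-device batch size × gradient accumulation steps × number of GPUs. A small sketch of that arithmetic; `NUM_GPUS` is an assumption, since the GPU count is not stated in the thread.

```bash
#!/usr/bin/env bash
# Effective batch size = per-device batch x gradient accumulation steps x GPUs.
effective_batch_size() {
  local per_device="$1" grad_accum="$2" num_gpus="$3"
  echo $(( per_device * grad_accum * num_gpus ))
}

# Values from the script above; NUM_GPUS is assumed (defaults to 1 here).
NUM_GPUS=${NUM_GPUS:-1}
effective_batch_size 8 2 "$NUM_GPUS"
```

With `NUM_GPUS=1` this prints 16; on 4 GPUs the same flags give 64 samples per optimizer step.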