Victorwz / LLaVA-Llama-3

Reproduction of LLaVA-v1.5 based on the Llama-3-8B LLM backbone.
https://huggingface.co/weizhiwang/LLaVA-Llama-3-8B
Apache License 2.0

Merge issue? My config.json has "architectures" set to "LlamaForCausalLM" instead of "LlavaLlamaForCausalLM" in the final merged and adapter models #6

Open SrikanthChellappa opened 3 weeks ago

SrikanthChellappa commented 3 weeks ago

The pre-trained model "weizhiwang/llava-v1.5-llama-3-8b-pretrain-clip-large-336px" has "LlamaForCausalLM" as its architecture, while the fine-tuned model "weizhiwang/LLaVA-Llama-3-8B" has "LlavaLlamaForCausalLM" in its config.json. When I fine-tune on top of that pre-trained checkpoint as described in this repository, both the LoRA adapter and the merged model end up with the architecture "LlamaForCausalLM" rather than "LlavaLlamaForCausalLM". What am I doing wrong during fine-tuning?
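For reference, this is roughly how I compare the two published configs. It is a minimal sketch that just downloads each config.json from the Hub and prints the relevant fields; nothing in it is specific to LLaVA:

```python
# Sketch: compare the "architectures" / "model_type" entries of the two Hub configs.
import json
from huggingface_hub import hf_hub_download

for repo_id in [
    "weizhiwang/llava-v1.5-llama-3-8b-pretrain-clip-large-336px",
    "weizhiwang/LLaVA-Llama-3-8B",
]:
    cfg_path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(cfg_path) as f:
        cfg = json.load(f)
    print(repo_id, "->", cfg.get("architectures"), cfg.get("model_type"))
```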

I am doing LoRA fine-tuning, and the deepspeed command is as below:

deepspeed --num_gpus=1 /home/srikanth/api-webapp/LLaVA-Llama-3/llava/train/train_mem.py \
    --lora_enable True --lora_r 16 --lora_alpha 32 --mm_projector_lr 2e-5 \
    --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
    --deepspeed /home/srikanth/api-webapp/LLaVA-Llama-3/scripts/zero3.json \
    --version v3 \
    --data_path /mnt/e/Vision-Finetuning/data/llava_instruct_80k.json \
    --image_folder /mnt/e/Vision-Finetuning/data/images/ \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --pretrain_mm_mlp_adapter /home/srikanth/api-webapp/checkpoints/llava-llama-8B/llava-v1.5-llama-3-8b-pretrain/mm_projector.bin \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir /home/srikanth/api-webapp/checkpoints/llava-llama-8B \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 200 \
    --max_steps 100 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \

My merge script is as below:

python /home/srikanth/api-webapp/LLaVA-Llama-3/scripts/merge_lora_weights.py \
    --model-path /home/srikanth/api-webapp/checkpoints/llava-llama-8B \
    --model-base meta-llama/Meta-Llama-3-8B-Instruct \
    --save-model-path /home/srikanth/api-webapp/multimodal-llava-llama-8B
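As a stopgap I have been checking (and, if needed, hand-patching) the merged model's config.json after the merge. Below is just a sketch of that check: it uses plain json on the local save directory, and the target values "LlavaLlamaForCausalLM" / "llava" are my assumption based on the published LLaVA-Llama-3-8B config, not a confirmed fix. I would still like to understand why the merge writes "LlamaForCausalLM" in the first place.

```python
# Sketch: verify (and optionally patch) the merged model's config.json.
# Assumes the merged model was saved to the --save-model-path used above;
# the expected values mirror weizhiwang/LLaVA-Llama-3-8B's config.json (assumption).
import json
from pathlib import Path

merged_dir = Path("/home/srikanth/api-webapp/multimodal-llava-llama-8B")
cfg_file = merged_dir / "config.json"
cfg = json.loads(cfg_file.read_text())

print("architectures:", cfg.get("architectures"))
print("model_type:   ", cfg.get("model_type"))

if cfg.get("architectures") != ["LlavaLlamaForCausalLM"]:
    # Manual patch (assumption: the weights are already merged correctly and
    # only the recorded architecture/model_type in config.json is wrong).
    cfg["architectures"] = ["LlavaLlamaForCausalLM"]
    cfg["model_type"] = "llava"
    cfg_file.write_text(json.dumps(cfg, indent=2))
    print("patched", cfg_file)
```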