OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Apache License 2.0

Merge issue with MiniCPM-Llama-v2.5 lora adapters with base Llama-3-8b text model #550

Closed SrikanthChellappa closed 1 week ago

SrikanthChellappa commented 1 week ago

ERROR: ValueError: Target modules llm\..*layers\.\d+\.self_attn\.(q_proj|k_proj|v_proj|o_proj) not found in the base model. Please check the target modules and try again. This is thrown at the PeftModel.from_pretrained( call.

The complete code is below:


import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

path_to_adapter = r"E:\minicpm-adapter-1epoch-40k-images"
new_model = r"E:\xyz\multimodal-Llama-3-8B-V3"

# Load the base Llama-3-8B text model
model = AutoModel.from_pretrained('meta-llama/Meta-Llama-3-8B', trust_remote_code=True, device_map="auto", torch_dtype=torch.float16)

# Attach the MiniCPM-Llama-V2.5 LoRA adapter; the ValueError above is raised here
lora_model = PeftModel.from_pretrained(model, path_to_adapter, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16).eval().cuda()

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B", trust_remote_code=True)

# Merge the adapter weights into the base model and save the merged result
lora_model = lora_model.merge_and_unload()
lora_model.save_pretrained(new_model)
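For context on the error: the adapter's target-module pattern starts with llm., which is the name of the Llama-3 submodule inside MiniCPM-Llama3-V-2.5, so a bare Meta-Llama-3-8B checkpoint has no modules matching it. Below is a minimal sketch, for reference only, of loading the adapter onto the model it was actually trained against, assuming the adapter path above and the public openbmb/MiniCPM-Llama3-V-2_5 checkpoint.

import torch
from transformers import AutoModel
from peft import PeftModel

path_to_adapter = r"E:\minicpm-adapter-1epoch-40k-images"

# Load the full MiniCPM-Llama3-V-2.5 model, which wraps Llama-3 as its llm.* submodule
base = AutoModel.from_pretrained("openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True, torch_dtype=torch.float16, device_map="auto")

# The llm\..*layers... pattern now matches real module names, so PEFT can attach the adapter
lora_model = PeftModel.from_pretrained(base, path_to_adapter).eval()
merged = lora_model.merge_and_unload()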

LDLINGLINGLING commented 1 week ago

Hello, why did you merge the adapters trained with MiniCPM-V 2.5 with Llama-3? This operation should not be possible.

SrikanthChellappa commented 1 week ago

Hi @LDLINGLINGLING, we have a Llama-3-8B-based biomedical LLM trained on a huge volume of medical artifacts, and we want to use it as the text (LLM) backbone for our MiniCPM-Llama-V2.5 LoRA fine-tuned adapters. The sample code above is a simple merge we tried with the base Llama-3-8B model. We need to merge our text LLM with these MiniCPM-based vision adapters to meet our needs.

LDLINGLINGLING commented 1 week ago

There is a problem here. In our model, the Llama-3-8B backbone has already been fine-tuned and aligned. If you use stock Llama-3 directly, the image module and the text model will not be adapted to each other.

SrikanthChellappa commented 1 week ago

How do we combine the best of the two biomedical models (text and image) now? The existing text LLM in MiniCPM-Llama-V2.5 is not a medical expert, as far as we can see, so we cannot use it as the text LLM with the image adapters. Is there a way to make this work, even if it is not a direct merge?

LDLINGLINGLING commented 1 week ago

I can provide a simple method, for reference only. You could consider mixed-modal fine-tuning of MiniCPM-V 2.5, which uses both image-text pair data and plain text data. The tutorial for mixed-mode fine-tuning is here: https://modelbest.feishu.cn/wiki/As5Ow99z3i4hrCkooRIcz79Zn2f?from=from_copylink
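For illustration, a minimal sketch of what such a mixed training file could look like. The field names ("image", "conversations", "role", "content", the <image> placeholder) follow the repo's fine-tuning data examples, but they are assumptions here; verify them against the tutorial above before use.

import json

mixed_data = [
    {   # image-text pair sample
        "id": "0",
        "image": "images/chest_xray_001.png",  # hypothetical path
        "conversations": [
            {"role": "user", "content": "<image>\nDescribe the main finding."},
            {"role": "assistant", "content": "The image shows ..."}
        ]
    },
    {   # plain-text sample (no "image" field) to help preserve text-only ability
        "id": "1",
        "conversations": [
            {"role": "user", "content": "List common symptoms of pneumonia."},
            {"role": "assistant", "content": "Typical symptoms include ..."}
        ]
    }
]

with open("mixed_train.json", "w", encoding="utf-8") as f:
    json.dump(mixed_data, f, ensure_ascii=False, indent=2)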

SrikanthChellappa commented 1 week ago

But we already have a text fine-tuned biomedical LLM, and we cannot do another round of text fine-tuning with MiniCPM-V 2.5; as you can understand, an enormous fine-tuning effort and cost has already gone into the text LLM. Is there an option to at least merge our text adapters with this MiniCPM-V 2.5? Please assist with alternatives that do not involve another fine-tuning from scratch.

LDLINGLINGLING commented 1 week ago

I personally think this is very difficult to do, because at this point your Llama-3 and the Llama-3 inside our MiniCPM-V are two models with the same shape but completely different parameters.

SrikanthChellappa commented 1 week ago

Are you saying (or confirming?) that this cannot be merged through any means?

LDLINGLINGLING commented 1 week ago

I personally think it is quite difficult. You could try merging the LoRA into your own weights, but I do not think the result will be ideal. However, if your professional model was trained with LoRA, there may be a way forward.

SrikanthChellappa commented 1 week ago

Can you suggest some ways to overcome this? We have been stuck in this state for the past two weeks, ever since we produced our MiniCPM-V 2.5 LoRA image adapters.

LDLINGLINGLING commented 1 week ago

The following are just personal suggestions:

  1. If your professional model was trained with LoRA on Llama-3, then you can try to merge that LoRA with the Llama-3 inside MiniCPM-V 2.5, and then train on image-text pairs after merging (see the sketch below).
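A minimal sketch of suggestion 1, assuming the professional text LoRA was trained against plain Meta-Llama-3-8B (so its module names line up with the llm submodule of MiniCPM-Llama3-V-2.5) and using hypothetical local paths:

import torch
from transformers import AutoModel
from peft import PeftModel

text_lora_path = r"E:\biomed-llama3-text-lora"  # hypothetical adapter directory

minicpm = AutoModel.from_pretrained("openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True, torch_dtype=torch.float16)

# The Llama-3 backbone lives in minicpm.llm; attach the text LoRA to that submodule
# so the adapter's module names (model.layers...) match, then fold it into the weights.
minicpm.llm = PeftModel.from_pretrained(minicpm.llm, text_lora_path).merge_and_unload()

minicpm.save_pretrained(r"E:\minicpm-v2_5-biomed-merged")  # hypothetical output path

The image-text retraining discussed below would then start from this merged checkpoint.
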
SrikanthChellappa commented 1 week ago

Yes, our original text model was trained with LoRA and we have the adapters. I tried merging the text adapter with our MiniCPM-V 2.5-based model and it merged successfully; we are evaluating its impact. Will this also affect the image fine-tuning that was done earlier?

LDLINGLINGLING commented 1 week ago

This will definitely affect the image fine-tuning, so you should merge your own LoRA and then retrain on part of the image-text pair data. It is recommended not to update the image module during this process.

SrikanthChellappa commented 1 week ago

Can I re-merge the MiniCPM-V 2.5 image LoRA adapters now (after the text LoRA adapter merge) to avoid fine-tuning on images again?

LDLINGLINGLING commented 1 week ago

I think this would be less effective.

SrikanthChellappa commented 1 week ago

You are right. We see that the text responses are degraded after we merged the MiniCPM-V 2.5 image adapters into it. We will retrain on images with image-text pair data, on top of the text-adapter-merged MiniCPM-V 2.5 model.

But can you elaborate on what you mean by "It is recommended not to update the image module during this process"? Are you referring to the text or the image module here?

Are you referring to the following torchrun parameters: --tune_vision true --tune_llm false?

Please confirm.

LDLINGLINGLING commented 1 week ago

If you use LoRA: --use_lora true --tune_vision false --tune_llm false
If you do not use LoRA: --tune_vision false --tune_llm true

SrikanthChellappa commented 1 week ago

I will be using LoRA to train on image-text pairs with torchrun.
Are you saying to keep --use_lora true --tune_vision false --tune_llm false?

tune_vision as false: I remember keeping this as true when we previously did image training with image-text pairs for MiniCPM-V 2.5. Did we do it wrong last time for LoRA fine-tuning? Please confirm whether we need to keep this as false, to avoid any rework, as you can understand.

LDLINGLINGLING commented 1 week ago

This is just a personal opinion

SrikanthChellappa commented 1 week ago

Sure, I will keep tune_vision as false when we run the fine-tuning again for images. But can you please help me understand the impact of this parameter when training on image-text pairs?

LDLINGLINGLING commented 1 week ago

I think that when you merge the LoRA of the professional model into MiniCPM-V, the distribution of the language model changes, but the parameters of the image model do not. Therefore, if you need to quickly regain the ability to match images and text, the fastest way is to train only the text module, but the data must be image-text pairs.
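As a rough illustration of what the recommended flag combination (--use_lora true --tune_vision false --tune_llm false) amounts to, here is a sketch of freezing the vision side and training only LoRA adapters on the language model's attention projections. The attribute name vpm for the vision encoder is an assumption about the MiniCPM-V 2.5 model class, not something stated in this thread; the actual behaviour is implemented in finetune.py.

import torch
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

model = AutoModel.from_pretrained("openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True, torch_dtype=torch.bfloat16)

# --tune_vision false: keep the image encoder frozen (attribute name vpm assumed)
model.vpm.requires_grad_(False)

# --use_lora true with --tune_llm false: the language model stays frozen and only
# LoRA adapters on its attention projections are trained
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=r"llm\..*layers\.\d+\.self_attn\.(q_proj|k_proj|v_proj|o_proj)",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the LoRA matrices should be trainable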

SrikanthChellappa commented 1 week ago

Thanks for your support

SrikanthChellappa commented 1 week ago

Can you please kindly check the script below, which I will be running for the image fine-tuning?

torchrun --nproc_per_node=2 --nnodes=1 --node_rank=0 --master_addr=localhost --master_port=6001 /root/user/MiniCPM-V/finetune/finetune.py \

--model_name_or_path <our text adapter merged MiniCPM-v2.5 model> \
--llm_type "llama3" \
--data_path <our data source json file> \
--remove_unused_columns false \
--label_names "labels" \
--prediction_loss_only false \
--bf16 false \
--bf16_full_eval false \
--fp16 true \
--fp16_full_eval false \
--do_train \
--do_eval false \
--tune_vision false \
--tune_llm false \
--use_lora true \
--lora_r 16 \
--lora_alpha 32 \
--lora_target_modules "llm\..*layers\.\d+\.self_attn\.(q_proj|k_proj|v_proj|o_proj)" \
--model_max_length 2048 \
--max_slice_nums 9 \
--num_train_epochs 1 \
--output_dir "/root/user/minicpm-adapter-1epoch-75k-images" \
--logging_dir "/root/user/minicpm-adapter-1epoch-75k-images" \
--logging_strategy "steps" \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 4 \
--save_strategy "steps" \
--save_steps 4000 \
--save_total_limit 5 \
--learning_rate 1e-6 \
--weight_decay 0.1 \
--adam_beta2 0.95 \
--warmup_ratio 0.01 \
--lr_scheduler_type "cosine" \
--logging_steps 4000 \
--gradient_checkpointing true \
--deepspeed /root/user/MiniCPM-V/finetune/ds_config_zero2.json \
--report_to none # wandb

LDLINGLINGLING commented 1 week ago

It looks fine to me.

SrikanthChellappa commented 1 week ago

Thanks :)