Closed SrikanthChellappa closed 1 week ago
Hello, why did you merge the model trained with minicpmv2.5 with llama3? This operation should be impossible to perform
Hi @LDLINGLINGLING, we have a Llama-3-8b based BioMedical LLM trained on a huge volume of medical artifacts, and we want to use it as the LLM (text) backbone for the MiniCPM-LlamaV2.5 based LoRA fine-tuned adapters. The sample code given is a simple merge we tried with the base Llama-3-8B model. We need to merge our text LLM with these MiniCPM based vision adapters to meet our needs.
There is a problem here. In our model, llama3-8b has been fine-tuned and aligned. If you use llama3 directly, the image module and text model will not be able to adapt.
How do I merge the best of the two biomed models (text and image) now? As far as we can see, the existing text LLM in minicpm-llama-v2.5 is not a medical expert, so we cannot use it as the text LLM with the image adapters. Is there a way to make this work, even if it is not a direct merge?
Here is a simple method, for reference only. You can consider mixed-modal fine-tuning of minicpmv2.5, which uses both image-text pair data and plain text data. The tutorial for mixed-modal fine-tuning is here: https://modelbest.feishu.cn/wiki/As5Ow99z3i4hrCkooRIcz79Zn2f?from=from_copylink
But we already have a text fine-tuned BioMedical LLM, and we cannot do another text fine-tuning with minicpmv2.5; as you can understand, enormous fine-tuning effort and cost have already been spent on the text LLM. Is there an option to at least merge our text adapters with this minicpmv2.5? Please assist with alternatives that don't involve another fine-tuning from scratch.
I personally think this is very difficult to do, because at this point your llama3 and the llama3 in our minicpmv are two models with the same shape but completely different parameters.
Are you saying (or confirming?) that this cannot be merged through any means?
I personally think it is quite difficult. Maybe you can merge the LoRA into your weights, but I don't think the result will be ideal. But if your professional model was trained with LoRA, I think there may be a way forward.
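To make the point about merging a LoRA into weights concrete, here is a minimal pure-Python sketch of the usual LoRA merge rule, W' = W + (alpha / r) * B A. The matrices, the `merge_lora` helper, and the names `base_text` and `base_vlm` are toy assumptions for illustration, not real model weights or any library's API. It only shows that the same adapter delta can be added to any base of the same shape, while the merged result depends on which base it lands on.

```python
# Toy sketch of folding a LoRA update into a weight matrix.
# All values below are made up for illustration.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return weight + (alpha / r) * (lora_b @ lora_a)."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Two hypothetical base weights: same shape, different parameters.
base_text = [[1.0, 0.0], [0.0, 1.0]]   # stands in for "your llama3"
base_vlm = [[0.5, 0.2], [-0.1, 0.9]]   # stands in for "the llama3 in minicpmv"

# A rank-1 adapter (r=1) that was learned against base_text.
lora_a = [[1.0, 2.0]]      # shape (r, in_features)
lora_b = [[0.5], [0.25]]   # shape (out_features, r)

merged_text = merge_lora(base_text, lora_a, lora_b, alpha=2, r=1)
merged_vlm = merge_lora(base_vlm, lora_a, lora_b, alpha=2, r=1)

print(merged_text)
print(merged_vlm)
```

The merge itself succeeds in both cases because the shapes agree. Whether the delta is still meaningful on a base it was not trained against is a separate question, which is why evaluation after the merge matters.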
Can you suggest some ways to overcome this? We have been stuck in this state for the past 2 weeks, ever since we got our minicpmv2.5 LoRA image adapters.
The following are just personal suggestions:
Yes, our original text model was trained with LoRA and we have the adapters. I tried merging the text adapter with our MiniCPMv2.5 based model and the merge went through. We are evaluating its impact. Will this also affect the image fine-tuning that was done earlier?
This will definitely affect image fine-tuning, so you should merge your own lora and then retrain part of the image-text pair data. It is recommended not to update the image module during this process.
Can I re-merge the minicpmv2.5 image LoRA adapters now (after the text LoRA adapter merge) to avoid doing the image fine-tuning again?
I think that would be less effective.
You are right. We see the text responses are getting disturbed after we merged the MiniCPM-v2.5 image adapters to it. We will re-train images again with image-text pair data on top of the text adapter merged Minicpm-v2.5 model.
But can you elaborate what you mean by "It is recommended not to update the image module during this process." Are you referring to the text or image here ?
Are you referring to the parameters below, passed with torchrun?
--tune_vision true \
--tune_llm false \
Please confirm.
If you use LoRA: --use_lora true --tune_vision false --tune_llm false
If you do not use LoRA: --tune_vision false --tune_llm true
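The two flag combinations in the reply above can be written down as a small helper, which may make it harder to mix them up when launching runs. This is only a sketch: `tuning_flags` is a hypothetical function, not part of the MiniCPM-V repository, and it returns exactly the flags quoted in the reply and nothing else.

```python
def tuning_flags(use_lora):
    """Return the tuning-related flags suggested in the reply above.

    Hypothetical helper for illustration; it only mirrors the two
    combinations quoted in this thread.
    """
    if use_lora:
        return ["--use_lora", "true", "--tune_vision", "false", "--tune_llm", "false"]
    return ["--tune_vision", "false", "--tune_llm", "true"]

# Print both combinations as they would appear on the command line.
print(" ".join(tuning_flags(use_lora=True)))
print(" ".join(tuning_flags(use_lora=False)))
```

The returned list can be appended to the rest of a torchrun argument list, so the LoRA and non-LoRA settings stay in one place.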
I will be using lora to train image-text pair with torchrun.
Are you saying to keep
--use_lora true
--tune_vision false
--tune_llm false
Regarding tune_vision as false: I remember keeping this as true when we earlier did image training with image-text pairs for minicpm-v2.5. Did we do it wrong last time for LoRA fine-tuning? Please confirm whether we need to keep this as false, so that we avoid any rework.
This is just a personal opinion
Sure. I will keep tune_vision as false when we run the fine-tuning again for images. But can you please help me understand the impact of this parameter when training on image-text pairs?
I think that when you merge the LoRA of the professional model into minicpmv, the distribution of the language model changes, but the parameters of the image model do not. Therefore, if you need to quickly regain the ability to match images and text, the fastest way is to train only the text module, but the training data must be image-text pairs.
Thanks for your support
Could you please check this once? I will be running the script below for image fine-tuning:
torchrun --nproc_per_node=2 --nnodes=1 --node_rank=0 --master_addr=localhost --master_port=6001 /root/user/MiniCPM-V/finetune/finetune.py \
--model_name_or_path <our text adapter merged MiniCPM-v2.5 model> \
--llm_type "llama3" \
--data_path <our data source json file> \
--remove_unused_columns false \
--label_names "labels" \
--prediction_loss_only false \
--bf16 false \
--bf16_full_eval false \
--fp16 true \
--fp16_full_eval false \
--do_train \
--do_eval false \
--tune_vision false \
--tune_llm false \
--use_lora true \
--lora_r 16 \
--lora_alpha 32 \
--lora_target_modules "llm\..*layers\.\d+\.self_attn\.(q_proj|k_proj|v_proj|o_proj)" \
--model_max_length 2048 \
--max_slice_nums 9 \
--num_train_epochs 1 \
--output_dir "/root/user/minicpm-adapter-1epoch-75k-images" \
--logging_dir "/root/user/minicpm-adapter-1epoch-75k-images" \
--logging_strategy "steps" \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 4 \
--save_strategy "steps" \
--save_steps 4000 \
--save_total_limit 5 \
--learning_rate 1e-6 \
--weight_decay 0.1 \
--adam_beta2 0.95 \
--warmup_ratio 0.01 \
--lr_scheduler_type "cosine" \
--logging_steps 4000 \
--gradient_checkpointing true \
--deepspeed /root/user/MiniCPM-V/finetune/ds_config_zero2.json \
--report_to none # wandb
I don't see any problems with it.
Thanks :)
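As a sanity check on the script above, the schedule-related numbers can be worked out by hand. The sketch below does that arithmetic with the values taken from the command line. The dataset size of 75,000 samples is an assumption read off the output directory name, and the step and warmup formulas are a simplified approximation of how a typical trainer counts optimizer steps, not a statement about the exact internals of finetune.py.

```python
import math

# Values copied from the torchrun command above.
nproc_per_node = 2
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_train_epochs = 1
save_steps = 4000
warmup_ratio = 0.01

# ASSUMPTION: dataset size guessed from the "75k-images" output directory name.
num_samples = 75_000

# Samples consumed per optimizer step across all processes.
effective_batch = nproc_per_node * per_device_train_batch_size * gradient_accumulation_steps

# Approximate number of optimizer steps for the whole run.
total_steps = math.ceil(num_samples / effective_batch) * num_train_epochs

# Approximate warmup length and the steps at which step-based checkpoints land.
warmup_steps = math.ceil(total_steps * warmup_ratio)
checkpoint_steps = list(range(save_steps, total_steps + 1, save_steps))

print(effective_batch, total_steps, warmup_steps, checkpoint_steps)
```

Under these assumptions, step-based saving and logging at an interval of 4000 would fire only twice during the run, so a smaller `--logging_steps` may be worth considering if you want to watch the loss more closely.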
ERROR: ValueError: Target modules llm..*layers.\d+.self_attn.(q_proj|k_proj|v_proj|o_proj) not found in the base model. Please check the target modules and try again.
The error is thrown at the line PeftModel.from_pretrained(
The complete code is below:
import torch
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer

path_to_adapter = r"E:\minicpm-adapter-1epoch-40k-images"
new_model = r"E:\xyz\multimodal-Llama-3-8B-V3"

model = AutoModel.from_pretrained('meta-llama/Meta-Llama-3-8B', trust_remote_code=True, device_map="auto", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B", trust_remote_code=True)

# Attach the LoRA adapter to the base model; the ValueError above is raised here.
lora_model = PeftModel.from_pretrained(model, path_to_adapter, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16).eval().cuda()

# Fold the adapter weights into the base weights, then save the merged model.
lora_model = lora_model.merge_and_unload()
lora_model.save_pretrained(new_model)
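One way to see why the ValueError above can occur: the adapter's target pattern starts with `llm\.`, so it can only match modules nested under an attribute named `llm`. The sketch below tests the pattern from the fine-tuning script against two module names using the standard `re` module. Both names are hypothetical examples of the two naming styles (the real ones come from `model.named_modules()` and may differ), and matching with `re.fullmatch` is my understanding of how PEFT treats a string `target_modules`, so treat that as an assumption.

```python
import re

# Pattern from the --lora_target_modules argument of the fine-tuning script.
TARGET = r"llm\..*layers\.\d+\.self_attn\.(q_proj|k_proj|v_proj|o_proj)"

# Hypothetical module names, for illustration only.
wrapped_name = "llm.model.layers.0.self_attn.q_proj"  # LLM nested under an "llm" attribute
plain_name = "model.layers.0.self_attn.q_proj"        # bare Llama-style naming

def is_target(name):
    """True when the whole module name matches the target pattern."""
    return re.fullmatch(TARGET, name) is not None

print(is_target(wrapped_name))
print(is_target(plain_name))
```

If no module name in the loaded base model matches, there is nothing for the adapter to attach to. Loading the same model class the adapter was trained on, and printing a few entries of `named_modules()`, is a quick way to check the naming before calling `PeftModel.from_pretrained`.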