OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

MiniCPM-V-2 fine-tuning error #131

Closed Zmeo closed 3 months ago

Zmeo commented 3 months ago

Error message:

Traceback (most recent call last):
  File "/xx/MiniCPM-V/finetune/finetune.py", line 124, in <module>
    train()
  File "/xx/MiniCPM-V/finetune/finetune.py", line 119, in train
    trainer.train()
  File "/xx/.conda/envs/MiniCPMV/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/xx/.conda/envs/MiniCPMV/lib/python3.10/site-packages/transformers/trainer.py", line 1854, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/xx/.conda/envs/MiniCPMV/lib/python3.10/site-packages/transformers/trainer.py", line 2723, in training_step
    loss = self.compute_loss(model, inputs)
  File "/xx/lx/code/MiniCPM-V/finetune/trainer.py", line 19, in compute_loss
    vllm_embedding, vision_hidden_states = self.model.get_vllm_embedding(inputs)
  File "/xx/.cache/huggingface/modules/transformers_modules/MiniCPM-V-2/modeling_minicpmv.py", line 88, in get_vllm_embedding
    vision_hidden_states.append(self.get_vision_embedding(pixel_values))
  File "/xx/.cache/huggingface/modules/transformers_modules/MiniCPM-V-2/modeling_minicpmv.py", line 79, in get_vision_embedding
    res.append(self.resampler(vision_embedding, tgt_size))
  File "/xx/.conda/envs/MiniCPMV/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/xx/.conda/envs/MiniCPMV/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/xx/.cache/huggingface/modules/transformers_modules/MiniCPM-V-2/resampler.py", line 158, in forward
    out = self.attn(
  File "/xx/.conda/envs/MiniCPMV/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/xx/.conda/envs/MiniCPMV/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/xx/.conda/envs/MiniCPMV/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1241, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/xx/.conda/envs/MiniCPMV/lib/python3.10/site-packages/torch/nn/functional.py", line 5300, in multi_head_attention_forward
    q, k, v = _in_projection_packed(query, key, value, in_proj_weight, in_proj_bias)
  File "/xx/.conda/envs/MiniCPMV/lib/python3.10/site-packages/torch/nn/functional.py", line 4846, in _in_projection_packed
    return linear(q, w_q, b_q), linear(k, w_k, b_k), linear(v, w_v, b_v)
RuntimeError: mat1 and mat2 must have the same dtype, but got Float and BFloat16
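
For reference, the dtype mismatch at the bottom of the traceback can be reproduced in isolation. The snippet below is a standalone sketch, not code from this repo (shapes are arbitrary): a float32 activation hitting a bfloat16 projection weight, which is what happens inside the resampler's attention when only part of the model ends up in bf16.

import torch
import torch.nn.functional as F

x = torch.randn(1, 4, dtype=torch.float32)    # activation left in float32
w = torch.randn(8, 4, dtype=torch.bfloat16)   # projection weight cast to bfloat16
b = torch.randn(8, dtype=torch.bfloat16)      # bias in bfloat16 as well

try:
    F.linear(x, w, b)                         # same call shape as linear(q, w_q, b_q) in the traceback
except RuntimeError as e:
    print(e)                                  # "mat1 and mat2 must have the same dtype, ..."

out = F.linear(x.to(torch.bfloat16), w, b)    # casting both sides to one dtype makes the matmul succeed
print(out.dtype)                              # torch.bfloat16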

finetune script:

#!/bin/bash

GPUS_PER_NODE=1
NNODES=1
NODE_RANK=0
MASTER_ADDR=localhost
MASTER_PORT=6001

MODEL="../openbmb/MiniCPM-V-2"
DATA="./data.json"
EVAL_DATA="./data.json"

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"
torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --data_path $DATA \
    --eval_data_path $EVAL_DATA \
    --remove_unused_columns false \
    --label_names "labels" \
    --prediction_loss_only false \
    --bf16 true \
    --bf16_full_eval true \
    --do_train \
    --do_eval \
    --max_steps -1 \
    --eval_steps 1 \
    --output_dir output/output_minicpmv2 \
    --logging_dir output/output_minicpmv2 \
    --logging_strategy "steps" \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "steps" \
    --save_strategy "steps" \
    --save_steps 1 \
    --save_total_limit 1 \
    --learning_rate 5e-7 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --gradient_checkpointing True \
    --report_to "tensorboard" # wandb

Data:

[
    {
        "id": "0",
        "image": "/home/xx/xx/xx/xx.jpg",
        "conversations": [
            {
                "role": "user",
                "content": "<image>\nHow many desserts are on the white plate?"
            },
            {
                "role": "assistant",
                "content": "There are three desserts on the white plate."
            },
            {
                "role": "user",
                "content": "What type of desserts are they?"
            },
            {
                "role": "assistant",
                "content": "The desserts are cakes with bananas and pecans on top. They share similarities with donuts, but the presence of bananas and pecans differentiates them."
            },
            {
                "role": "user",
                "content": "What is the setting of the image?"
            },
            {
                "role": "assistant",
                "content": "The image is set on a table top with a plate containing the three desserts."
            }
        ]
    }
]

qyc-98 commented 3 months ago

Hi,

Thank you for your feedback! To ensure that the script runs correctly, you need to add the following parameter setting to your script:

LLM_TYPE="llama3"

Additionally, make sure to include the --llm_type argument in your torchrun command, like this:

torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --llm_type $LLM_TYPE  \
    ... \

Please include the lines above in your script and make sure they are in place before you run it. If you have any other questions, feel free to reach out.

Thanks!

Zmeo commented 3 months ago

Thank you for your reply. After adding the parameter, a new error appeared:

File "/home/kas/.conda/envs/MiniCPMV/lib/python3.10/site-packages/transformers/hf_argparser.py", line 347, in parse_args_into_dataclasses raise ValueError(f"Some specified arguments are not used by the HfArgumentParser: {remaining_args}") ValueError: Some specified arguments are not used by the HfArgumentParser: ['--llm_type', 'llama3']

YuzaChongyi commented 3 months ago

This looks like a mismatch in the model weight dtypes. You can try adding the line model = model.to(device='cuda', dtype=torch.bfloat16)
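
As a rough sketch of where that line would go (the loading call and model id below are assumed for illustration; finetune.py may construct the model differently):

import torch
from transformers import AutoModel

# Loading call and model id assumed, not copied from finetune.py.
model = AutoModel.from_pretrained("openbmb/MiniCPM-V-2", trust_remote_code=True)

# The line suggested above: move every parameter to the GPU in a single dtype
# so the vision encoder, resampler, and LLM all run in bfloat16.
model = model.to(device="cuda", dtype=torch.bfloat16)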