OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

[BUG] NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device. #635

wshiman opened this issue 3 weeks ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in FAQ?

Current Behavior

When I tried to fine-tune the MiniCPM-V-2.6 int4 model with QLoRA on a single 3090, I ran into a NotImplementedError. The exact output follows:

```
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible
/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
  @autocast_custom_fwd
/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
  @autocast_custom_bwd
/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/transformers/training_args.py:1545: FutureWarning: evaluation_strategy is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use eval_strategy instead
  warnings.warn(
[2024-10-14 12:34:50,823] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-10-14 12:34:50,823] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
You have loaded an AWQ model on CPU and have a CUDA device available, make sure to set your model on a GPU device in order to run your model.
low_cpu_mem_usage was None, now set to True since model is quantized.
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 5.18it/s]
Some weights of the model checkpoint at /root/autodl-tmp/MiniCPM-V_2_6_awq_int4 were not used when initializing MiniCPMV: ['resampler.attn.out_proj.weight', 'resampler.kv_proj.weight', 'vpm.encoder.layers.0.mlp.fc1.weight', 'vpm.encoder.layers.0.mlp.fc2.weight', 'vpm.encoder.layers.0.self_attn.k_proj.weight', 'vpm.encoder.layers.0.self_attn.out_proj.weight', 'vpm.encoder.layers.0.self_attn.q_proj.weight', 'vpm.encoder.layers.0.self_attn.v_proj.weight', 'vpm.encoder.layers.1.mlp.fc1.weight', 'vpm.encoder.layers.1.mlp.fc2.weight', 'vpm.encoder.layers.1.self_attn.k_proj.weight......
```
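The warning "You have loaded an AWQ model on CPU" in the log already hints at the root cause. As a hypothetical diagnostic (not part of the original report), you can confirm which quantization method a local checkpoint was saved with by reading the `quantization_config` block that transformers stores in its `config.json`; the path below is the reporter's local path:

```python
# Hypothetical diagnostic sketch: check which quantization method a local
# checkpoint was saved with. QLoRA via bitsandbytes expects a bnb checkpoint,
# not an AWQ one.
import json
import os

model_path = "/root/autodl-tmp/MiniCPM-V_2_6_awq_int4"  # reporter's local path

with open(os.path.join(model_path, "config.json")) as f:
    config = json.load(f)

# For an AWQ checkpoint this prints "awq"; a QLoRA-trainable one reports "bitsandbytes".
print(config.get("quantization_config", {}).get("quant_method"))
```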

Expected Behavior

No response

Steps To Reproduce

This is my finetune_lora.sh, modified strictly following the fine-tuning guide; all other steps were also performed as the guide describes. I run it with `cd finetune` and then `bash finetune_lora.sh`:

```bash
#!/bin/bash

GPUS_PER_NODE=1
NNODES=1
NODE_RANK=0
MASTER_ADDR=localhost
MASTER_PORT=6001

MODEL="/root/autodl-tmp/MiniCPM-V_2_6_awq_int4" # or openbmb/MiniCPM-V-2, openbmb/MiniCPM-Llama3-V-2_5
# ATTENTION: specify the path to your training data, which should be a json file
# consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="/root/cpmv2_6/result_cpmv2_6/processed_data.json"
EVAL_DATA="/root/cpmv2_6/result_cpmv2_6/processed_data.json"
LLM_TYPE="minicpm"
# if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm
# if use openbmb/MiniCPM-Llama3-V-2_5, please set LLM_TYPE=llama3

export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1

MODEL_MAX_Length=1000 # if conduct multi-images sft, please set MODEL_MAX_Length=4096

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"
torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --llm_type $LLM_TYPE \
    --data_path $DATA \
    --eval_data_path $EVAL_DATA \
    --remove_unused_columns false \
    --label_names "labels" \
    --prediction_loss_only false \
    --bf16 false \
    --bf16_full_eval false \
    --fp16 true \
    --fp16_full_eval true \
    --do_train \
    --do_eval \
    --tune_vision false \
    --tune_llm false \
    --use_lora true \
    --q_lora true \
    --lora_target_modules "llm\..*layers\.\d+\.self_attn\.(q_proj|k_proj|v_proj|o_proj)" \
    --model_max_length $MODEL_MAX_Length \
    --max_slice_nums 9 \
    --max_steps 10000 \
    --eval_steps 1000 \
    --output_dir output/output__lora \
    --logging_dir output/output_lora \
    --logging_strategy "steps" \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "steps" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 10 \
    --learning_rate 1e-6 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --gradient_checkpointing true \
    --deepspeed ds_config_zero2.json \
    --report_to "tensorboard" # wandb
```

Environment

- OS: Ubuntu 22.04
- Python: 3.11.9
- Transformers: 4.45.2
- Torch: 2.4.0
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.1

Anything else?

No response

LDLINGLINGLING commented 3 weeks ago

Hi, QLoRA can only train bnb (bitsandbytes)-quantized int4 models, but the model you are training is AWQ-quantized. Please use this model instead: https://huggingface.co/openbmb/MiniCPM-V-2_6-int4
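For reference, a minimal loading sketch, assuming the standard snippet from that model card; on the training side, the corresponding change is pointing `MODEL` in finetune_lora.sh at this checkpoint instead of the AWQ one:

```python
# Minimal sketch, assuming the loading snippet from the MiniCPM-V-2_6-int4
# model card: this checkpoint is bnb-quantized int4, which QLoRA can train.
from transformers import AutoModel, AutoTokenizer

model_path = "openbmb/MiniCPM-V-2_6-int4"  # replaces the AWQ checkpoint
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```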