MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
[BUG] <title>NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device. #635
是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
[X] 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions
该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
[X] 我已经搜索过FAQ | I have searched FAQ
当前行为 | Current Behavior
I ran into a NotImplementedError when trying to fine-tune the MiniCPM-V-2.6-int4 model with QLoRA on a single RTX 3090. The full output is below:
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible
/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
@autocast_custom_fwd
/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
@autocast_custom_bwd
/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/transformers/training_args.py:1545: FutureWarning: evaluation_strategy is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use eval_strategy instead
warnings.warn(
[2024-10-14 12:34:50,823] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-10-14 12:34:50,823] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
You have loaded an AWQ model on CPU and have a CUDA device available, make sure to set your model on a GPU device in order to run your model.
low_cpu_mem_usage was None, now set to True since model is quantized.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 5.18it/s]
Some weights of the model checkpoint at /root/autodl-tmp/MiniCPM-V_2_6_awq_int4 were not used when initializing MiniCPMV: ['resampler.attn.out_proj.weight', 'resampler.kv_proj.weight', 'vpm.encoder.layers.0.mlp.fc1.weight', 'vpm.encoder.layers.0.mlp.fc2.weight', 'vpm.encoder.layers.0.self_attn.k_proj.weight', 'vpm.encoder.layers.0.self_attn.out_proj.weight', 'vpm.encoder.layers.0.self_attn.q_proj.weight', 'vpm.encoder.layers.0.self_attn.v_proj.weight', 'vpm.encoder.layers.1.mlp.fc1.weight', 'vpm.encoder.layers.1.mlp.fc2.weight', 'vpm.encoder.layers.1.self_attn.k_proj.weight......
This IS expected if you are initializing MiniCPMV from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing MiniCPMV from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of MiniCPMV were not initialized from the model checkpoint at /root/autodl-tmp/MiniCPM-V_2_6_awq_int4 and are newly initialized: ['resampler.kv_proj.qweight', 'resampler.kv_proj.qzeros', 'resampler.kv_proj.scales', 'vpm.encoder.layers.0.mlp.fc1.qweight', 'vpm.encoder.layers.0.mlp.fc1.qzeros', 'vpm.encoder.layers.0.mlp.fc1.scales', 'vpm.encoder.layers.0.mlp.fc2.qweight', 'vpm.encoder.layers.0.mlp.fc2.qzeros', 'vpm.encoder.layers.0.mlp.fc2.scales......
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Currently using LoRA for fine-tuning the MiniCPM-V model.
{'Total': 1781021056, 'Trainable': 635582976}
llm_type=minicpm
Loading data...
max_steps is given, it will override any value given in num_train_epochs
rank0: Traceback (most recent call last):
rank0: File "/root/cpmv2_6/finetune/finetune.py", line 299, in rank0: File "/root/cpmv2_6/finetune/finetune.py", line 289, in train
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/transformers/trainer.py", line 2052, in train
rank0: return inner_training_loop(
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/transformers/trainer.py", line 2207, in _inner_training_loop
rank0: model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/accelerate/accelerator.py", line 1344, in prepare
rank0: result = self._prepare_deepspeed(*args)
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/accelerate/accelerator.py", line 1851, in _preparedeepspeed
rank0: engine, optimizer, , lr_scheduler = ds_initialize(*kwargs)
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/deepspeed/init.py", line 181, in initialize
rank0: engine = DeepSpeedEngine(args=args,
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 262, in initrank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1112, in _configure_distributed_model
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1174, in to
rank0: return self._apply(convert)
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
rank0: [Previous line repeated 3 more times]
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 854, in _apply
rank0: self._buffers[key] = fn(buf)
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1167, in convert
rank0: raise NotImplementedError(
rank0: NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
E1014 12:34:59.731000 140289146423104 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 1999) of binary: /root/miniconda3/envs/cpmv/bin/python
Traceback (most recent call last):
File "/root/miniconda3/envs/cpmv/bin/torchrun", line 8, in
sys.exit(main())
^^^^^^
File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 348, in wrapper
return f(args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/distributed/run.py", line 901, in main
run(args)
File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in run
elastic_launch(
File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 133, in call
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
finetune.py FAILED
Failures:
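For context on that final error (this explanation and sketch are my own addition, not part of the log above): a module whose parameters or buffers live on the meta device holds only shapes and dtypes, with no backing storage, so torch.nn.Module.to() has nothing to copy, while to_empty() allocates fresh, uninitialized storage on the target device. A minimal standalone illustration, independent of MiniCPM-V and DeepSpeed:

```python
# Minimal sketch (my own illustration) of the "Cannot copy out of meta tensor" error.
import torch
import torch.nn as nn

# Modules created under the meta device hold only shapes/dtypes, no data.
with torch.device("meta"):
    layer = nn.Linear(8, 8)

try:
    layer.to("cpu")  # .to() must copy data, but meta tensors have none
except NotImplementedError as e:
    print(e)  # "Cannot copy out of meta tensor; no data! ..."

# to_empty() allocates uninitialized storage instead of copying;
# real weights would still have to be loaded into it afterwards.
layer = layer.to_empty(device="cpu")
print(layer.weight.shape)  # torch.Size([8, 8])
```

Judging from the traceback, some part of the quantized model apparently still sits on the meta device when DeepSpeed's _configure_distributed_model calls .to() on it, which is presumably where this surfaces.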
期望行为 | Expected Behavior
No response
复现方法 | Steps To Reproduce
This is my finetune_lora.sh file. I modified it strictly following the fine-tuning guide, and all other steps were also carried out according to the guide. I launch it with:

```shell
# cd finetune
bash finetune_lora.sh
```

```shell
#!/bin/bash

GPUS_PER_NODE=1
NNODES=1
NODE_RANK=0
MASTER_ADDR=localhost
MASTER_PORT=6001

MODEL="/root/autodl-tmp/MiniCPM-V_2_6_awq_int4" # or openbmb/MiniCPM-V-2, openbmb/MiniCPM-Llama3-V-2_5
# ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="/root/cpmv2_6/result_cpmv2_6/processed_data.json"
EVAL_DATA="/root/cpmv2_6/result_cpmv2_6/processed_data.json"
LLM_TYPE="minicpm"
# if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm
# if use openbmb/MiniCPM-Llama3-V-2_5, please set LLM_TYPE=llama3

export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1

MODEL_MAX_Length=1000 # if conduct multi-images sft, please set MODEL_MAX_Length=4096

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"
torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --llm_type $LLM_TYPE \
    --data_path $DATA \
    --eval_data_path $EVAL_DATA \
    --remove_unused_columns false \
    --label_names "labels" \
    --prediction_loss_only false \
    --bf16 false \
    --bf16_full_eval false \
    --fp16 true \
    --fp16_full_eval true \
    --do_train \
    --do_eval \
    --tune_vision false \
    --tune_llm false \
    --use_lora true \
    --q_lora true \
    --tune_vision false \
    --lora_target_modules "llm..*layers.\d+.self_attn.(q_proj|k_proj|v_proj|o_proj)" \
    --model_max_length $MODEL_MAX_Length \
    --max_slice_nums 9 \
    --max_steps 10000 \
    --eval_steps 1000 \
    --output_dir output/output__lora \
    --logging_dir output/output_lora \
    --logging_strategy "steps" \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "steps" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 10 \
    --learning_rate 1e-6 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --gradient_checkpointing true \
    --deepspeed ds_config_zero2.json \
    --report_to "tensorboard" # wandb
```
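For comparison only (my own addition, not something the script above does): the warning "You have loaded an AWQ model on CPU and have a CUDA device available" suggests the checkpoint is not being placed on the GPU at load time. Below is a hedged sketch of loading the same checkpoint directly onto cuda:0, which avoids leaving any module on the meta device; finetune.py's own loading code may well differ.

```python
# Hedged sketch: load the AWQ int4 checkpoint straight onto cuda:0.
# Illustration only; this is not the loading path used by finetune.py.
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "/root/autodl-tmp/MiniCPM-V_2_6_awq_int4"  # same path as in the script above

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map={"": 0},  # keep every module on cuda:0 so nothing stays on "meta"
)
print(next(model.parameters()).device)  # expect cuda:0
```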
运行环境 | Environment
备注 | Anything else?
No response