MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
[BUG] <title>NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device. #635
是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
[X] 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions
该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?
[X] 我已经搜索过FAQ | I have searched FAQ
当前行为 | Current Behavior
I ran into a NotImplementedError when trying to fine-tune the MiniCPM-V-2.6-int4 model with QLoRA on a single RTX 3090. The full output is below:
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible
/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/deepspeed/runtime/zero/linear.py:47: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
@autocast_custom_fwd
/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/deepspeed/runtime/zero/linear.py:66: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
@autocast_custom_bwd
/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/transformers/training_args.py:1545: FutureWarning: evaluation_strategy is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use eval_strategy instead
warnings.warn(
[2024-10-14 12:34:50,823] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-10-14 12:34:50,823] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
You have loaded an AWQ model on CPU and have a CUDA device available, make sure to set your model on a GPU device in order to run your model.
low_cpu_mem_usage was None, now set to True since model is quantized.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 5.18it/s]
Some weights of the model checkpoint at /root/autodl-tmp/MiniCPM-V_2_6_awq_int4 were not used when initializing MiniCPMV: ['resampler.attn.out_proj.weight', 'resampler.kv_proj.weight', 'vpm.encoder.layers.0.mlp.fc1.weight', 'vpm.encoder.layers.0.mlp.fc2.weight', 'vpm.encoder.layers.0.self_attn.k_proj.weight', 'vpm.encoder.layers.0.self_attn.out_proj.weight', 'vpm.encoder.layers.0.self_attn.q_proj.weight', 'vpm.encoder.layers.0.self_attn.v_proj.weight', 'vpm.encoder.layers.1.mlp.fc1.weight', 'vpm.encoder.layers.1.mlp.fc2.weight', 'vpm.encoder.layers.1.self_attn.k_proj.weight......
This IS expected if you are initializing MiniCPMV from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing MiniCPMV from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of MiniCPMV were not initialized from the model checkpoint at /root/autodl-tmp/MiniCPM-V_2_6_awq_int4 and are newly initialized: ['resampler.kv_proj.qweight', 'resampler.kv_proj.qzeros', 'resampler.kv_proj.scales', 'vpm.encoder.layers.0.mlp.fc1.qweight', 'vpm.encoder.layers.0.mlp.fc1.qzeros', 'vpm.encoder.layers.0.mlp.fc1.scales', 'vpm.encoder.layers.0.mlp.fc2.qweight', 'vpm.encoder.layers.0.mlp.fc2.qzeros', 'vpm.encoder.layers.0.mlp.fc2.scales......
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Currently using LoRA for fine-tuning the MiniCPM-V model.
{'Total': 1781021056, 'Trainable': 635582976}
llm_type=minicpm
Loading data...
max_steps is given, it will override any value given in num_train_epochs
rank0: Traceback (most recent call last):
rank0: File "/root/cpmv2_6/finetune/finetune.py", line 299, in rank0: File "/root/cpmv2_6/finetune/finetune.py", line 289, in train
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/transformers/trainer.py", line 2052, in train
rank0: return inner_training_loop(
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/transformers/trainer.py", line 2207, in _inner_training_loop
rank0: model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/accelerate/accelerator.py", line 1344, in prepare
rank0: result = self._prepare_deepspeed(*args)
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/accelerate/accelerator.py", line 1851, in _preparedeepspeed
rank0: engine, optimizer, , lr_scheduler = ds_initialize(*kwargs)
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/deepspeed/init.py", line 181, in initialize
rank0: engine = DeepSpeedEngine(args=args,
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 262, in initrank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/deepspeed/runtime/engine.py", line 1112, in _configure_distributed_model
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1174, in to
rank0: return self._apply(convert)
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 780, in _apply
rank0: [Previous line repeated 3 more times]
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 854, in _apply
rank0: self._buffers[key] = fn(buf)
rank0: File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1167, in convert
rank0: raise NotImplementedError(
rank0: NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
E1014 12:34:59.731000 140289146423104 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 1999) of binary: /root/miniconda3/envs/cpmv/bin/python
Traceback (most recent call last):
File "/root/miniconda3/envs/cpmv/bin/torchrun", line 8, in
sys.exit(main())
^^^^^^
File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 348, in wrapper
return f(args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/distributed/run.py", line 901, in main
run(args)
File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in run
elastic_launch(
File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 133, in call
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/cpmv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
finetune.py FAILED
Failures:
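For context on that final error (this explanation and sketch are my own addition, not part of the log above): a module whose parameters or buffers live on the meta device holds only shapes and dtypes, with no backing storage, so torch.nn.Module.to() has nothing to copy, while to_empty() allocates fresh, uninitialized storage on the target device. A minimal standalone illustration, independent of MiniCPM-V and DeepSpeed:

```python
# Minimal sketch (my own illustration) of the "Cannot copy out of meta tensor" error.
import torch
import torch.nn as nn

# Modules created under the meta device hold only shapes/dtypes, no data.
with torch.device("meta"):
    layer = nn.Linear(8, 8)

try:
    layer.to("cpu")  # .to() must copy data, but meta tensors have none
except NotImplementedError as e:
    print(e)  # "Cannot copy out of meta tensor; no data! ..."

# to_empty() allocates uninitialized storage instead of copying;
# real weights would still have to be loaded into it afterwards.
layer = layer.to_empty(device="cpu")
print(layer.weight.shape)  # torch.Size([8, 8])
```

Judging from the traceback, some part of the quantized model apparently still sits on the meta device when DeepSpeed's _configure_distributed_model calls .to() on it, which is presumably where this surfaces.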
期望行为 | Expected Behavior
No response
复现方法 | Steps To Reproduce
This is my finetune_lora.sh file. I modified it strictly following the fine-tuning guide, and all other steps were also carried out according to the guide. I launch it with:

```shell
# cd finetune
bash finetune_lora.sh
```

```shell
#!/bin/bash

GPUS_PER_NODE=1
NNODES=1
NODE_RANK=0
MASTER_ADDR=localhost
MASTER_PORT=6001

MODEL="/root/autodl-tmp/MiniCPM-V_2_6_awq_int4" # or openbmb/MiniCPM-V-2, openbmb/MiniCPM-Llama3-V-2_5
# ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="/root/cpmv2_6/result_cpmv2_6/processed_data.json"
EVAL_DATA="/root/cpmv2_6/result_cpmv2_6/processed_data.json"
LLM_TYPE="minicpm"
# if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm
# if use openbmb/MiniCPM-Llama3-V-2_5, please set LLM_TYPE=llama3

export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1

MODEL_MAX_Length=1000 # if conduct multi-images sft, please set MODEL_MAX_Length=4096

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"
torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --llm_type $LLM_TYPE \
    --data_path $DATA \
    --eval_data_path $EVAL_DATA \
    --remove_unused_columns false \
    --label_names "labels" \
    --prediction_loss_only false \
    --bf16 false \
    --bf16_full_eval false \
    --fp16 true \
    --fp16_full_eval true \
    --do_train \
    --do_eval \
    --tune_vision false \
    --tune_llm false \
    --use_lora true \
    --q_lora true \
    --tune_vision false \
    --lora_target_modules "llm..*layers.\d+.self_attn.(q_proj|k_proj|v_proj|o_proj)" \
    --model_max_length $MODEL_MAX_Length \
    --max_slice_nums 9 \
    --max_steps 10000 \
    --eval_steps 1000 \
    --output_dir output/output__lora \
    --logging_dir output/output_lora \
    --logging_strategy "steps" \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "steps" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 10 \
    --learning_rate 1e-6 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --gradient_checkpointing true \
    --deepspeed ds_config_zero2.json \
    --report_to "tensorboard" # wandb
```
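For comparison only (my own addition, not something the script above does): the warning "You have loaded an AWQ model on CPU and have a CUDA device available" suggests the checkpoint is not being placed on the GPU at load time. Below is a hedged sketch of loading the same checkpoint directly onto cuda:0, which avoids leaving any module on the meta device; finetune.py's own loading code may well differ.

```python
# Hedged sketch: load the AWQ int4 checkpoint straight onto cuda:0.
# Illustration only; this is not the loading path used by finetune.py.
import torch
from transformers import AutoModel, AutoTokenizer

model_path = "/root/autodl-tmp/MiniCPM-V_2_6_awq_int4"  # same path as in the script above

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map={"": 0},  # keep every module on cuda:0 so nothing stays on "meta"
)
print(next(model.parameters()).device)  # expect cuda:0
```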
运行环境 | Environment
备注 | Anything else?
No response