QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Apache License 2.0
13.59k stars 1.11k forks

[BUG] Single-node multi-GPU LoRA fine-tuning of Qwen-14B with the finetune_lora_ds.sh script fails #936

Closed ghost closed 8 months ago

ghost commented 8 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

Running the finetune_lora_ds.sh script on a single machine with eight A100 (40GB) GPUs to do distributed fine-tuning of the 14B model produces the following error:

    Traceback (most recent call last):
      File "finetune.py", line 360, in <module>
        train()
      File "finetune.py", line 353, in train
        trainer.train()
      File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 1555, in train
        return inner_training_loop(
      File "/usr/local/lib/python3.8/dist-packages/transformers/trainer.py", line 1687, in _inner_training_loop
        model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
      File "/usr/local/lib/python3.8/dist-packages/accelerate/accelerator.py", line 1214, in prepare
        raise ValueError(
    ValueError: You can't train a model that has been loaded with device_map='auto' in any distributed mode. Please rerun your script specifying --num_processes=1 or by launching with python {{myscript.py}}.

Tracing the cause, the finetune.py script shipped with the project does not support distributed fine-tuning:

# This serves for single-gpu qlora.
if getattr(training_args, 'deepspeed', None) and int(os.environ.get("WORLD_SIZE", 1)) == 1:
    training_args.distributed_state.distributed_type = DistributedType.DEEPSPEED

local_rank = training_args.local_rank

device_map = "auto"
world_size = int(os.environ.get("WORLD_SIZE", 1))
ddp = world_size != 1
if lora_args.q_lora:
    device_map = {"": int(os.environ.get("LOCAL_RANK") or 0)} if ddp else "auto"
    if len(training_args.fsdp) > 0 or deepspeed.is_deepspeed_zero3_enabled():
        logging.warning(
            "FSDP or ZeRO3 are incompatible with QLoRA."
        )
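For context on why the error fires: `device_map="auto"` asks Accelerate to shard one model instance across all visible GPUs inside a single process, which conflicts with any multi-process launch (DDP or DeepSpeed), so `accelerator.prepare` refuses to continue. A minimal sketch of launch-mode-aware `device_map` selection is below; the helper name `resolve_device_map` is hypothetical and not from the repo, but the logic mirrors the QLoRA branch quoted above:

```python
import os
from typing import Dict, Optional, Union


def resolve_device_map(use_qlora: bool) -> Optional[Union[str, Dict[str, int]]]:
    """Pick a device_map that is safe for the current launch mode.

    Hypothetical helper (not part of the Qwen repo):
    - Multi-process launch (WORLD_SIZE > 1, i.e. DDP/DeepSpeed): each rank must
      own exactly one GPU. For QLoRA, pin the quantized model to LOCAL_RANK;
      otherwise return None and let the distributed framework place the model.
    - Single process: "auto" lets Accelerate shard the model across GPUs.
    """
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    distributed = world_size != 1
    if use_qlora:
        # A quantized QLoRA model cannot be sharded; keep it whole on one device per rank.
        return {"": int(os.environ.get("LOCAL_RANK") or 0)} if distributed else "auto"
    # Plain LoRA under DDP/DeepSpeed: never pass "auto" in distributed mode.
    return None if distributed else "auto"
```

The value returned here would be passed as the `device_map` argument to `from_pretrained`; the key point is simply that `"auto"` must never reach a multi-process run.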

Expected Behavior

How should finetune.py be modified so that it supports distributed fine-tuning?

Steps To Reproduce

No response

Environment

- OS: CentOS Linux 7 (kernel 3.10.0-862.el7.x86_64)
- Python: 3.8.10
- Transformers: 4.32.0
- PyTorch: 2.0.1+cu117
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 11.7

Anything else?

error_log.txt

fyabc commented 8 months ago

Hi, the latest version of finetune.py has fixed this issue. Please pull the latest code and try again.