QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Apache License 2.0

[BUG] ZeRO3 is incompatible with LoRA when finetuning on base model. #1104

Closed hxhcreate closed 7 months ago

hxhcreate commented 7 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

'ZeRO3 is incompatible with LoRA when finetuning on base model.'

Expected Behavior

'ZeRO3 is incompatible with LoRA when finetuning on base model.'

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

I would like to know why the code needs the following block; an explanation would be appreciated:

# Guard in finetune.py: plain LoRA (not Q-LoRA) on the base (non-chat) model
# cannot be combined with DeepSpeed ZeRO3.
if (
    training_args.use_lora
    and not lora_args.q_lora
    and deepspeed.is_deepspeed_zero3_enabled()
    and not is_chat_model
):
    raise RuntimeError(
        'ZeRO3 is incompatible with LoRA when finetuning on base model.'
    )
jklj077 commented 7 months ago

Due to incompatibilities, DeepSpeed ZeRO3 and LoRA cannot be used together when fine-tuning a base model. Kindly refer to the README file for further explanation, as this issue has already been addressed there.


hxhcreate commented 7 months ago

Thanks for your help! I have read the README.

But my question is mainly about why ZeRO3 cannot be used once these parameters are made trainable.

Thanks very much

jklj077 commented 7 months ago

The peft library uses a special mechanism, ModulesToSaveWrapper, to make additional parameters trainable. Under certain configurations this mechanism interferes with ZeRO Stage 3's parameter partitioning. The issue appears to have been fixed in huggingface/peft PR #1450 (https://github.com/huggingface/peft/pull/1450), which we encourage you to review.
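For illustration, here is a minimal sketch of where this mechanism comes into play; the module names and hyperparameter values below are illustrative, not the exact defaults used by finetune.py. When fine-tuning the base model, the embedding and output layers are passed via modules_to_save so they can also be trained, and peft wraps each of them in a ModulesToSaveWrapper holding a full trainable copy of the module.

# Illustrative sketch only: "wte"/"lm_head"/"c_attn"/... follow Qwen(1.0) layer naming;
# r, lora_alpha, lora_dropout are placeholder values, not project defaults.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj", "w1", "w2"],  # attention / MLP projections receive LoRA adapters
    # Base (non-chat) model only: the embedding and output layers must also be trained so the
    # model can learn the chat-format special tokens. peft makes them trainable by wrapping each
    # module listed here in a ModulesToSaveWrapper, which keeps a full trainable copy of it.
    modules_to_save=["wte", "lm_head"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

Under ZeRO Stage 3 the original parameters are partitioned across ranks, and the trainable copies created by the wrapper were not always handled correctly in that setting, which is what the guard in finetune.py protects against.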

Please note that, as we have previously emphasized, the Qwen(1.0) codebase and models will not receive further updates. For the latest features and ongoing support, we advise users to migrate to Qwen1.5.

hxhcreate commented 7 months ago

✅ Got it, thanks for your kind reply

1424153694 commented 4 months ago


Hello, I am doing LoRA fine-tuning on the Qwen-14b-chat model and still hit this problem when using ZeRO3. Also, when I run LoRA fine-tuning on 8x 4090 GPUs with ZeRO2, GPU memory overflows. How is the fine-tuning memory footprint calculated in the multi-GPU case? Looking forward to your reply.
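For what it's worth, a sketch rather than an official answer: DeepSpeed ships estimators for the per-GPU memory needed just for the model states (parameters, gradients, optimizer states) under ZeRO2 and ZeRO3; activations and framework overhead come on top. The estimator calls below are real DeepSpeed utilities, while the model name and GPU count simply mirror the setup described above.

# Sketch: estimate per-GPU model-state memory under ZeRO2 vs ZeRO3 for one 8-GPU node.
# These estimators cover parameters / gradients / optimizer states only; activations,
# CUDA context, and data buffers are additional.
from transformers import AutoModelForCausalLM
from deepspeed.runtime.zero.stage_1_and_2 import estimate_zero2_model_states_mem_needs_all_live
from deepspeed.runtime.zero.stage3 import estimate_zero3_model_states_mem_needs_all_live

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen-14B-Chat", trust_remote_code=True)

estimate_zero2_model_states_mem_needs_all_live(model, num_gpus_per_node=8, num_nodes=1)
estimate_zero3_model_states_mem_needs_all_live(model, num_gpus_per_node=8, num_nodes=1)

Note that these estimates assume full-parameter training; with LoRA, gradients and optimizer states exist mainly for the adapter weights, so the real footprint is smaller. However, ZeRO2 does not partition the model parameters themselves, so each GPU still has to hold the full ~14B-parameter weights (roughly 28 GB in bf16), which already exceeds a 4090's 24 GB, whereas ZeRO3 shards the weights across all 8 GPUs.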