QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Apache License 2.0

With qwen-72b-chat as the base model, what configuration would make training possible on a 4090 machine? #1224

Closed: taishan1994 closed this issue 5 months ago

taishan1994 commented 5 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

No response

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

jklj077 commented 5 months ago

https://github.com/QwenLM/Qwen/tree/main/recipes/finetune/deepspeed#settings-and-gpu-requirements

Not possible for 24GB * 4.

taishan1994 commented 5 months ago

It's 8 GPUs, not 4. The docs show that Q-LoRA on a single 80GB GPU with a sequence length of 4096 needs 68.0GB for training, so I wanted to know whether Q-LoRA could fine-tune qwen-72b-chat at a sequence length of 4096 on an 8×24GB machine. I tried multi-GPU fine-tuning with Q-LoRA + ZeRO-2 and hit OOM as soon as the model started loading. With ZeRO-3 and both model offload and optimizer offload enabled, the model loads fine, but Q-LoRA and ZeRO-3 cannot be used together. So I'd like to ask whether there is any way to do what I'm trying to do.
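
For reference, a minimal sketch of the kind of DeepSpeed ZeRO-3 config described above, with parameter ("model") offload and optimizer offload to CPU enabled. The key names follow DeepSpeed's public config schema; the batch-size and precision values are illustrative assumptions, not settings taken from the Qwen recipes.

```python
# Illustrative DeepSpeed ZeRO-3 config with CPU offload, written as a plain
# Python dict (the same structure can be saved as a JSON file and passed to
# the launcher). Everything outside zero_optimization is a placeholder.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # assumed, not from the recipes
    "gradient_accumulation_steps": 16,     # assumed, not from the recipes
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},      # "model offload"
        "offload_optimizer": {"device": "cpu", "pin_memory": True},  # "optimizer offload"
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}
```

ZeRO-3 partitions the base weights across ranks and gathers them on demand, which is precisely what clashed with Q-LoRA's frozen 4-bit-quantized weights at the time of this issue, hence the incompatibility reported above.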

jklj077 commented 5 months ago

For ZeRO Stage 2, having each GPU capable of holding the entire model is a bare minimum requirement. Given that Qwen-72B-Chat-Int4 exceeds 40GB, trying to finetune a model of this scale using Q-LoRA with GPUs that only have 24GB (or even 48GB) of memory simply won't cut it.
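
A back-of-envelope check of that constraint (the per-parameter figures below are rough assumptions, not measurements):

```python
# Under ZeRO-2, only optimizer state and gradients are partitioned; every GPU
# still keeps a full replica of the model weights, so the weights alone must
# fit in per-GPU memory before activations or anything else is counted.
PARAMS = 72e9          # Qwen-72B parameter count
BYTES_PER_PARAM = 0.5  # Int4 quantization: ~4 bits per parameter

weights_gib = PARAMS * BYTES_PER_PARAM / 1024**3
print(f"Bare Int4 weights: ~{weights_gib:.0f} GiB")  # ~34 GiB

# A 24 GiB card (RTX 4090) cannot hold even the bare weights; a 48 GiB card
# can, but the >40GB loaded footprint plus activations, quantization metadata,
# and LoRA optimizer state leaves no practical headroom, matching the comment
# above.
for gpu_gib in (24, 48, 80):
    print(f"{gpu_gib} GiB card fits bare Int4 weights: {gpu_gib >= weights_gib}")
```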

taishan1994 commented 5 months ago

Understood, thank you for the answer.