QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Apache License 2.0

Why "ZeRO3 is incompatible with LoRA when finetuning on base model" #1161

Closed · AllenShow closed this issue 6 months ago

AllenShow commented 6 months ago

Hello! Thanks for the great model! I have two questions:

1. "ZeRO3 is incompatible with LoRA when finetuning on base model." Why? (cited from https://github.com/QwenLM/Qwen/blob/main/recipes/tests/test_finetune/test_finetune_ds.py)
2. [screenshot: WechatIMG3035] The settings table shows that the base model with LoRA supports at most ZeRO-2, while the chat model can use ZeRO-3. Why?

jklj077 commented 6 months ago

Please refer to the reply at https://github.com/QwenLM/Qwen/issues/1104#issuecomment-1972409834.

1. and 2. have the same cause: the base model needs to finetune the embedding parameters (because of the <|im_start|> and <|im_end|> control tokens), and modules_to_save simply did not work with DeepSpeed ZeRO Stage 3 at the time.
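
For context, this is roughly what the base-model LoRA setup looks like. This is an illustrative sketch using peft's `LoraConfig`, not the exact `finetune.py` code; the `wte`/`lm_head` names are Qwen's token embedding and output head, and the target module names follow Qwen's attention/MLP layers:

```python
# Illustrative sketch: LoRA finetuning of the *base* model on chat data must
# also train the embedding and output layers so that <|im_start|> and
# <|im_end|> get meaningful representations.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B", trust_remote_code=True  # base model, not -Chat
)

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj", "w1", "w2"],
    task_type="CAUSAL_LM",
    # Fully training these extra modules alongside the LoRA adapters is what
    # used to break under DeepSpeed ZeRO Stage 3, where parameters are sharded
    # across ranks and peft's modules_to_save handling could not cope with it.
    modules_to_save=["wte", "lm_head"],
)
model = get_peft_model(model, lora_config)
```

Because `modules_to_save` copies and fully trains those modules on top of the LoRA adapters, this (previous) clash with ZeRO Stage 3's parameter sharding is what limited base-model LoRA runs to ZeRO-2 in the settings table.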

Qwen1.5 has addressed the issue by making the base model understand the two control tokens.
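
A quick way to see this (an illustrative check; it assumes the Hugging Face tokenizer for a Qwen1.5 base checkpoint):

```python
# Illustrative: in Qwen1.5 the ChatML control tokens are already part of the
# pretrained vocabulary, so base-model LoRA finetuning no longer has to train
# the embeddings (and therefore no longer needs modules_to_save).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B")
print(tok.convert_tokens_to_ids(["<|im_start|>", "<|im_end|>"]))
# Both should map to valid token ids rather than the unknown token, i.e. the
# base model already has trained embeddings for them.
```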