AllenShow closed this issue 6 months ago.

Hello! Thanks for the great model! I have two questions:

1. ZeRO-3 is incompatible with LoRA when finetuning the base model (cited from https://github.com/QwenLM/Qwen/blob/main/recipes/tests/test_finetune/test_finetune_ds.py). Why?
2. The table shows that the base model with LoRA supports at most ZeRO-2, while the chat model can use ZeRO-3. Why?

Please refer to the reply at https://github.com/QwenLM/Qwen/issues/1104#issuecomment-1972409834.

Questions 1 and 2 have the same cause: when finetuning the base model, the embedding parameters must also be trained (because of the control tokens <|im_start|> and <|im_end|>), and modules_to_save previously did not work with DeepSpeed ZeRO Stage 3. Qwen1.5 has addressed the issue by making the base model understand the two control tokens.
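The distinction above can be sketched as a small helper. This is a minimal illustration, not the repository's actual code: the module names ("wte", "lm_head", "c_attn", etc.) are assumed to follow Qwen's finetune.py conventions, and the ZeRO-stage caps reflect the limitation described in the reply.

```python
# Sketch of the base-vs-chat LoRA setup described above. Module names and
# stage limits are assumptions based on the Qwen recipes, not authoritative.

def lora_settings(is_base_model: bool) -> dict:
    """Return illustrative LoRA settings for finetuning a Qwen model.

    The base model has never seen the chat control tokens <|im_start|> and
    <|im_end|>, so its embedding and output layers must be fully trained via
    modules_to_save. Because modules_to_save previously did not work with
    DeepSpeed ZeRO Stage 3, base-model LoRA was capped at ZeRO-2.
    """
    if is_base_model:
        return {
            "target_modules": ["c_attn", "c_proj", "w1", "w2"],
            "modules_to_save": ["wte", "lm_head"],  # fully train embeddings + head
            "max_zero_stage": 2,  # modules_to_save broke under ZeRO Stage 3
        }
    # The chat model already understands the control tokens, so plain LoRA
    # suffices and all ZeRO stages are usable.
    return {
        "target_modules": ["c_attn", "c_proj", "w1", "w2"],
        "modules_to_save": None,
        "max_zero_stage": 3,
    }
```

In a real run these values would be passed to a peft LoraConfig; the point is only that the base model needs the extra modules_to_save entry, which is what tied it to ZeRO-2 at the time.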