QwenLM / Qwen2.5

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

When I change the model from `Qwen1.5-7B-Chat` to `Qwen2-7B-Instruct`, the same error is still there. #827

Status: Closed (chansonzhang closed this issue 2 weeks ago)

chansonzhang commented 1 month ago
When I change the model from `Qwen1.5-7B-Chat` to `Qwen2-7B-Instruct`, the same error is still there.

Originally posted by @chansonzhang in https://github.com/QwenLM/Qwen/issues/1307#issuecomment-2264659388

jklj077 commented 1 month ago

Which `finetune.py` were you using? Did you mix Qwen and Qwen2 models and code?

jklj077 commented 1 month ago

We are also in the process of deprecating the `finetune.py` in this repo, and we advise you to use a training framework such as Axolotl, Llama-Factory, or Swift to fine-tune your models with SFT, DPO, PPO, etc.

chansonzhang commented 1 month ago

> Which `finetune.py` were you using? Did you mix Qwen and Qwen2 models and code?

I'm using `Qwen/finetune.py`.

I was fine-tuning Qwen1.5 with `model_max_length` set to 32768 and ran into the error described in issue #1307.

I wondered whether it was a bug in Qwen1.5 that had perhaps already been fixed in Qwen2, so I also tried Qwen2, although I was not expecting it to work.

However, the error message and stack trace were exactly the same after I changed the model; see my comment in issue #1307.

I would like to know what causes this error and how I can work around it.

chansonzhang commented 1 month ago

> We are also in the process of deprecating the `finetune.py` in this repo, and we advise you to use a training framework such as Axolotl, Llama-Factory, or Swift to fine-tune your models with SFT, DPO, PPO, etc.

@jklj077 Thank you! Is there any quick start for that?

jklj077 commented 1 month ago

Ping @yangjianxin1. He is currently working on the quick start using Llama-Factory.

For now, there is a very simple version at https://qwen.readthedocs.io/en/latest/training/SFT/llama_factory.html.

chansonzhang commented 1 month ago

@yangjianxin1 Where can I find the `src/train.py` mentioned in https://qwen.readthedocs.io/en/latest/training/SFT/llama_factory.html?

chansonzhang commented 1 month ago

How should I set the `--flash_attn` parameter?

I get the error message `train.py: error: argument --flash_attn: expected one argument`.
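For context, the wording of that message matches what Python's `argparse` prints when an option that takes a value is passed without one. The sketch below reproduces this with a hypothetical stand-in parser; it is not the training script's actual code, and the value `fa2` and the default `auto` are assumptions for illustration only. The values a given framework version accepts should be checked in its own documentation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the training script's parser (assumption).
    parser = argparse.ArgumentParser(prog="train.py")
    # A value-taking option: argparse expects exactly one token after the flag.
    # The default "auto" is illustrative, not taken from the real script.
    parser.add_argument("--flash_attn", type=str, default="auto")
    return parser


def try_parse(argv):
    """Return the parsed --flash_attn value, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv).flash_attn
    except SystemExit:
        # On a usage error argparse prints the message to stderr and exits.
        return None


if __name__ == "__main__":
    # Bare flag: argparse reports "argument --flash_attn: expected one argument".
    print(try_parse(["--flash_attn"]))
    # Flag followed by a value ("fa2" is an assumed example value): parses.
    print(try_parse(["--flash_attn", "fa2"]))
```

If the script is indeed built on `argparse`, passing the flag together with a value (`--flash_attn <value>`) rather than as a bare switch would avoid this particular error.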

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.