Open amulil opened 4 months ago
Hi @amulil! Please provide the config or log file corresponding to this picture. BTW, have you installed flash_attn?
@HIT-cwh
I use this config, with only batch_size set to 4: https://github.com/InternLM/xtuner/blob/193f614ffbb2463010808ebb2e689331a9c5e4f6/xtuner/configs/qwen/qwen1_5/qwen1_5_0_5b_chat/qwen1_5_0_5b_chat_qlora_alpaca_e3.py#L40C8-L40C8
Then I train with the command
CUDA_VISIBLE_DEVICES=4,5,6,7 NPROC_PER_NODE=4 xtuner train qwen1_5_0_5b_chat_qlora_alpaca_e3
Thanks for the tip. I hadn't installed flash-attn; after installing it, the error is gone.
However, the command I ran shouldn't use sequence parallel at all, yet sequence_parallel_world_size is set to 4. It should be 1.
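To make the expectation concrete, here is a minimal sketch of how a 4-GPU job is usually split between sequence-parallel and data-parallel groups. The helper `parallel_layout` and its dictionary keys are hypothetical names for illustration, not xtuner's API, and the rule that the world size equals the sequence-parallel size times the data-parallel size is assumed to apply here.

```python
def parallel_layout(world_size: int, sequence_parallel_size: int = 1) -> dict:
    """Split `world_size` ranks into sequence-parallel and data-parallel groups.

    Hypothetical helper for illustration only; not part of xtuner.
    """
    if sequence_parallel_size < 1:
        raise ValueError("sequence_parallel_size must be at least 1")
    if world_size % sequence_parallel_size != 0:
        raise ValueError("world_size must be divisible by sequence_parallel_size")
    return {
        "sequence_parallel_world_size": sequence_parallel_size,
        "data_parallel_world_size": world_size // sequence_parallel_size,
    }


# Expected for NPROC_PER_NODE=4 with sequence parallel disabled (size 1):
expected = parallel_layout(world_size=4, sequence_parallel_size=1)

# What the log reports instead: all 4 ranks form one sequence-parallel group.
reported = parallel_layout(world_size=4, sequence_parallel_size=4)

print(expected)
print(reported)
```

Under that assumption, a sequence-parallel size of 4 leaves a single data-parallel group, so the four GPUs would cooperate on the same samples instead of each processing its own batch.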
I ran into the same problem. Do you have a solution for it, bro?
Currently, there is a bug in sequence parallel that shows up when training without DeepSpeed. This PR will fix the bug and will be merged soon. We apologize for any inconvenience this may have caused.
In addition, we recommend using DeepSpeed to optimize the training phase by passing --deepspeed deepspeed_zero1
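As a rough sketch of that workaround, the snippet below assembles the launch command from this thread with the `--deepspeed deepspeed_zero1` flag appended. The helper `build_train_command` is a hypothetical name introduced for illustration; only the environment variables, the config name, and the flag come from the thread.

```python
import shlex


def build_train_command(config, gpus, deepspeed=None):
    """Return (env, argv) for an `xtuner train` launch.

    Hypothetical helper for illustration; it only assembles the command.
    """
    env = {
        "CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpus),
        "NPROC_PER_NODE": str(len(gpus)),
    }
    argv = ["xtuner", "train", config]
    if deepspeed:
        # Flag and value as recommended in the reply above.
        argv += ["--deepspeed", deepspeed]
    return env, argv


env, argv = build_train_command(
    "qwen1_5_0_5b_chat_qlora_alpaca_e3",
    gpus=[4, 5, 6, 7],
    deepspeed="deepspeed_zero1",
)

# Print the equivalent shell command line.
print(" ".join(f"{k}={v}" for k, v in env.items()), shlex.join(argv))

# To actually launch (requires xtuner to be installed):
#   import os, subprocess
#   subprocess.run(argv, env={**os.environ, **env}, check=True)
```

This prints the original command with `--deepspeed deepspeed_zero1` at the end, which can also be typed directly in a shell.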
version
05/09 21:16:21 - mmengine - INFO - 0.1.18
how to reproduce
CUDA_VISIBLE_DEVICES=4,5,6,7 NPROC_PER_NODE=4 xtuner train qwen1_5_0_5b_chat_qlora_alpaca_e3
log
I only changed batch_size to 4 in the config file qwen1_5_0_5b_chat_qlora_alpaca_e3, but sequence_parallel_world_size is set to 4.