Open lieh1203 opened 4 months ago
What happens if you add "gradient_accumulation_steps": 16 to your config file?
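For illustration, a fragment of what that addition might look like in a GPT-NeoX YAML config (the surrounding key is only an example, not taken from the reporter's actual file):

```yaml
{
  # Suggested addition: accumulate gradients over 16 micro-batches
  "gradient_accumulation_steps": 16,

  # Illustrative neighbor key; actual value depends on your config
  "train_micro_batch_size_per_gpu": 4,
}
```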
Hi @StellaAthena!
Is it because I enabled both parallel mode and Zero Stage 3 at the same time that caused this error?
ZeRO-3 and PP > 1 should error, but I'm surprised it would error like this. Does it go away if you use ZeRO-1?
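To try that, the ZeRO stage is set in the DeepSpeed `zero_optimization` block of the config; a minimal sketch of the change (only the `stage` value shown, other keys omitted):

```yaml
{
  # ZeRO stage 1 shards only optimizer states and composes with
  # pipeline parallelism; stage 3 shards parameters and does not.
  "zero_optimization": {
    "stage": 1
  },
}
```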
Hello,

I am encountering an issue with the GPT-NeoX library. When I set either pipe_parallel_size or model_parallel_size to 2, I get the following assertion error:

I am trying to enable parallelism, but this error is preventing me from proceeding.
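The change that triggers the error is just the parallelism keys; a fragment of that shape (values illustrative, not my full config) looks like:

```yaml
{
  # Setting either of these to 2 triggers the assertion error
  "pipe_parallel_size": 2,
  "model_parallel_size": 1,
}
```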
Here are some details about my setup:
Below is the content of my 2-7B.yml configuration file:

I am using the following command to start the training:
I would appreciate any guidance or suggestions on how to resolve this issue.
Thank you!