Closed: holydick99 closed this issue 2 months ago
Thank you for your attention. Try this: https://github.com/THUDM/ChatGLM-6B/pull/1173/files. The error may be caused by a version problem.
Sorry about that, I'm a newcomer to this area. Does this solution mean I should use that code to replace the original code?
HH, don't feel sorry, I know issues with the environment can be frustrating. What I mean is: try adding the code from the link to see if it helps. The problem might be related to torchrun, so try launching with torchrun and also add the code from the link at the beginning of your training script.
So specifically, I should change this code?
Yes, you can give it a try.
Thank you very much. The right way is to replace that code with (torch.distributed.init_process_group); then it runs normally.
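For anyone hitting the same error, here is a minimal sketch of what calling torch.distributed.init_process_group at the beginning of a training script can look like when the script is launched with torchrun. The exact code in the linked pull request is not reproduced here, so treat this as an assumption about the shape of the fix. The helper names `torchrun_env` and `init_distributed` are hypothetical, and the backend choice (NCCL with a GPU, Gloo otherwise) is also an assumption.

```python
import os


def torchrun_env(env=os.environ):
    """Return (rank, world_size, local_rank) if torchrun set them, else None.

    torchrun exports RANK, WORLD_SIZE and LOCAL_RANK for every worker process.
    """
    keys = ("RANK", "WORLD_SIZE", "LOCAL_RANK")
    if not all(k in env for k in keys):
        return None
    return tuple(int(env[k]) for k in keys)


def init_distributed():
    """Hypothetical helper: initialize the default process group once.

    Returns False when the script was not started by torchrun, so a plain
    `python` launch is left untouched.
    """
    info = torchrun_env()
    if info is None:
        return False

    # Imported lazily so the environment check above works without torch.
    import torch
    import torch.distributed as dist

    if dist.is_available() and not dist.is_initialized():
        use_cuda = torch.cuda.is_available()
        if use_cuda:
            # Bind this worker to its own GPU before creating the group.
            torch.cuda.set_device(info[2])
        # Assumption: NCCL for GPU training, Gloo as the CPU fallback.
        # The default env:// init reads MASTER_ADDR and MASTER_PORT,
        # which torchrun also exports.
        dist.init_process_group(backend="nccl" if use_cuda else "gloo")
    return True
```

Calling `init_distributed()` as the first statement of the training script and launching with something like `torchrun --nproc_per_node=2 train.py` (where `train.py` is a placeholder name) matches the advice given earlier in this thread.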
Very glad to hear this method helped you.
The last step looks the same as the first step, but when I enter the command line it always fails with a runtime error.