etema19 closed 10 months ago
Could you please share the complete error output? There should be other output before the error message you posted. It would also help if you could provide more details about the script you are running.
Thanks~
Since I have not received a reply for a long time, I am closing this issue for now.
Hello, I ran into an error when training the code in distributed mode. The error is as follows: "torch.distributed.elastic.multiprocessing.errors.ChildFailedError: main.py FAILED"
Any idea?
Thanks!
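
For anyone hitting the same message: `ChildFailedError` is raised by the launcher when a worker process exits with an error, so the line above only says that `main.py` failed, not why. The actual traceback is normally printed by the worker before that line. Below is a minimal sketch of one way to capture it, assuming the script is started with `torchrun` (which sets the `RANK` environment variable for each worker) and that `main.py` has a single entry function. The helper name `run_with_traceback` and the `error_rank<N>.log` file name are hypothetical, not part of this repo or of PyTorch. PyTorch also ships a `record` decorator in `torch.distributed.elastic.multiprocessing.errors` that serves a similar purpose.

```python
import os
import sys
import traceback


def run_with_traceback(main_fn, log_dir="."):
    """Run main_fn; on failure, save the full traceback to a per-rank file and re-raise."""
    # torchrun sets RANK for every worker; fall back to "0" for single-process runs.
    rank = os.environ.get("RANK", "0")
    try:
        return main_fn()
    except Exception:
        # Hypothetical file name: one log per rank so workers do not overwrite each other.
        path = os.path.join(log_dir, f"error_rank{rank}.log")
        with open(path, "w") as f:
            f.write(traceback.format_exc())
        print(f"[rank {rank}] failed; traceback saved to {path}", file=sys.stderr)
        # Re-raise so the launcher still sees the worker as failed.
        raise
```

In `main.py` this would be used as `run_with_traceback(main)` under the `if __name__ == "__main__":` guard, where `main` stands for whatever the script's entry function is. The contents of the per-rank log are what is worth pasting into the issue.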