MCG-NJU / MeMOTR

[ICCV 2023] MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking
https://arxiv.org/abs/2307.15700
MIT License

Error while training in distributed mode #1

Closed etema19 closed 10 months ago

etema19 commented 11 months ago

Hello, I ran into an error while training the code in distributed mode. The error is as follows: "torch.distributed.elastic.multiprocessing.errors.ChildFailedError: main.py FAILED"

Any idea?

Thanks!

HELLORPG commented 11 months ago

Could you please share the complete error output? There should be other output before the error message you posted, since that message only reports that a worker process failed. It would also help if you could provide more details about the script you are running.
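One way to collect that output is to make each worker print its own traceback before it exits. The following is only a minimal sketch and not part of MeMOTR: `run_with_rank_traceback` is a hypothetical helper, `main` stands for whatever entry function the training script uses, and it assumes the job is started with a launcher such as `torchrun` that sets the `RANK` environment variable. PyTorch also ships a `record` decorator in `torch.distributed.elastic.multiprocessing.errors` that may serve a similar purpose.

```python
import os
import sys
import traceback


def run_with_rank_traceback(fn, *args, **kwargs):
    """Run fn and, if it raises, print the full traceback tagged with the worker rank.

    Hypothetical helper for illustration; not part of MeMOTR or PyTorch.
    """
    # Launchers such as torchrun set RANK per worker; fall back if it is absent.
    rank = os.environ.get("RANK", "unknown")
    try:
        return fn(*args, **kwargs)
    except Exception:
        sys.stderr.write(f"[rank {rank}] worker failed with:\n")
        traceback.print_exc()
        sys.stderr.flush()
        # Re-raise so the launcher still sees a failing exit status.
        raise


# Assumed usage at the bottom of the training script, where `main` is the
# script's own entry function:
#
# if __name__ == "__main__":
#     run_with_rank_traceback(main)
```

The traceback printed by the failing rank is usually the part worth pasting into the issue, together with the exact launch command.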

Thanks~

HELLORPG commented 10 months ago

Since I have not received a reply for a long time, I am closing this issue for now.