WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
GNU General Public License v3.0

torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) #73

Open jay985735639 opened 1 year ago

jay985735639 commented 1 year ago

WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 13077 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 13076) of binary: /home/yyx/anaconda3/envs/yolov7/bin/python

python -m torch.distributed.launch --nproc_per_node 2 --master_port 9527 train.py --workers 8 --device 0,1 --sync-bn --batch-size 8 --data data/cloth_1.6.yaml --img 1280 1280 --cfg cfg/training/yolov7.yaml --weights yolov7.pt --name cloth_1.6 --hyp data/hyp.scratch.custom.yaml

There seems to be a problem with distributed training.

WongKinYiu commented 1 year ago

Newer versions of PyTorch seem to have updated the DDP API.
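
One way a launcher API change can surface is in how the local rank reaches the training script. The log above does not show the underlying traceback, so this is only a sketch under that assumption, not a confirmed fix for this issue. torch.distributed.launch passes the rank as a command-line argument (spelled --local_rank in older releases and --local-rank in newer ones), while torchrun exports it in the LOCAL_RANK environment variable. The helper name resolve_local_rank below is hypothetical and is not part of this repository's train.py.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Hypothetical helper: find the local rank however the launcher passed it.

    Checks the command line first, then the LOCAL_RANK environment variable,
    and returns -1 (meaning "not distributed") when neither is present.
    """
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    # Accept both spellings so the script works across launcher versions.
    parser.add_argument(
        "--local_rank", "--local-rank", dest="local_rank", type=int, default=None
    )
    # parse_known_args ignores the script's other options (e.g. --workers).
    args, _unknown = parser.parse_known_args(argv)

    if args.local_rank is not None:
        return args.local_rank
    return int(environ.get("LOCAL_RANK", -1))


if __name__ == "__main__":
    print("local rank:", resolve_local_rank())
```

If the rank resolves to -1 inside a process started by the launcher, the script and the launcher probably disagree on how the rank is passed. The traceback printed above the SIGTERM warning usually names the actual error.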