WongKinYiu / yolor

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks (https://arxiv.org/abs/2105.04206)
GNU General Public License v3.0

Can't use DDP #69

Open WANGCHIENCHIH opened 2 years ago

WANGCHIENCHIH commented 2 years ago

I am trying to train my model with:

python -m torch.distributed.launch --nproc_per_node 4 --master_port 9527 train.py --batch-size 64 --img 1280 1280 --data mydata/cfg/data.yaml --cfg mydata/yolor_p6_v2.cfg --weights 'runs/train/yolor_p6/weights/best_p.pt' --hyp '.mydata/cfg/hyp_evolved.yaml' --device 0,1,2,3 --name yolor_p6-v2 --epochs 50 --single-cls --sync-bn --cache-images

It prints an error (screenshot), then times out and the program quits.

My environment: torch=1.9.0+cu111

WongKinYiu commented 2 years ago

https://github.com/WongKinYiu/yolor/issues/38#issuecomment-875117732

WongKinYiu commented 2 years ago

The timeout issue is usually caused by a batch in which none of the images has a ground-truth (gt) box.
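
One way to check for this before launching DDP training is to scan the dataset for images that have no gt boxes. The snippet below is only a minimal sketch, not part of the yolor repo. It assumes YOLO-style labels, with one .txt file per image in a separate label directory, named after the image, and one box per line. The function name find_images_without_boxes and the directory layout are hypothetical, so adjust them to your data.

```python
from pathlib import Path

# Assumption: these are the image extensions used in the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def find_images_without_boxes(image_dir, label_dir):
    """Return image paths whose label file is missing or contains no box lines.

    Hypothetical helper: assumes one "<image stem>.txt" label file per image
    in label_dir, with one gt box per non-empty line.
    """
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    unlabeled = []
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        label_path = label_dir / (image_path.stem + ".txt")
        # An image counts as labeled only if its label file exists
        # and has at least one non-blank line.
        has_box = label_path.is_file() and any(
            line.strip() for line in label_path.read_text().splitlines()
        )
        if not has_box:
            unlabeled.append(image_path)
    return unlabeled
```

Calling find_images_without_boxes("path/to/images", "path/to/labels") returns the images with no gt box. If there are many of them, a whole batch can end up without any boxes, which matches the cause described above. Removing those images or using a larger batch size are possible things to try, but neither is confirmed as a fix in this thread.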