Open why228430 opened 4 years ago
Have you solved your problems?
I have figured out a way to solve this problem. It is related to mmdet/core/evaluation/rmean_ap.py: when the evaluation code calculates tp and fp, the multiprocessing call "pool.starmap" sometimes hangs.
tpfp = pool.starmap(
    rtpfp_default,
    zip(cls_dets, cls_gts, cls_gts_ignore,
        [iou_thr for _ in range(num_imgs)],
        [area_ranges for _ in range(num_imgs)]))
tp, fp = tuple(zip(*tpfp))
So I gave up on multiprocessing and replaced 'pool.starmap' with a plain 'for' loop, which solved the problem.
mytp = []
myfp = []
for cdt, cgt, cgti, iout, arear in zip(cls_dets, cls_gts, cls_gts_ignore,
                                       [iou_thr for _ in range(num_imgs)],
                                       [area_ranges for _ in range(num_imgs)]):
    tp, fp = rtpfp_default(cdt, cgt, cgti, iout, arear)
    mytp.append(tp)
    myfp.append(fp)
mytp = tuple(mytp)
myfp = tuple(myfp)
tp, fp = mytp, myfp
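For anyone who wants to check that the serial version returns the same shape of result as the starmap call, here is a minimal self-contained sketch. The names toy_tpfp and serial_tpfp are hypothetical and only for illustration: toy_tpfp is a stand-in for rtpfp_default (it thresholds scores instead of matching boxes), and the inputs are made-up lists, not real detections.

```python
from itertools import repeat


def toy_tpfp(dets, gts, gts_ignore, iou_thr, area_ranges):
    """Hypothetical stand-in for rtpfp_default; returns one (tp, fp) pair per image.

    It only thresholds the values in `dets` against `iou_thr`; this is NOT
    the real box-matching logic, just something with the same call shape.
    """
    tp = [1 if score >= iou_thr else 0 for score in dets]
    fp = [1 - t for t in tp]
    return tp, fp


def serial_tpfp(func, cls_dets, cls_gts, cls_gts_ignore, iou_thr, area_ranges):
    """Serial equivalent of pool.starmap(func, zip(...)) plus zip(*tpfp)."""
    # repeat() supplies the same threshold and area ranges for every image;
    # zip() stops when the per-image lists run out.
    args = zip(cls_dets, cls_gts, cls_gts_ignore,
               repeat(iou_thr), repeat(area_ranges))
    tpfp = [func(*a) for a in args]
    if not tpfp:
        # No images: return two empty tuples instead of failing on unpacking.
        return (), ()
    # Transpose [(tp0, fp0), (tp1, fp1), ...] into (tp0, tp1, ...), (fp0, fp1, ...).
    tp, fp = tuple(zip(*tpfp))
    return tp, fp


# Made-up inputs for two images.
cls_dets = [[0.9, 0.2], [0.6]]
cls_gts = [[], []]
cls_gts_ignore = [[], []]

tp, fp = serial_tpfp(toy_tpfp, cls_dets, cls_gts, cls_gts_ignore, 0.5, None)
print(tp)  # ([1, 0], [1])
print(fp)  # ([0, 1], [0])
```

The trade-off is that evaluation runs in a single process, so it may be slower on large validation sets, but no worker pool is involved.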
When I run sh rtools/train.sh, the code stops here and does not continue.
, sr0.loss_cls: 0.3226, sr0.loss_bbox: 0.4130, sr1.loss_cls: 0.4546, sr1.loss_bbox: 0.3601, loss: 2.5557, grad_norm: 11.3823
2020-09-22 21:20:43,374 - mmdet - INFO - Epoch [1][4400/4502] lr: 3.567e-03, eta: 1 day, 1:22:21, time: 0.874, data_time: 0.006, memory: 6929, s0.loss_cls: 0.4921, s0.loss_bbox: 0.5438, sr0.loss_cls: 0.3270, sr0.loss_bbox: 0.4025, sr1.loss_cls: 0.3991, sr1.loss_bbox: 0.3454, loss: 2.5098, grad_norm: 10.8762
2020-09-22 21:21:27,289 - mmdet - INFO - Epoch [1][4450/4502] lr: 3.603e-03, eta: 1 day, 1:21:34, time: 0.878, data_time: 0.006, memory: 6929, s0.loss_cls: 0.5008, s0.loss_bbox: 0.5919, sr0.loss_cls: 0.3339, sr0.loss_bbox: 0.4640, sr1.loss_cls: 0.4932, sr1.loss_bbox: 0.4270, loss: 2.8106, grad_norm: 10.2408
2020-09-22 21:22:11,102 - mmdet - INFO - Epoch [1][4500/4502] lr: 3.639e-03, eta: 1 day, 1:20:44, time: 0.876, data_time: 0.006, memory: 6929, s0.loss_cls: 0.4992, s0.loss_bbox: 0.5868, sr0.loss_cls: 0.3604, sr0.loss_bbox: 0.4742, sr1.loss_cls: 0.5154, sr1.loss_bbox: 0.4449, loss: 2.8810, grad_norm: 10.0068
2020-09-22 21:22:12,855 - mmdet - INFO - Saving checkpoint at 1 epochs
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 458/458, 0.5 task/s, elapsed: 962s, ETA: 0s