SJTU-Thinklab-Det / r3det-on-mmdetection

R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object
Apache License 2.0

Saving checkpoint at 1 epochs! #10

Open why228430 opened 4 years ago

why228430 commented 4 years ago

When I run sh rtools/train.sh, training stops here and does not continue:

    , sr0.loss_cls: 0.3226, sr0.loss_bbox: 0.4130, sr1.loss_cls: 0.4546, sr1.loss_bbox: 0.3601, loss: 2.5557, grad_norm: 11.3823
    2020-09-22 21:20:43,374 - mmdet - INFO - Epoch [1][4400/4502] lr: 3.567e-03, eta: 1 day, 1:22:21, time: 0.874, data_time: 0.006, memory: 6929, s0.loss_cls: 0.4921, s0.loss_bbox: 0.5438, sr0.loss_cls: 0.3270, sr0.loss_bbox: 0.4025, sr1.loss_cls: 0.3991, sr1.loss_bbox: 0.3454, loss: 2.5098, grad_norm: 10.8762
    2020-09-22 21:21:27,289 - mmdet - INFO - Epoch [1][4450/4502] lr: 3.603e-03, eta: 1 day, 1:21:34, time: 0.878, data_time: 0.006, memory: 6929, s0.loss_cls: 0.5008, s0.loss_bbox: 0.5919, sr0.loss_cls: 0.3339, sr0.loss_bbox: 0.4640, sr1.loss_cls: 0.4932, sr1.loss_bbox: 0.4270, loss: 2.8106, grad_norm: 10.2408
    2020-09-22 21:22:11,102 - mmdet - INFO - Epoch [1][4500/4502] lr: 3.639e-03, eta: 1 day, 1:20:44, time: 0.876, data_time: 0.006, memory: 6929, s0.loss_cls: 0.4992, s0.loss_bbox: 0.5868, sr0.loss_cls: 0.3604, sr0.loss_bbox: 0.4742, sr1.loss_cls: 0.5154, sr1.loss_bbox: 0.4449, loss: 2.8810, grad_norm: 10.0068
    2020-09-22 21:22:12,855 - mmdet - INFO - Saving checkpoint at 1 epochs
    [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 458/458, 0.5 task/s, elapsed: 962s, ETA: 0s

zhuruihe commented 4 years ago

Have you solved this problem?

zhuruihe commented 4 years ago

I have figured out a way to solve this problem. It is related to mmdet/core/evaluation/rmean_ap.py: when the evaluation computes tp and fp, the multiprocessing call "pool.starmap" sometimes hangs.

        tpfp = pool.starmap(
            rtpfp_default,
            zip(cls_dets, cls_gts, cls_gts_ignore,
                [iou_thr for _ in range(num_imgs)],
                [area_ranges for _ in range(num_imgs)]))
        tp, fp = tuple(zip(*tpfp))

So I gave up on multiprocessing and replaced 'pool.starmap' with a plain 'for' loop, which solved the problem.

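        # Serial replacement for pool.starmap: evaluate one image at a time
        # in the current process, so no worker pool is involved.
        mytp = []
        myfp = []
        for cdt, cgt, cgti in zip(cls_dets, cls_gts, cls_gts_ignore):
            # iou_thr and area_ranges are the same for every image, so they
            # are passed directly instead of being repeated in lists.
            tp, fp = rtpfp_default(cdt, cgt, cgti, iou_thr, area_ranges)
            mytp.append(tp)
            myfp.append(fp)
        # Same shape as tuple(zip(*tpfp)): one tuple of per-image results each.
        tp, fp = tuple(mytp), tuple(myfp)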
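
For anyone who wants to try the same idea outside the repo, here is a minimal standalone sketch of the serial version. It is only an illustration, not the repo's code: toy_tpfp is a hypothetical stand-in for rtpfp_default, serial_tpfp is a helper name made up for this example, and the inputs are dummy values. It uses itertools.starmap, which runs in the current process and takes the same kind of argument tuples as pool.starmap.

```python
from itertools import repeat, starmap


def toy_tpfp(dets, gts, gts_ignore, iou_thr, area_ranges):
    """Hypothetical stand-in for rtpfp_default: returns a (tp, fp) pair."""
    # Treat each "detection" as a score and mark it tp if it reaches iou_thr.
    tp = [1 if d >= iou_thr else 0 for d in dets]
    fp = [1 - t for t in tp]
    return tp, fp


def serial_tpfp(func, cls_dets, cls_gts, cls_gts_ignore, iou_thr, area_ranges):
    """Call func once per image in the current process (no worker pool)."""
    # repeat() supplies the shared arguments; zip stops at the shortest input.
    args = zip(cls_dets, cls_gts, cls_gts_ignore,
               repeat(iou_thr), repeat(area_ranges))
    tpfp = list(starmap(func, args))
    if not tpfp:
        # No images: return empty results instead of failing on unpacking.
        return (), ()
    tp, fp = tuple(zip(*tpfp))
    return tp, fp


# Dummy per-image inputs for two images.
cls_dets = [[0.9, 0.2], [0.6]]
cls_gts = [[1], [1]]
cls_gts_ignore = [[], []]

tp, fp = serial_tpfp(toy_tpfp, cls_dets, cls_gts, cls_gts_ignore, 0.5, None)
print(tp)
print(fp)
```

The serial version is slower on large validation sets because the images are processed one after another, but it avoids the worker pool entirely, which is the part that hung here.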