Zzh-tju / DIoU

Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression (AAAI 2020)

Independent PyTorch-YOLOv3 Results #1

Closed: glenn-jocher closed this issue 4 years ago

glenn-jocher commented 4 years ago

I tested the three box regression losses below on https://github.com/ultralytics/yolov3, using yolov3-spp.cfg with Swish activations, each trained on full COCO2014 for 27 epochs, but I was not able to realize performance improvements with the new methods. I'll try again with LeakyReLU(0.1). The IoU function I implemented is here.

    python3 train.py --weights '' --epochs 27 --batch-size 16 --accumulate 4 --prebias --cfg cfg/yolov3s.cfg

| Loss | mAP@0.5 | mAP@0.5:0.95 | Epoch time on 2080Ti |
| --- | --- | --- | --- |
| GIoU (default) | 49.7 | 30.2 | 36 min |
| DIoU | 49.4 | 30.0 | 36 min |
| CIoU | 49.7 | 30.1 | 36 min |

glenn-jocher commented 4 years ago

Here are the training plots for the three runs: results73, results74, and results75 are GIoU, DIoU, and CIoU respectively. [training plots image]

Zzh-tju commented 4 years ago
  1. I'm not sure about your trade-off weight balancing the regression loss against the other losses.
  2. The definition of CIoU is a little different; note the alpha term (see the formula after this list).
  3. Bbox regression may degenerate if applied for many iterations. For example, I usually choose the 49k and 50k weight files to test on the VOC dataset. I hope these help you.
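
For reference, the CIoU loss as defined in the paper, with the explicit alpha coefficient (rho is the distance between the two box centers, c the diagonal length of the smallest enclosing box):

    \mathcal{L}_{CIoU} = 1 - IoU + \frac{\rho^2(\mathbf{b},\,\mathbf{b}^{gt})}{c^2} + \alpha v,
    \qquad v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2,
    \qquad \alpha = \frac{v}{(1 - IoU) + v}
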
glenn-jocher commented 4 years ago

@Zzh-tju thanks for the feedback!

  1. Yes, we use loss balancing among the 3 losses (box, objectness, classification), with weights derived from hyperparameter studies; a minimal sketch follows after this list. The balancing we found lets us train to higher mAP than darknet on COCO (+4.5 mAP@0.5:0.95 at 416 resolution), see https://github.com/ultralytics/yolov3/issues/310#issuecomment-549629973.

  2. Yes, I tried to optimize the implementation a bit by inlining the alpha parameter into the equation (algebraically, v * alpha = v ** 2 / (1 - iou + v), so the two forms compute the same forward value):

    # v is the aspect-ratio consistency term from the CIoU paper
    v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2)
    return iou - (rho2 / c2 + v ** 2 / (1 - iou + v))  # CIoU with alpha inlined

    rather than:

    v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2)
    alpha = v / (1 - iou + v)  # trade-off coefficient alpha from the paper
    return iou - (rho2 / c2 + v * alpha)  # CIoU
  3. I don't understand. Can you explain, please? For our training comparison we used the full COCO2014 trainval set (117,000 images) for 27 epochs (10% of the full training time).
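
For context, the balancing mentioned in point 1 is just a weighted sum over the three loss components. A minimal sketch with hypothetical placeholder weights (the real ultralytics values come from the hyperparameter studies linked above):

    # Hypothetical sketch of the loss balancing described in point 1.
    # The default weights below are placeholders, not the actual
    # ultralytics hyperparameters.
    def total_loss(lbox, lobj, lcls, w_box=1.0, w_obj=1.0, w_cls=1.0):
        # Weighted sum of box regression, objectness and classification losses
        return w_box * lbox + w_obj * lobj + w_cls * lcls
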

Zzh-tju commented 4 years ago
  1. Alpha actually does not receive a gradient in the backward pass; it just acts as an adaptively changing coefficient. We have noticed that this is not clearly expressed in the paper.

  2. For the degeneration of regression, see IoU-Net (https://arxiv.org/abs/1807.11590). This phenomenon is very common in detection pipelines that use bbox regression. By the way, the point during training at which performance peaks is also uncertain.

glenn-jocher commented 4 years ago

@Zzh-tju ah, so the alpha should be computed under a `with torch.no_grad():` statement? Did you get different results this way?
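
For concreteness, a minimal sketch of that change, reusing the variable names from the snippets above (an illustration, not the exact ultralytics code; the eps term is added here only to guard against division by zero):

    import math
    import torch

    def ciou(iou, rho2, c2, w1, h1, w2, h2, eps=1e-7):
        # Sketch: CIoU with alpha excluded from backprop, per the discussion above.
        # iou, rho2 (squared center distance), c2 (squared enclosing-box diagonal)
        # and the box widths/heights are assumed to be precomputed tensors.
        v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2)
        with torch.no_grad():
            alpha = v / (1 - iou + v + eps)  # constant coefficient; no gradient flows through alpha
        return iou - (rho2 / c2 + v * alpha)  # CIoU
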

Zzh-tju commented 4 years ago

Yes.