Closed glenn-jocher closed 4 years ago
Here are the training plots for the 3 methods: results73, results74 and results75 are GIoU, DIoU and CIoU, respectively.
@Zzh-tju thanks for the feedback!
Yes, we balance the 3 losses (box, objectness, classification) with weights we derived from hyperparameter studies. The balancing we found lets us train to a higher mAP than darknet on COCO (+4.5 mAP@0.5:0.95 at 416 resolution), see https://github.com/ultralytics/yolov3/issues/310#issuecomment-549629973.
Yes, I tried to optimize the implementation a bit by inlining the alpha parameter into the equation:
v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2)
return iou - (rho2 / c2 + v ** 2 / (1 - iou + v)) # CIoU
rather than:
v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2)
alpha = v / (1 - iou + v)
return iou - (rho2 / c2 + v * alpha) # CIoU
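For reference, the two snippets above give the same forward value, since substituting alpha into v * alpha yields the inlined term. A short worked form, assuming rho2 and c2 stand for the squared center distance and squared enclosing-box diagonal from the CIoU definition (these names come from the code, the reading of them is an assumption):

```latex
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}
\quad\Longrightarrow\quad
\alpha v = \frac{v^{2}}{1 - \mathrm{IoU} + v}

\mathrm{CIoU}
= \mathrm{IoU} - \left( \frac{\rho^{2}}{c^{2}} + \alpha v \right)
= \mathrm{IoU} - \left( \frac{\rho^{2}}{c^{2}} + \frac{v^{2}}{1 - \mathrm{IoU} + v} \right)
```

As written, both versions also let autograd differentiate through every occurrence of v and IoU, so they match in the backward pass too.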
I don't understand. Can you explain please? For our training comparison we used full COCO2014 trainval (117,000 images) for 27 epochs (10% of full training time).
Alpha actually should not receive a gradient in the backward pass; it only acts as an adaptively changing coefficient. We have noticed that this is not clearly expressed in the paper.
For the degeneration of regression, see IoU-Net (https://arxiv.org/abs/1807.11590). This phenomenon is very common in detection pipelines that use bbox regression. By the way, the timing of the maximum performance point is also uncertain.
@Zzh-tju ah, so alpha should be computed under a with torch.no_grad(): statement? Did you get different results this way?
yes
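A minimal sketch of what this could look like, not the repository's actual implementation. The function name ciou, the (x1, y1, x2, y2) box layout, the detach_alpha flag and the eps term are all assumptions added for illustration; only the v and alpha expressions come from the snippets in this thread.

```python
import math

import torch


def ciou(box1, box2, detach_alpha=True, eps=1e-9):
    """CIoU between boxes given as (x1, y1, x2, y2) tensors of shape (..., 4).

    Hypothetical helper: detach_alpha=True computes alpha under
    torch.no_grad(), so it acts as a constant coefficient in backward.
    """
    b1_x1, b1_y1, b1_x2, b1_y2 = box1.unbind(-1)
    b2_x1, b2_y1, b2_x2, b2_y2 = box2.unbind(-1)
    w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1
    w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1

    # Intersection over union (eps is an assumption, guards against 0 / 0)
    inter_w = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(min=0)
    inter_h = (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(min=0)
    inter = inter_w * inter_h
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)

    # Squared diagonal of the smallest enclosing box
    cw = torch.max(b1_x2, b2_x2) - torch.min(b1_x1, b2_x1)
    ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Squared distance between the two box centers
    rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2
            + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4

    # Aspect-ratio consistency term, as in the snippets above
    v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2)

    if detach_alpha:
        with torch.no_grad():  # alpha gets no gradient
            alpha = v / (1 - iou + v + eps)
    else:
        alpha = v / (1 - iou + v + eps)

    return iou - (rho2 / c2 + v * alpha)  # CIoU


# Usage: same forward value either way, different gradients on the prediction
target = torch.tensor([0.5, 0.0, 1.5, 2.0])
pred_a = torch.tensor([0.0, 0.0, 2.0, 1.0], requires_grad=True)
pred_b = pred_a.detach().clone().requires_grad_(True)
(1 - ciou(pred_a, target, detach_alpha=True)).backward()
(1 - ciou(pred_b, target, detach_alpha=False)).backward()
```

The forward value is unchanged by the no_grad block; only the gradient that reaches the predicted box differs, which is where any change in training results would come from.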
I tested the 3 box regression methods below on https://github.com/ultralytics/yolov3 using yolov3-spp.cfg with swish trained on full COCO2014 to 27 epochs each, but was not able to realize performance improvements with the new methods. I'll try again with LeakyReLU(0.1). The IoU function I implemented is here.