I'm trying to better understand how labels are generated for the anchor boxes used in RetinaNet. After creating anchor boxes for each level of an FPN, we call _get_retinanet_blobs().
This code creates a NumPy array, labels, with one entry per anchor box and every entry set to -1. We then compute the overlap (IoU) between the anchor boxes we've generated and the ground-truth bounding boxes.
labels = np.empty((num_inside, ), dtype=np.float32)
labels.fill(-1)
if len(gt_boxes) > 0:
    # Compute overlaps between the anchors and the gt boxes overlaps
    anchor_by_gt_overlap = box_utils.bbox_overlaps(anchors, gt_boxes)
    # Map from anchor to gt box that has highest overlap
    anchor_to_gt_argmax = anchor_by_gt_overlap.argmax(axis=1)
    # For each anchor, amount of overlap with most overlapping gt box
    anchor_to_gt_max = anchor_by_gt_overlap[
        np.arange(num_inside), anchor_to_gt_argmax]
    # Map from gt box to an anchor that has highest overlap
    gt_to_anchor_argmax = anchor_by_gt_overlap.argmax(axis=0)
    # For each gt box, amount of overlap with most overlapping anchor
    gt_to_anchor_max = anchor_by_gt_overlap[
        gt_to_anchor_argmax, np.arange(anchor_by_gt_overlap.shape[1])]
    # Find all anchors that share the max overlap amount
    # (this includes many ties)
    anchors_with_max_overlap = np.where(
        anchor_by_gt_overlap == gt_to_anchor_max)[0]
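For anyone who wants to reproduce this without building Detectron: as I understand it, box_utils.bbox_overlaps returns a matrix with one row per anchor and one column per gt box. Below is a rough NumPy stand-in that I wrote myself. The name iou_matrix and the example boxes are made up, and Detectron's own implementation may differ in details such as pixel conventions.

```python
import numpy as np

def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU for boxes given as rows of [x1, y1, x2, y2].

    Hypothetical stand-in for box_utils.bbox_overlaps, not Detectron's code.
    Returns an array of shape (num_anchors, num_gt).
    """
    anchors = np.asarray(anchors, dtype=np.float32)
    gt_boxes = np.asarray(gt_boxes, dtype=np.float32)
    # Intersection corners via broadcasting: (N, 1, 2) against (1, K, 2)
    top_left = np.maximum(anchors[:, None, :2], gt_boxes[None, :, :2])
    bottom_right = np.minimum(anchors[:, None, 2:], gt_boxes[None, :, 2:])
    # Boxes that do not intersect get zero width/height
    wh = np.clip(bottom_right - top_left, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    union = area_a[:, None] + area_g[None, :] - inter
    return inter / union

# Made-up boxes: 3 anchors, 2 gt boxes
anchors = [[0, 0, 10, 10], [5, 5, 15, 15], [20, 20, 30, 30]]
gt_boxes = [[0, 0, 10, 10], [22, 22, 40, 40]]
overlaps = iou_matrix(anchors, gt_boxes)
print(overlaps.shape)  # (3, 2)
```

In this made-up setup the second gt box overlaps only the third anchor, and that IoU (64 / 360, about 0.18) is well below 0.5, which is exactly the situation I am asking about below.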
The next portion is what confuses me. For each ground-truth box, we find the anchor boxes with the maximum overlap and set their labels to that box's class. Immediately after this, we set labels for all anchor boxes whose overlap is at least cfg.RETINANET.POSITIVE_OVERLAP (0.5 by default).
# Fg label: for each gt use anchors with highest overlap
# (including ties)
gt_inds = anchor_to_gt_argmax[anchors_with_max_overlap]
labels[anchors_with_max_overlap] = gt_classes[gt_inds]
# Fg label: above threshold IOU
inds = anchor_to_gt_max >= cfg.RETINANET.POSITIVE_OVERLAP
gt_inds = anchor_to_gt_argmax[inds]
labels[inds] = gt_classes[gt_inds]
Why do we set labels for the maximum overlaps but then immediately set labels for any anchor box that overlaps more than POSITIVE_OVERLAP?
Am I correct in assuming the only time this would matter is when no anchor box reaches the POSITIVE_OVERLAP value for a given ground-truth box? In that case, does the first step guarantee that each ground-truth bounding box still gets at least one positive anchor?
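To make the question concrete, here is a small sketch that runs both labeling steps on a hand-written overlap matrix. The IoU values and class ids are made up, and POSITIVE_OVERLAP is a plain constant standing in for cfg.RETINANET.POSITIVE_OVERLAP; the rest mirrors the snippets above.

```python
import numpy as np

POSITIVE_OVERLAP = 0.5  # stands in for cfg.RETINANET.POSITIVE_OVERLAP

# Made-up IoU values: rows are 4 anchors, columns are 2 gt boxes.
anchor_by_gt_overlap = np.array([
    [0.70, 0.00],
    [0.55, 0.10],
    [0.00, 0.30],  # best anchor for gt box 1, but below the threshold
    [0.05, 0.02],
], dtype=np.float32)
gt_classes = np.array([3, 7])  # hypothetical class ids
num_inside = anchor_by_gt_overlap.shape[0]

labels = np.full((num_inside,), -1, dtype=np.float32)

anchor_to_gt_argmax = anchor_by_gt_overlap.argmax(axis=1)
anchor_to_gt_max = anchor_by_gt_overlap[
    np.arange(num_inside), anchor_to_gt_argmax]
gt_to_anchor_argmax = anchor_by_gt_overlap.argmax(axis=0)
gt_to_anchor_max = anchor_by_gt_overlap[
    gt_to_anchor_argmax, np.arange(anchor_by_gt_overlap.shape[1])]
anchors_with_max_overlap = np.where(
    anchor_by_gt_overlap == gt_to_anchor_max)[0]

# Step 1: for each gt box, label the anchor(s) with the highest overlap
gt_inds = anchor_to_gt_argmax[anchors_with_max_overlap]
labels[anchors_with_max_overlap] = gt_classes[gt_inds]
labels_after_step1 = labels.copy()

# Step 2: label every anchor whose best overlap reaches the threshold
inds = anchor_to_gt_max >= POSITIVE_OVERLAP
gt_inds = anchor_to_gt_argmax[inds]
labels[inds] = gt_classes[gt_inds]

print(labels_after_step1.tolist())  # [3.0, -1.0, 7.0, -1.0]
print(labels.tolist())              # [3.0, 3.0, 7.0, -1.0]
```

On this toy input, anchor 2 only gets its label from the first step (its IoU of 0.30 is below the threshold), while anchor 1 only gets its label from the second step (it is above the threshold but is not the best match for any gt box). That seems consistent with my guess, but I would like confirmation that this is the intended purpose.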
See: https://github.com/facebookresearch/Detectron/blob/master/detectron/roi_data/retinanet.py#L219-L226