facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0
26.22k stars, 5.45k forks

Question regarding creation of ground-truth anchor boxes for RetinaNet #933

Open JoshVarty opened 5 years ago

I'm trying to better understand how labels are generated for the anchor boxes used in RetinaNet. After creating anchor boxes for each level of an FPN, we call _get_retinanet_blobs().

See: https://github.com/facebookresearch/Detectron/blob/master/detectron/roi_data/retinanet.py#L219-L226

This code creates a NumPy array, labels, with one entry per anchor box and every entry initialized to -1. It then computes the overlap (IoU) between the anchor boxes we've generated and the ground-truth bounding boxes.

    labels = np.empty((num_inside, ), dtype=np.float32)
    labels.fill(-1)

    if len(gt_boxes) > 0:
        # Compute overlaps between the anchors and the gt boxes overlaps
        anchor_by_gt_overlap = box_utils.bbox_overlaps(anchors, gt_boxes)
        # Map from anchor to gt box that has highest overlap
        anchor_to_gt_argmax = anchor_by_gt_overlap.argmax(axis=1)
        # For each anchor, amount of overlap with most overlapping gt box
        anchor_to_gt_max = anchor_by_gt_overlap[
            np.arange(num_inside), anchor_to_gt_argmax]

        # Map from gt box to an anchor that has highest overlap
        gt_to_anchor_argmax = anchor_by_gt_overlap.argmax(axis=0)
        # For each gt box, amount of overlap with most overlapping anchor
        gt_to_anchor_max = anchor_by_gt_overlap[
            gt_to_anchor_argmax, np.arange(anchor_by_gt_overlap.shape[1])]
        # Find all anchors that share the max overlap amount
        # (this includes many ties)
        anchors_with_max_overlap = np.where(
            anchor_by_gt_overlap == gt_to_anchor_max)[0]
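To make the bookkeeping above concrete, here is a minimal, self-contained NumPy sketch on toy boxes. The helpers iou_matrix and match_anchors are hypothetical stand-ins written for illustration, not Detectron functions. iou_matrix assumes (x1, y1, x2, y2) boxes in continuous coordinates, so its values may differ slightly from Detectron's box_utils.bbox_overlaps, which may use a +1 pixel-width convention.

```python
import numpy as np


def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU between (N, 4) anchors and (K, 4) gt boxes, (x1, y1, x2, y2).

    Hypothetical helper; continuous coordinates, no +1 pixel offset.
    """
    # Broadcast (N, 1) against (1, K) to get (N, K) intersection corners.
    ix1 = np.maximum(anchors[:, None, 0], gt_boxes[None, :, 0])
    iy1 = np.maximum(anchors[:, None, 1], gt_boxes[None, :, 1])
    ix2 = np.minimum(anchors[:, None, 2], gt_boxes[None, :, 2])
    iy2 = np.minimum(anchors[:, None, 3], gt_boxes[None, :, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    union = area_a[:, None] + area_g[None, :] - inter
    return inter / union


def match_anchors(anchors, gt_boxes):
    """Reproduce the argmax / max bookkeeping from the quoted snippet."""
    overlaps = iou_matrix(anchors, gt_boxes)
    num_inside = overlaps.shape[0]
    # Best gt box for each anchor, and that IoU value.
    anchor_to_gt_argmax = overlaps.argmax(axis=1)
    anchor_to_gt_max = overlaps[np.arange(num_inside), anchor_to_gt_argmax]
    # Best anchor for each gt box, and that IoU value.
    gt_to_anchor_argmax = overlaps.argmax(axis=0)
    gt_to_anchor_max = overlaps[gt_to_anchor_argmax, np.arange(overlaps.shape[1])]
    # Every anchor that ties the best IoU for some gt box (row indices only).
    anchors_with_max_overlap = np.where(overlaps == gt_to_anchor_max)[0]
    return anchor_to_gt_argmax, anchor_to_gt_max, anchors_with_max_overlap


anchors = np.array([[0, 0, 10, 10], [5, 0, 15, 10], [20, 20, 30, 30]], dtype=np.float64)
gt_boxes = np.array([[0, 0, 10, 10], [22, 22, 30, 30]], dtype=np.float64)

anchor_to_gt_argmax, anchor_to_gt_max, anchors_with_max_overlap = match_anchors(anchors, gt_boxes)
print(anchor_to_gt_argmax.tolist())       # [0, 0, 1]
print(anchors_with_max_overlap.tolist())  # [0, 2]
```

The comparison overlaps == gt_to_anchor_max broadcasts gt_to_anchor_max across rows, so each column is compared with its own maximum; that is how all tied anchors for a given ground-truth box end up in anchors_with_max_overlap.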

The next portion is what confuses me. For each ground-truth box, we find the anchor box(es) with the highest overlap and assign each of them the class of the ground-truth box it overlaps most. Immediately afterwards, we also assign a class label to every anchor box whose overlap is at least cfg.RETINANET.POSITIVE_OVERLAP (0.5 by default).

        # Fg label: for each gt use anchors with highest overlap
        # (including ties)
        gt_inds = anchor_to_gt_argmax[anchors_with_max_overlap]
        labels[anchors_with_max_overlap] = gt_classes[gt_inds]

        # Fg label: above threshold IOU
        inds = anchor_to_gt_max >= cfg.RETINANET.POSITIVE_OVERLAP
        gt_inds = anchor_to_gt_argmax[inds]
        labels[inds] = gt_classes[gt_inds]
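The two foreground steps can be reproduced on a hand-written overlap matrix. This is a sketch, assuming the IoU matrix is already computed; label_foreground is a hypothetical helper, positive_overlap stands in for cfg.RETINANET.POSITIVE_OVERLAP, and background/ignore assignment is left out because the snippet above does not show it.

```python
import numpy as np


def label_foreground(overlaps, gt_classes, positive_overlap=0.5):
    """Apply the two foreground-labeling steps to an (anchors x gt) IoU matrix.

    Hypothetical helper; positive_overlap plays the role of
    cfg.RETINANET.POSITIVE_OVERLAP. Anchors not selected stay at -1.
    """
    num_inside = overlaps.shape[0]
    labels = np.full(num_inside, -1, dtype=np.float32)

    anchor_to_gt_argmax = overlaps.argmax(axis=1)
    anchor_to_gt_max = overlaps[np.arange(num_inside), anchor_to_gt_argmax]
    gt_to_anchor_max = overlaps.max(axis=0)
    anchors_with_max_overlap = np.where(overlaps == gt_to_anchor_max)[0]

    # Step 1: best anchor(s) for every gt box, whatever the IoU value.
    gt_inds = anchor_to_gt_argmax[anchors_with_max_overlap]
    labels[anchors_with_max_overlap] = gt_classes[gt_inds]

    # Step 2: every anchor whose best IoU reaches the threshold.
    inds = anchor_to_gt_max >= positive_overlap
    labels[inds] = gt_classes[anchor_to_gt_argmax[inds]]
    return labels


# 4 anchors x 2 gt boxes; gt box 1 never reaches IoU 0.5 with any anchor.
overlaps = np.array([
    [0.70, 0.00],
    [0.55, 0.10],
    [0.20, 0.30],
    [0.00, 0.35],
])
gt_classes = np.array([3, 7])
labels = label_foreground(overlaps, gt_classes)
print(labels.tolist())  # [3.0, 3.0, -1.0, 7.0]
```

In this toy case, anchor 3 is labeled only because of step 1: its IoU of 0.35 is below the threshold, but it is the best anchor for gt box 1.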

Why do we label the maximum-overlap anchors and then immediately label every anchor box whose overlap is at least POSITIVE_OVERLAP?

Am I correct in assuming that the first step only matters when no anchor box reaches the POSITIVE_OVERLAP value for some ground-truth box? In that case, does it guarantee that at least one anchor box per ground-truth bounding box still gets a foreground label?
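One way to probe this question is to compare the two index sets directly. In the quoted code, both steps label an anchor with gt_classes[anchor_to_gt_argmax[...]] for that same anchor, so an anchor selected by both steps gets the same class either way and step 2 never overwrites a step-1 label with a different class. The sketch below, using a hypothetical helper only_from_max_step with positive_overlap standing in for POSITIVE_OVERLAP, lists the anchors that only step 1 labels.

```python
import numpy as np


def only_from_max_step(overlaps, positive_overlap=0.5):
    """Anchors labeled foreground by the 'best anchor per gt' step but not by
    the IoU-threshold step. Hypothetical helper mirroring the quoted indexing."""
    num_inside = overlaps.shape[0]
    anchor_to_gt_argmax = overlaps.argmax(axis=1)
    anchor_to_gt_max = overlaps[np.arange(num_inside), anchor_to_gt_argmax]
    gt_to_anchor_max = overlaps.max(axis=0)

    # Sets remove duplicates when one anchor ties as best for several gt boxes.
    max_step = set(np.where(overlaps == gt_to_anchor_max)[0].tolist())
    threshold_step = set(np.where(anchor_to_gt_max >= positive_overlap)[0].tolist())
    return sorted(max_step - threshold_step)


# Every gt box has an anchor at IoU >= 0.5: step 1 adds nothing.
covered = np.array([[0.70, 0.00], [0.55, 0.10], [0.20, 0.60]])
# gt box 1 peaks at IoU 0.35: step 1 is the only reason anchor 3 is labeled.
uncovered = np.array([[0.70, 0.00], [0.55, 0.10], [0.20, 0.30], [0.00, 0.35]])

print(only_from_max_step(covered))    # []
print(only_from_max_step(uncovered))  # [3]
```

The reasoning generalizes: if anchor a is the best anchor for gt box j, then its own best IoU is at least the IoU with j. So whenever gt box j's best IoU already reaches the threshold, anchor a is in the threshold set too, and step 1 can only change the result for ground-truth boxes whose best IoU falls below POSITIVE_OVERLAP.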