When an example is misclassified and p_t is small, the modulating factor is near 1 and the loss is unaffected. As p_t → 1, the factor goes to 0 and the loss for well-classified examples is down-weighted. The focusing parameter gamma smoothly adjusts the rate at which easy examples are down-weighted.
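Concretely, the focal loss from the paper is FL(p_t) = -(1 - p_t)^gamma * log(p_t), where (1 - p_t)^gamma is the modulating factor applied to the cross entropy term -log(p_t); setting gamma = 0 recovers standard CE.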
[Paper]
[Pytorch Code]
Main idea:
To address the extreme foreground-background class imbalance in one-stage object detection frameworks, the authors propose Focal Loss, which adds a modulating factor to the standard cross entropy criterion so that the loss assigned to well-classified examples is down-weighted.
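A minimal PyTorch sketch of this idea for binary classification is below; the function name focal_loss and the defaults gamma=2.0, alpha=0.25 are illustrative (alpha is the paper's class-balancing weight), and the linked [Pytorch Code] remains the authoritative implementation.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    # -log(p_t), computed stably from raw logits; targets must be float.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    # p_t: the model's estimated probability of the true class.
    p_t = p * targets + (1 - p) * (1 - targets)
    # alpha_t: class-balancing weight (alpha for positives).
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # The modulating factor (1 - p_t)^gamma down-weights easy examples.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Example: 8 random logits against random binary labels.
loss = focal_loss(torch.randn(8), torch.randint(0, 2, (8,)).float())
```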
How does focal loss work?
As the loss curves in the paper's first figure show, the modulating factor lowers the standard cross entropy loss, and more so as p_t grows. Suppose gamma = 2: an example classified with p_t = 0.9 would have a 100× lower loss than with CE, and with p_t ≈ 0.968 a 1000× lower loss. This in turn increases the relative importance of correcting misclassified examples, whose loss is scaled down by at most 4× (for p_t ≤ 0.5 and gamma = 2).
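These ratios can be verified directly, since the focal loss is just CE multiplied by the modulating factor; a quick check in plain Python:

```python
# At gamma = 2 the focal loss is CE scaled by (1 - p_t)^2,
# so well-classified examples contribute far less to the total loss.
for p_t in (0.5, 0.9, 0.968):
    factor = (1 - p_t) ** 2
    print(f"p_t = {p_t}: CE scaled by {factor:.4f} (~{1 / factor:.0f}x lower)")
# p_t = 0.5: CE scaled by 0.2500 (~4x lower)
# p_t = 0.9: CE scaled by 0.0100 (~100x lower)
# p_t = 0.968: CE scaled by 0.0010 (~977x lower, i.e. roughly 1000x)
```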