When an example is misclassified and p_t is small, the modulating factor is near 1 and the loss is unaffected. As p_t → 1, the factor goes to 0 and the loss for well-classified examples is down-weighted. The focusing parameter gamma smoothly adjusts the rate at which easy examples are down-weighted.
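Concretely, the focal loss from the paper is FL(p_t) = -(1 - p_t)^gamma * log(p_t), where (1 - p_t)^gamma is the modulating factor applied to the cross entropy term -log(p_t); setting gamma = 0 recovers standard CE.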
[Paper]
[Pytorch Code]
Main idea:
To address the extreme foreground-background class imbalance in one-stage object detection frameworks, the authors propose Focal Loss, which adds a modulating factor to the standard cross entropy criterion so that the loss assigned to well-classified examples is down-weighted.
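A minimal PyTorch sketch of this idea for binary classification is below; the function name focal_loss and the defaults gamma=2.0, alpha=0.25 are illustrative (alpha is the paper's class-balancing weight), and the linked [Pytorch Code] remains the authoritative implementation.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    # -log(p_t), computed stably from raw logits; targets must be float.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    # p_t: the model's estimated probability of the true class.
    p_t = p * targets + (1 - p) * (1 - targets)
    # alpha_t: class-balancing weight (alpha for positives).
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # The modulating factor (1 - p_t)^gamma down-weights easy examples.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Example: 8 random logits against random binary labels.
loss = focal_loss(torch.randn(8), torch.randint(0, 2, (8,)).float())
```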
How does focal loss work?
As the loss curves in the paper's first figure show, the modulating factor lowers the standard cross entropy loss, and more so as p_t grows. Suppose gamma = 2: an example classified with p_t = 0.9 would have a 100× lower loss than with CE, and with p_t ≈ 0.968 a 1000× lower loss. This in turn increases the relative importance of correcting misclassified examples, whose loss is scaled down by at most 4× (for p_t ≤ 0.5 and gamma = 2).
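These ratios can be verified directly, since the focal loss is just CE multiplied by the modulating factor; a quick check in plain Python:

```python
# At gamma = 2 the focal loss is CE scaled by (1 - p_t)^2,
# so well-classified examples contribute far less to the total loss.
for p_t in (0.5, 0.9, 0.968):
    factor = (1 - p_t) ** 2
    print(f"p_t = {p_t}: CE scaled by {factor:.4f} (~{1 / factor:.0f}x lower)")
# p_t = 0.5: CE scaled by 0.2500 (~4x lower)
# p_t = 0.9: CE scaled by 0.0100 (~100x lower)
# p_t = 0.968: CE scaled by 0.0010 (~977x lower, i.e. roughly 1000x)
```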