fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Apache License 2.0

Questions about computing focal loss #172

Open the-yanqi opened 1 year ago

the-yanqi commented 1 year ago

https://github.com/fundamentalvision/Deformable-DETR/blob/11169a60c33333af00a4849f1808023eba96a931/models/segmentation.py#L221

The focal loss is computed over all object queries (e.g., with num_queries=300, all 300 queries contribute to the focal loss). However, the loss is only divided by num_boxes, the total number of ground-truth boxes in the batch, which is significantly smaller than the number of object queries.

Do you have any specific reasons for computing the focal loss in this way?

Thanks
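For reference, here is a NumPy paraphrase of the normalization being asked about — a sketch, not the repo's actual code (the original uses PyTorch's `F.binary_cross_entropy_with_logits`; shapes and defaults below follow the linked `sigmoid_focal_loss`):

```python
import numpy as np

def sigmoid_focal_loss(inputs, targets, num_boxes, alpha=0.25, gamma=2.0):
    """NumPy sketch of the linked sigmoid_focal_loss.
    inputs/targets: (batch, num_queries, num_classes)."""
    prob = 1.0 / (1.0 + np.exp(-inputs))  # sigmoid
    # numerically stable elementwise binary cross-entropy with logits
    ce_loss = np.maximum(inputs, 0) - inputs * targets + np.log1p(np.exp(-np.abs(inputs)))
    p_t = prob * targets + (1 - prob) * (1 - targets)
    loss = ce_loss * (1 - p_t) ** gamma
    if alpha >= 0:
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        loss = alpha_t * loss
    # mean over the query dimension, then divide by num_boxes only
    return loss.mean(axis=1).sum() / num_boxes
```

Note the final line: every query contributes to `loss`, but the normalizer is `num_boxes`, not the number of queries.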

EricWiener commented 1 year ago

The focal loss implementation comes from the original DETR paper where it was applied to masks for the panoptic segmentation extension of DETR. From the DETR repo:

All losses are normalized by the number of objects inside the batch. Source.

The label loss in Deformable DETR is calculated with:

```python
loss_ce = (
    sigmoid_focal_loss(source_logits, target_classes_onehot, num_boxes, alpha=self.focal_alpha, gamma=2)
    * source_logits.shape[1]
)
```

sigmoid_focal_loss first computes loss.mean(1).sum(), which averages the per-element loss over the query dimension and then sums over the batch and class dimensions. It then divides by num_boxes to normalize by the number of GT objects. Finally, the value returned by sigmoid_focal_loss is multiplied by source_logits.shape[1] (the number of queries), which counteracts the earlier division by the number of queries performed by loss.mean(1). The net effect is that the class loss is normalized only by the number of GT boxes, not by the number of queries.
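The cancellation can be checked numerically. A sketch with hypothetical shapes (2 images, 300 queries, 91 classes, 7 GT boxes — all made-up numbers), using a random array as a stand-in for the per-element focal loss:

```python
import numpy as np

# hypothetical shapes, COCO-style
bs, num_queries, num_classes = 2, 300, 91
num_boxes = 7  # total GT boxes in the batch, far fewer than queries

rng = np.random.default_rng(0)
loss = rng.random((bs, num_queries, num_classes))  # stand-in per-element focal loss

# what sigmoid_focal_loss returns: mean over queries, sum, divide by num_boxes
inner = loss.mean(axis=1).sum() / num_boxes
# the caller then multiplies by the number of queries (source_logits.shape[1])
loss_ce = inner * num_queries

# net effect: a plain sum over every query/class, normalized only by num_boxes
assert np.isclose(loss_ce, loss.sum() / num_boxes)
```

So the multiplication by source_logits.shape[1] exactly undoes the mean(1), leaving num_boxes as the sole normalizer.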