facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0

Errors and losses #212

Closed woctezuma closed 4 years ago

woctezuma commented 4 years ago

In log.txt, I see several errors and losses. I would like to know a bit more about some of them.

I would expect the log file to contain:

1) First, there are these:

    "train_class_error": 0.801971435546875,
    "train_loss": 2.8446195403734844,
    "train_loss_ce": 0.02718701681296807,
    "train_loss_bbox": 0.14008397981524467,
    "train_loss_giou": 0.2731118065615495,

First, this is mostly clear, except that I am not sure what "ce" stands for. I assume it is the loss associated with the classification error. Is that correct?

Second, I would assume that "class error" is the "weighted fraction of misclassified observations". However, I have seen cases where it was much higher than 1. Isn't it supposed to be between 0 and 1?

2) Then, with the suffix going from 0 to 4:

    "train_loss_ce_3": 0.026680172781925648,
    "train_loss_bbox_3": 0.138540934274594,
    "train_loss_giou_3": 0.26943153887987137,

This looks OK.

I assume these are layer-specific losses, which are used to compute the Hungarian loss after each decoder layer, as mentioned in the "Auxiliary decoding losses" section of the paper:

We add prediction feed-forward networks (FFNs) and Hungarian loss after each decoder layer.

This is consistent with the fact that there are 6 decoding layers by default:

    # * Transformer
    parser.add_argument('--dec_layers', default=6, type=int,
                        help="Number of decoding layers in the transformer")

    # Loss
    parser.add_argument('--no_aux_loss', dest='aux_loss', action='store_false',
                        help="Disables auxiliary decoding losses (loss at each layer)")
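
If that reading is right, the per-layer keys are presumably produced by tagging each auxiliary layer's loss dict with its layer index. Here is a hypothetical sketch of the idea (not the actual DETR code), where the last decoder layer would keep the unsuffixed keys:

    # Hypothetical sketch (not the actual DETR code): suffix each auxiliary layer's losses
    # with its index, so 5 auxiliary layers yield loss_ce_0 ... loss_ce_4, while the final
    # (6th) decoder layer keeps the plain loss_ce / loss_bbox / loss_giou keys.
    def add_layer_suffixes(aux_layer_losses):
        losses = {}
        for i, layer_losses in enumerate(aux_layer_losses):
            losses.update({f"{name}_{i}": value for name, value in layer_losses.items()})
        return losses

    print(add_layer_suffixes([{"loss_ce": 0.027, "loss_bbox": 0.139},
                              {"loss_ce": 0.026, "loss_bbox": 0.138}]))
    # {'loss_ce_0': 0.027, 'loss_bbox_0': 0.139, 'loss_ce_1': 0.026, 'loss_bbox_1': 0.138}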

3) Finally:

    "train_class_error_unscaled": 0.801971435546875,
    "train_loss_ce_unscaled": 0.02718701681296807,
    "train_loss_bbox_unscaled": 0.02801679583887259,
    "train_loss_giou_unscaled": 0.13655590328077474,
    "train_cardinality_error_unscaled": 0.85,

I don't know what "cardinality error" is.

I don't know why train_class_error_unscaled is logged, since the classification error is not normalized the way the losses are.

Apart from that, it looks OK, as I assume the suffix _unscaled means before the normalization mentioned in appendix A.2:

All losses are normalized by the number of objects inside the batch.

woctezuma commented 4 years ago

tl;dr: my questions are:

- What does the _ce suffix stand for?
- Is "class error" supposed to be between 0 and 1? I have seen values much higher than 1.
- What is "cardinality error"?
- Why is there a train_class_error_unscaled entry, given that the classification error is not normalized like the losses?
- Does the _unscaled suffix mean "before the normalization by the number of objects" mentioned in appendix A.2?

woctezuma commented 4 years ago

On a side note, I have not delved too deep into the panoptic segmentation part of the paper.

Are the following coefficients in main.py the ones for the Focal loss and the Dice loss respectively, as mentioned in section 4.4 on page 15?

    # * Loss coefficients
    parser.add_argument('--mask_loss_coef', default=1, type=float)
    parser.add_argument('--dice_loss_coef', default=1, type=float)
woctezuma commented 4 years ago

All the info is in https://github.com/facebookresearch/detr/blob/master/models/detr.py

    def loss_boxes(self, outputs, targets, indices, num_boxes):
        """Compute the losses related to the bounding boxes, the L1 regression loss and the GIoU loss
           targets dicts must contain the key "boxes" containing a tensor of dim [nb_target_boxes, 4]
           The target boxes are expected in format (center_x, center_y, w, h), normalized by the image size.
        """

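For the record, here is my understanding of the body of loss_boxes as a toy sketch (not a verbatim copy of the code): the matched boxes are compared with an L1 term and a generalized-IoU term, each summed over the boxes and divided by the number of target boxes.

    import torch
    import torch.nn.functional as F

    # Toy sketch: 3 matched (prediction, target) box pairs in (center_x, center_y, w, h) format.
    src_boxes = torch.rand(3, 4)
    target_boxes = torch.rand(3, 4)
    num_boxes = 3

    # L1 regression term, summed over boxes and coordinates, then divided by the box count.
    loss_bbox = F.l1_loss(src_boxes, target_boxes, reduction='none').sum() / num_boxes
    # loss_giou is built analogously from 1 - generalized IoU (helpers in util/box_ops.py),
    # again summed and divided by num_boxes.
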
Answer about the _ce suffix: it stands for cross-entropy, which is the loss on the classes.

    def loss_labels(self, outputs, targets, indices, num_boxes, log=True):
        """Classification loss (NLL)
        targets dicts must contain the key "labels" containing a tensor of dim [nb_target_boxes]
        """
        [...]
        loss_ce = F.cross_entropy(src_logits.transpose(1, 2), target_classes, self.empty_weight)
        losses = {'loss_ce': loss_ce}

Answer about class error:

        if log:
            # TODO this should probably be a separate loss, not hacked in this one here
            losses['class_error'] = 100 - accuracy(src_logits[idx], target_classes_o)[0]
        return losses

where:

def accuracy(output, target, topk=(1,)):
    """Computes the precision@k for the specified values of k"""
    if target.numel() == 0:
        return [torch.zeros([], device=output.device)]
    maxk = max(topk)
    batch_size = target.size(0)

    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        correct_k = correct[:k].view(-1).float().sum(0)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res
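
So class_error is simply 100 minus the top-1 accuracy (in percent) over the matched queries, which explains values above 1: it lives in [0, 100], not [0, 1]. A toy example reusing the accuracy helper above:

    import torch

    # 4 matched queries, logits over 3 classes; the third prediction is wrong.
    logits = torch.tensor([[2.0, 0.1, 0.1],
                           [0.1, 2.0, 0.1],
                           [2.0, 0.1, 0.1],
                           [0.1, 0.1, 2.0]])
    targets = torch.tensor([0, 1, 2, 2])

    top1 = accuracy(logits, targets)[0]   # tensor(75.)  -> 75% top-1 accuracy
    class_error = 100 - top1              # tensor(25.)  -> a percentage, not a fraction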

Answer about cardinality error:

    def loss_cardinality(self, outputs, targets, indices, num_boxes):
        """ Compute the cardinality error, ie the absolute error in the number of predicted non-empty boxes
        This is not really a loss, it is intended for logging purposes only. It doesn't propagate gradients
        """

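From memory, the computation is roughly: count the predictions whose argmax class is not the "no object" class, and take the absolute difference with the number of ground-truth boxes. A sketch under that assumption (not a verbatim copy of loss_cardinality):

    import torch
    import torch.nn.functional as F

    # Sketch: pred_logits has shape [batch, num_queries, num_classes + 1],
    # where the last class index is assumed to be the "no object" class.
    def cardinality_error(pred_logits, num_target_boxes):
        no_object = pred_logits.shape[-1] - 1
        card_pred = (pred_logits.argmax(-1) != no_object).sum(1)       # predicted object count per image
        return F.l1_loss(card_pred.float(), num_target_boxes.float())  # mean absolute error over the batch

    logits = torch.randn(2, 100, 92)  # e.g. 100 queries, 91 classes + "no object"
    print(cardinality_error(logits, torch.tensor([3, 7])))
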
Answer about the "mask" (Focal) loss and the Dice loss:

    def loss_masks(self, outputs, targets, indices, num_boxes):
        """Compute the losses related to the masks: the focal loss and the dice loss.
           targets dicts must contain the key "masks" containing a tensor of dim [nb_target_boxes, h, w]
        """
        [...]
        losses = {
            "loss_mask": sigmoid_focal_loss(src_masks, target_masks, num_boxes),
            "loss_dice": dice_loss(src_masks, target_masks, num_boxes),
        }
fmassa commented 4 years ago

Hey, sorry for not getting back to you before.

Your findings are correct. One last point about the _unscaled losses: we have a scaling coefficient for each loss, which you can see in https://github.com/facebookresearch/detr/blob/5e66b4cd15b2b182da347103dd16578d28b49d69/main.py#L74-L77. Those coefficients are used to balance the contribution of each loss to the total loss. The _unscaled values you see in the logs are the original values of the losses, before being multiplied by those coefficients, as you can see in https://github.com/facebookresearch/detr/blob/5e66b4cd15b2b182da347103dd16578d28b49d69/engine.py#L39-L42
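
In rough terms (a simplified sketch, not the exact code), the relation between the logged values is:

    # Simplified sketch: weight_dict maps each loss name to its coefficient
    # (loss_ce -> 1, loss_bbox -> 5, loss_giou -> 2 by default).
    weight_dict = {'loss_ce': 1, 'loss_bbox': 5, 'loss_giou': 2}
    loss_dict = {'loss_ce': 0.027, 'loss_bbox': 0.028, 'loss_giou': 0.137}  # raw criterion outputs

    # The *_unscaled entries in the logs are the raw values; the plain entries are the raw
    # values multiplied by their coefficients, and their sum is the loss used for backprop.
    logged_unscaled = {f'{k}_unscaled': v for k, v in loss_dict.items()}
    logged_scaled = {k: v * weight_dict[k] for k, v in loss_dict.items() if k in weight_dict}
    total_loss = sum(logged_scaled.values())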

woctezuma commented 4 years ago

Thank you. It makes sense, and it answers one of the questions I had (but forgot to ask in my long write-up above).