facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0

Errors and losses #212

Closed woctezuma closed 4 years ago

woctezuma commented 4 years ago

In log.txt, I see several errors and losses. I would like to know a bit more about some of them.

I would expect the log file to contain:

1) First, there are these:

    "train_class_error": 0.801971435546875,
    "train_loss": 2.8446195403734844,
    "train_loss_ce": 0.02718701681296807,
    "train_loss_bbox": 0.14008397981524467,
    "train_loss_giou": 0.2731118065615495,

First, this is mostly clear, except that I am not sure what "ce" stands for. I assume it is the loss associated with the classification error. Is that correct?

Second, I would assume that "class error" is the "weighted fraction of misclassified observations". However, I have seen cases where it was much higher than 1. Isn't it supposed to be between 0 and 1?

2) Then, with the suffix going from 0 to 4:

    "train_loss_ce_3": 0.026680172781925648,
    "train_loss_bbox_3": 0.138540934274594,
    "train_loss_giou_3": 0.26943153887987137,

This looks OK.

I assume these are layer-specific losses, which are used to compute the Hungarian loss after each decoder layer, as mentioned in the "Auxiliary decoding losses" section of the paper:

We add prediction feed-forward networks (FFNs) and Hungarian loss after each decoder layer.

This is consistent with the fact that there are 6 decoding layers by default:

    # * Transformer
    parser.add_argument('--dec_layers', default=6, type=int,
                        help="Number of decoding layers in the transformer")

    # Loss
    parser.add_argument('--no_aux_loss', dest='aux_loss', action='store_false',
                        help="Disables auxiliary decoding losses (loss at each layer)")
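
If that reading is right, the per-layer keys are presumably produced by tagging each auxiliary layer's loss dict with its layer index. Here is a hypothetical sketch of the idea (not the actual DETR code), where the last decoder layer would keep the unsuffixed keys:

    # Hypothetical sketch (not the actual DETR code): suffix each auxiliary layer's losses
    # with its index, so 5 auxiliary layers yield loss_ce_0 ... loss_ce_4, while the final
    # (6th) decoder layer keeps the plain loss_ce / loss_bbox / loss_giou keys.
    def add_layer_suffixes(aux_layer_losses):
        losses = {}
        for i, layer_losses in enumerate(aux_layer_losses):
            losses.update({f"{name}_{i}": value for name, value in layer_losses.items()})
        return losses

    print(add_layer_suffixes([{"loss_ce": 0.027, "loss_bbox": 0.139},
                              {"loss_ce": 0.026, "loss_bbox": 0.138}]))
    # {'loss_ce_0': 0.027, 'loss_bbox_0': 0.139, 'loss_ce_1': 0.026, 'loss_bbox_1': 0.138}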

3) Finally:

    "train_class_error_unscaled": 0.801971435546875,
    "train_loss_ce_unscaled": 0.02718701681296807,
    "train_loss_bbox_unscaled": 0.02801679583887259,
    "train_loss_giou_unscaled": 0.13655590328077474,
    "train_cardinality_error_unscaled": 0.85,

I don't know what "cardinality error" is.

I don't know why train_class_error_unscaled is logged, since the classification error is not normalized the way the losses are.

Apart from that, it looks OK, as I assume the suffix _unscaled means before the normalization mentioned in appendix A.2:

All losses are normalized by the number of objects inside the batch.

woctezuma commented 4 years ago

tl;dr: my questions are:

- What does the _ce suffix stand for?
- Is "class error" supposed to be between 0 and 1? I have seen values much higher than 1.
- What is "cardinality error"?
- Why is there a train_class_error_unscaled entry, given that the classification error is not normalized like the losses?
- Does the _unscaled suffix mean "before the normalization by the number of objects" mentioned in appendix A.2?

woctezuma commented 4 years ago

On a side note, I have not delved too deep into the panoptic segmentation part of the paper.

Are the following coefficients in main.py the ones for the Focal loss and the Dice loss respectively, as mentioned in section 4.4 on page 15?

    # * Loss coefficients
    parser.add_argument('--mask_loss_coef', default=1, type=float)
    parser.add_argument('--dice_loss_coef', default=1, type=float)
woctezuma commented 4 years ago

All the info is in https://github.com/facebookresearch/detr/blob/master/models/detr.py

    def loss_boxes(self, outputs, targets, indices, num_boxes):
        """Compute the losses related to the bounding boxes, the L1 regression loss and the GIoU loss
           targets dicts must contain the key "boxes" containing a tensor of dim [nb_target_boxes, 4]
           The target boxes are expected in format (center_x, center_y, w, h), normalized by the image size.
        """

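For the record, here is my understanding of the body of loss_boxes as a toy sketch (not a verbatim copy of the code): the matched boxes are compared with an L1 term and a generalized-IoU term, each summed over the boxes and divided by the number of target boxes.

    import torch
    import torch.nn.functional as F

    # Toy sketch: 3 matched (prediction, target) box pairs in (center_x, center_y, w, h) format.
    src_boxes = torch.rand(3, 4)
    target_boxes = torch.rand(3, 4)
    num_boxes = 3

    # L1 regression term, summed over boxes and coordinates, then divided by the box count.
    loss_bbox = F.l1_loss(src_boxes, target_boxes, reduction='none').sum() / num_boxes
    # loss_giou is built analogously from 1 - generalized IoU (helpers in util/box_ops.py),
    # again summed and divided by num_boxes.
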
Answer about the _ce suffix: it stands for cross-entropy, which is the loss on the classes.

    def loss_labels(self, outputs, targets, indices, num_boxes, log=True):
        """Classification loss (NLL)
        targets dicts must contain the key "labels" containing a tensor of dim [nb_target_boxes]
        """
        [...]
        loss_ce = F.cross_entropy(src_logits.transpose(1, 2), target_classes, self.empty_weight)
        losses = {'loss_ce': loss_ce}

Answer about class error:

        if log:
            # TODO this should probably be a separate loss, not hacked in this one here
            losses['class_error'] = 100 - accuracy(src_logits[idx], target_classes_o)[0]
        return losses

where:

def accuracy(output, target, topk=(1,)):
    """Computes the precision@k for the specified values of k"""
    if target.numel() == 0:
        return [torch.zeros([], device=output.device)]
    maxk = max(topk)
    batch_size = target.size(0)

    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        correct_k = correct[:k].view(-1).float().sum(0)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res
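
So class_error is simply 100 minus the top-1 accuracy (in percent) over the matched queries, which explains values above 1: it lives in [0, 100], not [0, 1]. A toy example reusing the accuracy helper above:

    import torch

    # 4 matched queries, logits over 3 classes; the third prediction is wrong.
    logits = torch.tensor([[2.0, 0.1, 0.1],
                           [0.1, 2.0, 0.1],
                           [2.0, 0.1, 0.1],
                           [0.1, 0.1, 2.0]])
    targets = torch.tensor([0, 1, 2, 2])

    top1 = accuracy(logits, targets)[0]   # tensor(75.)  -> 75% top-1 accuracy
    class_error = 100 - top1              # tensor(25.)  -> a percentage, not a fraction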

Answer about cardinality error:

    def loss_cardinality(self, outputs, targets, indices, num_boxes):
        """ Compute the cardinality error, ie the absolute error in the number of predicted non-empty boxes
        This is not really a loss, it is intended for logging purposes only. It doesn't propagate gradients
        """

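From memory, the computation is roughly: count the predictions whose argmax class is not the "no object" class, and take the absolute difference with the number of ground-truth boxes. A sketch under that assumption (not a verbatim copy of loss_cardinality):

    import torch
    import torch.nn.functional as F

    # Sketch: pred_logits has shape [batch, num_queries, num_classes + 1],
    # where the last class index is assumed to be the "no object" class.
    def cardinality_error(pred_logits, num_target_boxes):
        no_object = pred_logits.shape[-1] - 1
        card_pred = (pred_logits.argmax(-1) != no_object).sum(1)       # predicted object count per image
        return F.l1_loss(card_pred.float(), num_target_boxes.float())  # mean absolute error over the batch

    logits = torch.randn(2, 100, 92)  # e.g. 100 queries, 91 classes + "no object"
    print(cardinality_error(logits, torch.tensor([3, 7])))
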
Answer about the "mask" (Focal) loss and the Dice loss:

    def loss_masks(self, outputs, targets, indices, num_boxes):
        """Compute the losses related to the masks: the focal loss and the dice loss.
           targets dicts must contain the key "masks" containing a tensor of dim [nb_target_boxes, h, w]
        """
        [...]
        losses = {
            "loss_mask": sigmoid_focal_loss(src_masks, target_masks, num_boxes),
            "loss_dice": dice_loss(src_masks, target_masks, num_boxes),
        }
fmassa commented 4 years ago

Hey, sorry for not getting back to you before.

Your findings are correct. One last point about the _unscaled losses: we have a scaling coefficient for each loss, which you can see in https://github.com/facebookresearch/detr/blob/5e66b4cd15b2b182da347103dd16578d28b49d69/main.py#L74-L77. Those coefficients are used to balance the contribution of each loss to the total loss. The _unscaled values you see in the logs are the original values of the losses, before being multiplied by those coefficients, as you can see in https://github.com/facebookresearch/detr/blob/5e66b4cd15b2b182da347103dd16578d28b49d69/engine.py#L39-L42
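
In rough terms (a simplified sketch, not the exact code), the relation between the logged values is:

    # Simplified sketch: weight_dict maps each loss name to its coefficient
    # (loss_ce -> 1, loss_bbox -> 5, loss_giou -> 2 by default).
    weight_dict = {'loss_ce': 1, 'loss_bbox': 5, 'loss_giou': 2}
    loss_dict = {'loss_ce': 0.027, 'loss_bbox': 0.028, 'loss_giou': 0.137}  # raw criterion outputs

    # The *_unscaled entries in the logs are the raw values; the plain entries are the raw
    # values multiplied by their coefficients, and their sum is the loss used for backprop.
    logged_unscaled = {f'{k}_unscaled': v for k, v in loss_dict.items()}
    logged_scaled = {k: v * weight_dict[k] for k, v in loss_dict.items() if k in weight_dict}
    total_loss = sum(logged_scaled.values())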

woctezuma commented 4 years ago

Thank you. It makes sense, and it answers one of the questions I had (but forgot to ask in my long write-up above).