facebookresearch / Mask2Former

Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
MIT License

Classification loss weight for "no object" is actually 0.2 #203

Open function2-llx opened 1 year ago

function2-llx commented 1 year ago

According to section 4.1 of the paper:

we set λ_cls = 2.0 for predictions matched with a ground truth and 0.1 for the “no object,” i.e., predictions that have not been matched with any ground truth.

The current implementation, however, first computes the cross-entropy loss with a per-class weight of 1 for foreground classes and 0.1 (eos_coef) for the "no object" background class:

https://github.com/facebookresearch/Mask2Former/blob/9b0651c6c1d5b3af2e6da0589b719c514ec0d69a/mask2former/modeling/criterion.py#L111-L115

https://github.com/facebookresearch/Mask2Former/blob/9b0651c6c1d5b3af2e6da0589b719c514ec0d69a/mask2former/modeling/criterion.py#L136-L138

and then multiplies the result by 2.0 (which comes from cfg.MODEL.MASK_FORMER.CLASS_WEIGHT):

https://github.com/facebookresearch/Mask2Former/blob/9b0651c6c1d5b3af2e6da0589b719c514ec0d69a/mask2former/maskformer_model.py#L211-L213
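To make the composition concrete, here is a minimal pure-Python sketch of the two-stage weighting described above (a toy two-way case with one foreground class plus "no object"; the variable names `empty_weight` and `class_weight` mirror the roles of eos_coef and cfg.MODEL.MASK_FORMER.CLASS_WEIGHT, and `weighted_ce` is a simplified stand-in for the per-sample weighted cross-entropy, not the actual repo code):

```python
import math

# Per-class weights inside the softmax loss, as in criterion.py:
# 1.0 for classes matched with a ground truth, eos_coef = 0.1 for "no object".
empty_weight = [1.0, 0.1]
# Outer loss weight applied afterwards (cfg.MODEL.MASK_FORMER.CLASS_WEIGHT).
class_weight = 2.0

def weighted_ce(logits, target):
    """Per-sample cross-entropy scaled by the target class weight."""
    log_z = math.log(sum(math.exp(l) for l in logits))
    return empty_weight[target] * (log_z - logits[target])

# A "no object" prediction (target index 1) with uniform logits:
loss_no_object = class_weight * weighted_ce([0.0, 0.0], target=1)

# The effective weight on "no object" is therefore 2.0 * 0.1 = 0.2,
# not the 0.1 quoted in the paper.
effective_weight = class_weight * empty_weight[1]
```

Under this composition, a matched prediction is weighted 2.0 × 1.0 = 2.0 while a "no object" prediction is weighted 2.0 × 0.1 = 0.2, which is the discrepancy the issue title points out.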

bowenc0221 commented 1 year ago

You are right; the description in the paper is a bit ambiguous here. λ_cls = 2.0 is meant to be the loss weight applied to the whole classification softmax loss, while the 0.1 for "no object" is meant to be the per-class weight inside the softmax loss (before multiplying by λ_cls).