NielsRogge / Transformers-Tutorials

This repository contains demos I made with the Transformers library by HuggingFace.
MIT License

Unexpected huge initial loss value with mask2former #299

Closed · matteot11 closed this issue 11 months ago

matteot11 commented 1 year ago

Hi, thanks for your work. I was following the tutorial here to fine-tune the Mask2Former model, since it has the same API as MaskFormer. However, after replacing MaskFormerImageProcessor -> Mask2FormerImageProcessor and MaskFormerForInstanceSegmentation -> Mask2FormerForUniversalSegmentation in the tutorial (without changing anything else), the loss reported in the model output is huge (>100). Is this expected? With MaskFormer the loss seems reasonable (~3-4 on a different dataset).
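For reference, a minimal sketch of the class swap described above (the checkpoint name is an example, not necessarily the one used in the tutorial):

```python
from transformers import Mask2FormerImageProcessor, Mask2FormerForUniversalSegmentation

# Drop-in replacement of the MaskFormer classes used in the tutorial;
# preprocessing arguments follow the tutorial's MaskFormerImageProcessor setup.
processor = Mask2FormerImageProcessor(ignore_index=255, reduce_labels=True)

model = Mask2FormerForUniversalSegmentation.from_pretrained(
    "facebook/mask2former-swin-base-coco-instance",  # example checkpoint
    ignore_mismatched_sizes=True,  # needed when fine-tuning with a custom label set
)
```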

Thanks for your help!

Satwato commented 11 months ago

@matteot11 Facing the same issue. Did you find a reason? I managed to bring it down to ~60 initially, and then it stagnates around ~23. Any idea why?

matteot11 commented 11 months ago

Hi @Satwato, unfortunately I did not find a way to lower the loss. However, the paper reports that "an auxiliary loss is added to every intermediate Transformer decoder layer and to the learnable query features before the Transformer decoder", so the reported loss is roughly a sum over many prediction heads rather than a single layer's loss, and a large value may simply be expected. After training, results indeed look good. Hope it helps.
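A minimal sketch of the single-batch check behind the numbers above, assuming the `train_dataloader` and batch keys from the tutorial's fine-tuning setup:

```python
import torch

# One forward pass on a single batch; train_dataloader and its keys
# ("pixel_values", "mask_labels", "class_labels") follow the tutorial's setup.
batch = next(iter(train_dataloader))
with torch.no_grad():
    outputs = model(
        pixel_values=batch["pixel_values"],
        mask_labels=batch["mask_labels"],
        class_labels=batch["class_labels"],
    )
print(outputs.loss)  # large (>100) before any fine-tuning
```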

NielsRogge commented 11 months ago

Hi both, thanks for your interest in Mask2Former. The auxiliary loss can be turned on or off via config.use_auxiliary_loss, which can be set to True or False. However, according to the authors, the auxiliary loss results in better performance.
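A minimal sketch of toggling that flag (the attribute name comes from the comment above; the checkpoint name is an example):

```python
from transformers import Mask2FormerConfig, Mask2FormerForUniversalSegmentation

# Load the config, disable the auxiliary (per-decoder-layer) losses, and
# reload the pretrained weights with the modified config.
config = Mask2FormerConfig.from_pretrained("facebook/mask2former-swin-base-coco-instance")
config.use_auxiliary_loss = False

model = Mask2FormerForUniversalSegmentation.from_pretrained(
    "facebook/mask2former-swin-base-coco-instance",  # example checkpoint
    config=config,
)
```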

NielsRogge commented 11 months ago

Will close this issue as it's resolved, feel free to reopen.