Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

After I changed the classification loss, why do I still need to use sigmoid when predicting to get the correct classification? #1491

Closed · xxxsmlie closed this 1 year ago

xxxsmlie commented 1 year ago

💡 Your Question

yolo-nas is a very good model, but now I have a problem. After replacing `_varifocal_loss` with `_logit_norm_loss`, I still need to apply sigmoid during detection to obtain the correct classification scores. Why is this?

```python
import torch
import torch.nn.functional as F
from torch import Tensor

def _logit_norm_loss(self, predictions: Tensor, targets: Tensor, tau=0.04):
    # L2-normalize the logits, then scale by temperature tau before cross-entropy.
    norms = torch.norm(predictions, p=2, dim=-1, keepdim=True) + 1e-7
    logit_norm = torch.div(predictions, norms) / tau
    return F.cross_entropy(logit_norm, targets, reduction='none')
```

Versions

No response

BloodAxe commented 1 year ago

There are a few things you have to keep in mind when using the YoloNAS model:

1) Outputs of the model in train and eval mode are different. When training, we don't do full decoding of the predictions and only output the 'raw' outputs required to compute the loss. In eval mode, both the decoded predictions and the 'raw' outputs are returned as a tuple.
2) When the model returns decoded predictions, a sigmoid is applied to the outputs, as can be seen here.
3) To ensure numerical stability during training, we do not apply sigmoid to the raw predictions; instead we use logsigmoid inside the loss function (see the sketch after this list).
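To illustrate this split in plain PyTorch (a minimal sketch, not super-gradients' actual code; the tensor shapes here are hypothetical):

```python
import torch
import torch.nn.functional as F

# Hypothetical raw classification logits for a batch of anchors:
# shape [batch, num_anchors, num_classes].
logits = torch.randn(2, 100, 80)
targets = torch.rand(2, 100, 80)  # soft targets in [0, 1]

# Training: the loss consumes raw logits directly.
# binary_cross_entropy_with_logits folds the sigmoid into the loss
# (via log-sigmoid) for numerical stability.
loss = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")

# Eval / decoding: sigmoid is applied once, here, to turn the same
# raw logits into classification scores in [0, 1].
scores = torch.sigmoid(logits)
```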

So I hope this clarifies when and where the activation of classification scores is applied. Regardless of the loss function used, whether it is varifocal, focal, or just plain BCE, inside the loss function you will always be dealing with logits, and in eval mode your decoded predictions will always be post-sigmoid scores.
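As an illustrative check in plain PyTorch (again, not super-gradients code), the logits-inside-the-loss formulation agrees with applying sigmoid first, up to numerical precision:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 80)
targets = torch.rand(4, 80)

# Loss on raw logits (the numerically stable form used during training) ...
stable = F.binary_cross_entropy_with_logits(logits, targets)
# ... matches BCE on sigmoid-activated scores (what eval-mode decoding exposes).
naive = F.binary_cross_entropy(torch.sigmoid(logits), targets)
torch.testing.assert_close(stable, naive)
```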

xxxsmlie commented 1 year ago

Thank you for your answer. I understand the yolo-nas model better now.