Hi, thank you very much for releasing this codebase -- it has been very useful to my project.
I'm wondering if there is a mistake in the Focal Loss implementation. The code in `loss/focal.py` first calculates the CrossEntropy loss, averages it over all samples, and only then applies the modulating factor: `loss = (1 - p) ** self.gamma * logp`. If I understand the original Focal Loss paper correctly, it computes the CrossEntropy loss, applies the modulating factor per sample, and only then averages the result over all samples.
I wonder, was the order changed on purpose in this repository? Applied after averaging, the modulating factor no longer down-weights easy examples individually, which I think loses the idea of Focal Loss.
If it's actually a mistake, a simple fix of changing the line
https://github.com/ZhaoJ9014/face.evoLVe/blob/63520924167efb9ef53dcceed0a15cf739cad1c9/loss/focal.py#L13
to `self.ce = nn.CrossEntropyLoss(reduction='none')` will suffice, since the per-sample losses are then kept until after the modulating factor is applied. Other implementations also seem to use this order, e.g. see 1 and 2.
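For illustration, here is a minimal sketch of what the fixed module could look like. This is an assumption about the surrounding code in `loss/focal.py` (the class name, `self.gamma`, and the shape of `forward` are inferred from the snippet quoted above), not a copy of the repository's file:

```python
import torch
import torch.nn as nn

class FocalLoss(nn.Module):
    """Hypothetical corrected Focal Loss sketch.

    With reduction='none', the CrossEntropy term stays per-sample, so the
    modulating factor (1 - p) ** gamma is applied to each sample *before*
    averaging, matching the original Focal Loss paper.
    """

    def __init__(self, gamma=2):
        super().__init__()
        self.gamma = gamma
        self.ce = nn.CrossEntropyLoss(reduction='none')  # the proposed fix

    def forward(self, input, target):
        logp = self.ce(input, target)        # per-sample -log p_t
        p = torch.exp(-logp)                 # per-sample p_t
        loss = (1 - p) ** self.gamma * logp  # modulate each sample
        return loss.mean()                   # average only at the end
```

As a sanity check, with `gamma=0` the modulating factor is 1 for every sample, so this reduces exactly to the mean CrossEntropy loss.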