Note that I explicitly keep the current implementation, `F.cross_entropy`, when `self.options.classification_focal_gamma == 0`, out of an abundance of caution, even though I checked that the more complex formula gives the same answer numerically.
The difference is that `F.cross_entropy` fuses `F.log_softmax` and `F.nll_loss` into a single call, so it is supposed to be more numerically stable, and I wanted to keep that. But if you think it's unnecessary, we can remove this if-then clause.
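To make the equivalence concrete, here is a minimal, framework-free sketch (not the code in this PR). It assumes the usual focal-loss form `-(1 - p_t)**gamma * log(p_t)`; the function names are hypothetical, and plain Python stands in for `F.log_softmax` / `F.nll_loss`:

```python
import math

def log_softmax(logits):
    # Stable log-softmax: shift by the max before exponentiating.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def cross_entropy(logits, target):
    # Per-sample cross-entropy: negative log-probability of the target class.
    return -log_softmax(logits)[target]

def focal_loss(logits, target, gamma):
    # Assumed focal-loss form: -(1 - p_t)^gamma * log(p_t).
    log_pt = log_softmax(logits)[target]
    pt = math.exp(log_pt)
    return -((1.0 - pt) ** gamma) * log_pt

logits, target = [2.0, -1.0, 0.5], 0
ce = cross_entropy(logits, target)
fl_zero = focal_loss(logits, target, 0.0)   # modulating factor is 1
fl_two = focal_loss(logits, target, 2.0)    # well-classified sample is down-weighted
```

With `gamma == 0` the modulating factor is exactly 1, so the general formula reduces to cross-entropy; the if-then clause only decides which code path computes that value.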
@mstamenk