The net’s output has already been passed through softmax, so it is a vector of probabilities, but the loss function we then use expects raw scores (logits) and applies softmax a second time. This is not a big problem in practice, but in principle it doesn’t make sense. It also makes our loss disagree with the score Kaggle reports when you upload the results because, as far as I understand, Kaggle assumes the outputs are scores, whereas ours are probabilities.
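
To make the double softmax concrete, here is a minimal sketch in plain Python. The helper names (`softmax`, `cross_entropy_from_scores`) and the example logits are made up for illustration and are not taken from the original code; the loss is assumed to be a cross-entropy that applies softmax internally.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_scores(scores, target):
    """Cross-entropy that expects raw scores and applies softmax itself."""
    return -math.log(softmax(scores)[target])

# Hypothetical raw scores for a 3-class problem; class 0 is the true label.
logits = [2.0, 0.5, -1.0]
target = 0

# What the net outputs if its last layer is already a softmax.
probs = softmax(logits)

# Intended usage: the loss receives raw scores.
loss_correct = cross_entropy_from_scores(logits, target)

# What happens here: the loss receives probabilities and softmaxes them again.
loss_double = cross_entropy_from_scores(probs, target)

print("loss on scores:       ", loss_correct)
print("loss on probabilities:", loss_double)
```

Because probabilities all lie in [0, 1], the second softmax flattens them towards a uniform distribution. With 3 classes the doubly softmaxed probability of the true class can never exceed e / (e + 2), so the loss cannot drop below log(1 + 2/e) however confident the net is. Training still works because the ordering of the classes is preserved, but the reported loss is no longer the true cross-entropy.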