Closed: Johswald closed this issue 5 years ago.
Yes, you are right. Thank you for pointing this out. There should indeed be a summation over the classes before taking the mean over the batch. This has now been corrected in the code. Note that there was a similar issue with the binary classification loss (e.g., as used in iCaRL), which has now also been corrected. My first impression is that this only has a small effect on the reported MNIST results, but I will test further; especially for very long task protocols this correction might be quite important. Many thanks for your feedback!
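For reference, a minimal sketch of the corrected reduction for the binary classification loss (the function and tensor names here are illustrative, not the exact identifiers used in the repository):

```python
import torch.nn.functional as F

def binary_cls_loss(scores, binary_targets):
    # iCaRL-style binary loss: per-class BCE, summed over the classes,
    # then averaged over the batch (rather than averaged over everything).
    bce = F.binary_cross_entropy_with_logits(scores, binary_targets,
                                             reduction='none')
    return bce.sum(dim=1).mean()
```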
Thank you for the nice code!
Hey, in order to compute the cross entropy between the "soft" targets and the predictions, you do the following:
KD_loss_unnorm = (-targets_norm * log_scores_norm).mean()  # --> average over batch
Wouldn't the correct cross entropy, with the mean taken over the batch, be:
KD_loss_unnorm = (-targets_norm * log_scores_norm).sum(dim=1).mean()
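To illustrate the difference, here is a small, self-contained check (the tensor names mirror the ones above; the shapes are made up for the example):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(4, 10)   # batch of 4 samples, 10 classes (illustrative shapes)
targets = torch.randn(4, 10)

log_scores_norm = F.log_softmax(scores, dim=1)   # predicted log-probabilities
targets_norm = F.softmax(targets, dim=1)         # "soft" targets

loss_mean_all = (-targets_norm * log_scores_norm).mean()               # divides by batch * classes
loss_per_sample = (-targets_norm * log_scores_norm).sum(dim=1).mean()  # divides by batch only

print(loss_per_sample / loss_mean_all)  # ratio equals the number of classes (10)
```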