Open Popeyef5 opened 3 years ago
The loss in the MNIST from scratch notebook is off by a factor of 10 due to how it is averaged. In NLLLoss, the averaging is done over the batch, as in https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html
No meaningful change in results, but nice for completeness.
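For reference, a minimal sketch (shapes and values are illustrative, not from the notebook) of how PyTorch's NLLLoss with the default `reduction='mean'` averages over the batch only, and how dividing by batch size times the number of classes instead would shrink the loss by a factor of 10:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: a batch of 128 samples, 10 classes (MNIST digits).
batch_size, num_classes = 128, 10
log_probs = F.log_softmax(torch.randn(batch_size, num_classes), dim=1)
targets = torch.randint(0, num_classes, (batch_size,))

# NLLLoss with the default reduction='mean' averages over the batch.
reference = F.nll_loss(log_probs, targets)

# Equivalent manual computation: negate the log-probability of the target
# class for each sample, then divide by the batch size.
manual = -log_probs[torch.arange(batch_size), targets].sum() / batch_size
assert torch.allclose(reference, manual)

# Dividing by batch_size * num_classes instead would make the loss
# num_classes (10) times smaller, which is the discrepancy described above.
too_small = -log_probs[torch.arange(batch_size), targets].sum() / (batch_size * num_classes)
```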
The out variable is not deleted because it is still needed for backprop.