Open Popeyef5 opened 3 years ago
The loss in the MNIST from scratch notebook is off by a factor of 10 due to how it is averaged. In NLLLoss, the averaging is done over the batch, as in https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html
No meaningful change in results, but nice for completeness.
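For reference, a minimal sketch (shapes and values are illustrative, not from the notebook) of how PyTorch's NLLLoss with the default `reduction='mean'` averages over the batch only, and how dividing by batch size times the number of classes instead would shrink the loss by a factor of 10:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: a batch of 128 samples, 10 classes (MNIST digits).
batch_size, num_classes = 128, 10
log_probs = F.log_softmax(torch.randn(batch_size, num_classes), dim=1)
targets = torch.randint(0, num_classes, (batch_size,))

# NLLLoss with the default reduction='mean' averages over the batch.
reference = F.nll_loss(log_probs, targets)

# Equivalent manual computation: negate the log-probability of the target
# class for each sample, then divide by the batch size.
manual = -log_probs[torch.arange(batch_size), targets].sum() / batch_size
assert torch.allclose(reference, manual)

# Dividing by batch_size * num_classes instead would make the loss
# num_classes (10) times smaller, which is the discrepancy described above.
too_small = -log_probs[torch.arange(batch_size), targets].sum() / (batch_size * num_classes)
```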
The out variable is not deleted because it is still needed for backprop.