Hi,
I think this little pull request will be useful to many. I usually prefer to compute the average loss across all timesteps and batch examples, instead of the sum of the losses.
This makes the loss value (and the gradients) less sensitive to the batch size and to the length of the samples.
The option is disabled by default for backward-compatibility reasons, and is named `size_average`, like other losses in PyTorch.
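For illustration, here is a minimal sketch of the difference between the two reductions. The tensor names, shapes, and masking are assumptions made up for this example, not the actual API of the patch:

```python
import torch

# Hypothetical per-timestep losses for a batch of variable-length
# sequences, padded to the max length (assumed setup, for illustration).
batch_size, max_len = 4, 10
per_step_loss = torch.rand(batch_size, max_len)   # loss at each timestep
lengths = torch.tensor([10, 7, 5, 3])             # true sequence lengths
mask = torch.arange(max_len)[None, :] < lengths[:, None]

# size_average=False (default): sum over all valid timesteps; the value
# grows with the batch size and the sequence lengths.
loss_sum = (per_step_loss * mask).sum()

# size_average=True: divide by the total number of valid timesteps, so
# the scale of the loss (and of the gradients) stays roughly constant
# when the batch size or the sample lengths change.
loss_avg = (per_step_loss * mask).sum() / mask.sum()
```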