File: train.py
Function: train_one_epoch(...)

Description:
With the loss function's reduction set to 'mean' (the default) and the DataLoader's drop_last set to False (the default), calling loss.backward() without accounting for the potentially smaller final batch gives that batch a disproportionate effect on the parameter update (and, more generally, on the optimizer behaviour): because the loss is averaged per batch and every batch triggers one optimizer step, each sample in the small final batch carries batch_size / final_batch_size times the weight of a sample in a full batch. For example, with batch_size=32 and 100 samples, the final batch holds 4 samples, each carrying 8x the weight of a sample in a full batch.
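To make the effect concrete, below is a minimal, hedged sketch of the loop shape being described; the dataset, model, optimizer, and criterion are illustrative placeholders, not the actual contents of train.py.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical setup: 100 samples with batch_size=32, so the final batch holds only 4.
dataset = TensorDataset(torch.randn(100, 10), torch.randn(100, 1))
loader = DataLoader(dataset, batch_size=32)             # drop_last=False (default)
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()                                # reduction='mean' (default)

for x, y in loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)   # averaged over the current batch, whatever its size
    loss.backward()
    optimizer.step()                # the 4-sample batch still drives a full-magnitude step
```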
Proposed solutions:
1. Multiply the loss by x.size(0) / val_loader.batch_size before calling loss.backward() (i.e. before the gradient computation); see the sketch after this list. The drawback is that this factor equals 1, i.e. is a no-op, for every batch except the last.
2. Apply an opposite, compensating parameter update after the batch-iterating loop to counteract the final update. However, for optimizers less trivial than plain SGD, optimizer.step() does more than update parameters from the gradient (momentum buffers and adaptive statistics also change), so the step cannot simply be undone.
3. Require drop_last to be True, which discards the incomplete final batch and bypasses the issue entirely (at the cost of never training on those samples within an epoch).
4. Require the reduction method to be set to 'sum'. This, however, is highly inconvenient, as it makes the learning rate and the batch size co-dependent (the magnitude of each update then scales with the batch size); that is exactly why 'mean' is usually preferred for easier parameterization.
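For reference, a minimal sketch of solution 1 under the same illustrative setup as above (not the actual train.py code); the nominal batch size is read from the loader's batch_size attribute, mirroring the x.size(0) / val_loader.batch_size expression proposed in item 1.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 10), torch.randn(100, 1))
loader = DataLoader(dataset, batch_size=32)             # final batch: 4 samples
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()                                # reduction='mean'

for x, y in loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    # Re-weight the mean loss by actual vs. nominal batch size.
    # The factor is 1.0 for every full batch and 4/32 = 0.125 for the final one,
    # so the final batch's contribution becomes proportional to its size.
    loss = loss * (x.size(0) / loader.batch_size)
    loss.backward()
    optimizer.step()

# Solution 3 for comparison: drop the incomplete final batch altogether.
# loader = DataLoader(dataset, batch_size=32, drop_last=True)
```

Under plain SGD this scaling makes every sample contribute equally to the epoch's updates; with adaptive optimizers (e.g. Adam) the correction is only approximate, since the scaled gradient still flows through per-batch running statistics.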