Training Performance Tuning

Open CentofantiEze opened 1 year ago

The goal is to decrease training time by taking advantage of the available GPU. The PyTorch Performance Tuning Guide will be followed.
To do:
- `DataLoader(..., num_workers=N)`: look for a reasonable value of `N` (`num_workers`).
- `DataLoader(..., pin_memory=True)`
- `optimizer.zero_grad(set_to_none=True)`
- `torch.jit.script`
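A minimal sketch of how the to-do items above fit together in one training loop. The dataset, model, and hyperparameters are hypothetical placeholders; the actual values of `num_workers` and `batch_size` should be benchmarked on the target machine.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data and model, only to illustrate the settings above.
dataset = TensorDataset(torch.randn(256, 8), torch.randn(256, 1))

# num_workers=N: N is machine-dependent; try a few values and time an epoch.
# pin_memory=True speeds up host-to-GPU copies (only pays off with a GPU).
loader = DataLoader(dataset, batch_size=32, num_workers=2, pin_memory=True)

model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for x, y in loader:
    # set_to_none=True releases the gradient tensors instead of zero-filling
    # them, saving a memory write per parameter each step.
    optimizer.zero_grad(set_to_none=True)
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# torch.jit.script compiles the module to TorchScript, which can reduce
# Python overhead during training and inference.
scripted = torch.jit.script(model)
```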
Adding workers to the dataloader helps when each dataset item is fetched and processed on the fly in `__getitem__`. If the dataset is already processed and resident in memory, extra workers are not needed.
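The point above can be sketched as follows: a map-style dataset whose `__getitem__` does real work (here faked with a `time.sleep` standing in for disk I/O or augmentation) is the case where worker processes can overlap loading with the training loop. The dataset and timings are illustrative assumptions, not measurements.

```python
import time
import torch
from torch.utils.data import DataLoader, Dataset

class OnTheFlyDataset(Dataset):
    """Hypothetical dataset that fetches/processes each item on access."""
    def __len__(self):
        return 64

    def __getitem__(self, idx):
        time.sleep(0.005)  # stand-in for per-item I/O, decoding, augmentation
        return torch.full((8,), float(idx))

def epoch_time(num_workers):
    """Time one pass over the loader with the given number of workers."""
    loader = DataLoader(OnTheFlyDataset(), batch_size=8, num_workers=num_workers)
    start = time.perf_counter()
    for _ in loader:
        pass
    return time.perf_counter() - start

# With per-item work in __getitem__, workers can overlap loading with
# iteration; with an already-materialised tensor dataset there is nothing
# to overlap, so extra workers only add process-spawn overhead.
t_serial = epoch_time(0)
t_workers = epoch_time(4)
```

Whether `num_workers=4` actually wins depends on the per-item cost versus the worker start-up overhead, which is why the to-do list suggests searching for a reasonable `N` rather than fixing one.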