facebookresearch / ClassyVision

An end-to-end PyTorch framework for image and video classification
https://classyvision.ai
MIT License

Auto scale learning rate based on batch size #287

Open vreis opened 4 years ago

vreis commented 4 years ago

🚀 Feature

Auto scale learning rate based on batch size

Motivation

Changing the number of workers in distributed training requires re-tuning hyperparameters. Goyal et al. (https://arxiv.org/abs/1706.02677) proposed a linear scaling rule that adjusts the learning rate in proportion to the batch size.
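The linear scaling rule can be sketched in a few lines; the base batch size of 256 below matches the setting used in the paper and is otherwise an assumption:

```python
def scale_lr(base_lr: float, batch_size: int, base_batch_size: int = 256) -> float:
    """Linear scaling rule (Goyal et al., 2017): when the batch size is
    multiplied by k, multiply the learning rate by k as well."""
    return base_lr * batch_size / base_batch_size

# A base LR of 0.1 tuned for batch size 256, rescaled for batch size 1024:
scale_lr(0.1, 1024)  # -> 0.4
```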

Pitch

ClassificationTask should have a flag (defaulting to True) that rescales the learning rate based on the batch size. The task is a natural place for this, since we don't want every parameter scheduler to reimplement the same logic. We could put it in the optimizer instead, but I suspect that would require more boilerplate.
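A minimal sketch of how such a task-level flag might behave; the class shape, attribute names, and base batch size here are illustrative assumptions, not ClassyVision's actual API:

```python
class ClassificationTask:
    # Batch size the configured LR is assumed to have been tuned for (assumption).
    BASE_BATCH_SIZE = 256

    def __init__(self, base_lr: float, batch_size: int, auto_scale_lr: bool = True):
        self.base_lr = base_lr
        self.batch_size = batch_size
        self.auto_scale_lr = auto_scale_lr  # the proposed flag, default True

    def effective_lr(self) -> float:
        """Apply the linear scaling rule once, at the task level, so that
        individual parameter schedulers don't have to reimplement it."""
        if self.auto_scale_lr:
            return self.base_lr * self.batch_size / self.BASE_BATCH_SIZE
        return self.base_lr
```

With the flag on, doubling the batch size doubles the effective learning rate; with it off, the configured value is used unchanged.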

Alternatives

Hydra (http://hydra.cc) would enable a different solution to this problem: the config file could have a "rescale" parameter for the learning rate, and we could use the interpolation feature to rescale by "1/${batch_size}", where batch_size is defined elsewhere in the config.

omry commented 4 years ago

Interpolation does not support arithmetic operations (there is an enhancement request in OmegaConf that I will consider in the future).

For now, you could use interpolation to get the batch size into the model, and do the auto scaling in code:

```yaml
model:
   params:
      ...
      batch_size: ${batch_size}
```

and do the division in the code.