facebookresearch / ClassyVision

An end-to-end PyTorch framework for image and video classification
https://classyvision.ai
MIT License

Auto scale learning rate based on batch size #287

Open vreis opened 4 years ago

vreis commented 4 years ago

🚀 Feature

Auto scale learning rate based on batch size

Motivation

Changing the number of workers in distributed training requires re-tuning hyperparameters. Goyal et al. (https://arxiv.org/abs/1706.02677) proposed a linear scaling rule that adjusts the learning rate in proportion to the batch size.
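The linear scaling rule can be sketched in a few lines; the base batch size of 256 below matches the setting used in the paper and is otherwise an assumption:

```python
def scale_lr(base_lr: float, batch_size: int, base_batch_size: int = 256) -> float:
    """Linear scaling rule (Goyal et al., 2017): when the batch size is
    multiplied by k, multiply the learning rate by k as well."""
    return base_lr * batch_size / base_batch_size

# A base LR of 0.1 tuned for batch size 256, rescaled for batch size 1024:
scale_lr(0.1, 1024)  # -> 0.4
```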

Pitch

ClassificationTask should have a flag (defaulting to True) that rescales the learning rate based on the batch size. The task is a natural place for this, since we don't want every parameter scheduler to reimplement the same logic. We could put it in the optimizer instead, but I suspect that would require more boilerplate.
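A minimal sketch of how such a task-level flag might behave; the class shape, attribute names, and base batch size here are illustrative assumptions, not ClassyVision's actual API:

```python
class ClassificationTask:
    # Batch size the configured LR is assumed to have been tuned for (assumption).
    BASE_BATCH_SIZE = 256

    def __init__(self, base_lr: float, batch_size: int, auto_scale_lr: bool = True):
        self.base_lr = base_lr
        self.batch_size = batch_size
        self.auto_scale_lr = auto_scale_lr  # the proposed flag, default True

    def effective_lr(self) -> float:
        """Apply the linear scaling rule once, at the task level, so that
        individual parameter schedulers don't have to reimplement it."""
        if self.auto_scale_lr:
            return self.base_lr * self.batch_size / self.BASE_BATCH_SIZE
        return self.base_lr
```

With the flag on, doubling the batch size doubles the effective learning rate; with it off, the configured value is used unchanged.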

Alternatives

Hydra (http://hydra.cc) would enable a different solution to this problem: the config file could have a "rescale" parameter for the learning rate, and we could use the interpolation feature to rescale by "1/${batch_size}", where batch_size is defined elsewhere in the config.

omry commented 4 years ago

Interpolation does not support arithmetic operations (there is an enhancement request in OmegaConf that I will consider in the future).

For now, you could use interpolation to get the batch size into the model, and do the auto scaling in code:

```yaml
model:
   params:
      ...
      batch_size: ${batch_size}
```

and do the division in the code.