vreis opened this issue 4 years ago
Interpolation does not support arithmetic operations (there is an enhancement request in OmegaConf that I will consider in the future).
For now, you could use interpolation to get the batch size into the model:
```yaml
model:
  params:
    ...
    batch_size: ${batch_size}
```
and do the division in the code.
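A minimal sketch of that workaround, assuming the resolved config arrives as a plain dict and using a hypothetical `scale_lr` helper and illustrative values (base learning rate 0.1, reference batch size 256):

```python
def scale_lr(base_lr, batch_size, base_batch_size=256):
    # The config interpolation can only substitute the batch size,
    # so the arithmetic (the division/scaling) happens here in code.
    return base_lr * batch_size / base_batch_size

# Example: config after interpolation, ${batch_size} -> 512.
config = {"model": {"params": {"batch_size": 512}}, "base_lr": 0.1}

lr = scale_lr(config["base_lr"], config["model"]["params"]["batch_size"])
print(lr)  # 0.1 * 512 / 256 -> 0.2
```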
🚀 Feature
Auto scale learning rate based on batch size
Motivation
Changing the number of workers in distributed training requires adjusting hyperparameters. https://arxiv.org/abs/1706.02677 proposed a linear scaling rule that adjusts the learning rate proportionally to the batch size.
Pitch
ClassificationTask should have a flag (default True) that rescales the learning rate based on the batch size. The task is a natural place to put this, since we don't want every parameter scheduler to reimplement the same logic. We could consider putting it in the optimizer instead, but I have a sense that would require more boilerplate.
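A sketch of what the proposed flag could look like on the task. The class layout, attribute names, and the reference batch size of 256 are all hypothetical, not Classy Vision's actual API:

```python
class ClassificationTask:
    # Hypothetical sketch of the proposed auto-scaling flag;
    # not the real Classy Vision implementation.
    BASE_BATCH_SIZE = 256  # assumed reference batch size

    def __init__(self, base_lr, batch_size, auto_scale_lr=True):
        self.batch_size = batch_size
        self.auto_scale_lr = auto_scale_lr
        self.lr = self._maybe_scale(base_lr)

    def _maybe_scale(self, base_lr):
        # Linear scaling rule: lr grows proportionally with the
        # batch size when the flag is enabled.
        if not self.auto_scale_lr:
            return base_lr
        return base_lr * self.batch_size / self.BASE_BATCH_SIZE

task = ClassificationTask(base_lr=0.1, batch_size=512)
print(task.lr)  # scaled: 0.1 * 512 / 256
```

Keeping the logic on the task (rather than in each scheduler) means it runs once, before the optimizer and schedulers are built.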
Alternatives
Hydra (http://hydra.cc) would enable a different solution to this problem: the config file could have a "rescale" parameter for the learning rate, and we could use the "interpolation" feature to rescale by `1/${batch_size}`, where batch_size is defined elsewhere in the config.
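A hypothetical config fragment illustrating the idea. Note that, as discussed in the comment above, plain OmegaConf interpolation like `${batch_size}` only substitutes values; it cannot evaluate arithmetic such as `1/${batch_size}`, so that step would still happen in code:

```yaml
batch_size: 512

optimizer:
  # Plain interpolation substitutes the value defined above...
  batch_size: ${batch_size}
  # ...but arithmetic like the following is NOT supported by
  # OmegaConf interpolation and would have to be done in code:
  # rescale: 1/${batch_size}
```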