In the warmup phase:
Learning rate is increased from `init_lr` to `mid_lr`,
Momentum is decreased from `init_mom` to `mid_mom`, to stabilise the use of high LRs.
In the convergence phase:
Learning rate is decreased from `mid_lr` to `final_lr`,
Momentum is increased from `mid_mom` to `final_mom`.
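
The two phases can be sketched as one function of training progress. This is a minimal illustration and not the callback's actual implementation: the cosine interpolation shape, the `warmup_frac` parameter, and the function names `_interp` and `one_cycle_value` are assumptions made for the example. Only the `init`, `mid`, and `final` values correspond to the arguments described above.

```python
import math


def _interp(start, end, pct):
    """Cosine interpolation from `start` (pct=0) to `end` (pct=1).

    The cosine shape is an assumption; a linear ramp would also fit the description.
    """
    return end + (start - end) * (1 + math.cos(math.pi * pct)) / 2


def one_cycle_value(progress, init, mid, final, warmup_frac=0.3):
    """Value of a one-cycle schedule at `progress` in [0, 1].

    `warmup_frac` (hypothetical) is the fraction of training spent in warmup.
    """
    if progress < warmup_frac:
        # Warmup phase: move from init to mid
        return _interp(init, mid, progress / warmup_frac)
    # Convergence phase: move from mid to final
    return _interp(mid, final, (progress - warmup_frac) / (1 - warmup_frac))
```

The same function covers both quantities: calling it with `(init_lr, mid_lr, final_lr)` where `mid_lr` is the largest value gives the rise-then-fall learning rate, and calling it with `(init_mom, mid_mom, final_mom)` where `mid_mom` is the smallest value gives the mirrored fall-then-rise momentum.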
Setting the learning rate or momentum here will override the values specified when instantiating the `VolumeWrapper`.
The learning rate or momentum arguments can be `None` to avoid annealing or overriding their values.
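
The override and `None` behaviour can be pictured with a small helper. This is a sketch under assumptions: `resolve_start`, the `Range` alias, and the `(init, mid, final)` tuple layout are hypothetical names chosen for illustration, and how `VolumeWrapper` actually stores its learning rate and momentum is not shown here.

```python
from typing import Optional, Tuple

# Hypothetical layout for a schedule argument: (init, mid, final)
Range = Tuple[float, float, float]


def resolve_start(current: float, value_range: Optional[Range]) -> float:
    """Return the value in effect when training starts.

    `current` stands for the value set when the wrapper was instantiated.
    """
    if value_range is None:
        # No schedule supplied: keep the wrapper's value, with no annealing
        return current
    # Schedule supplied: its initial value overrides the wrapper's value
    return value_range[0]
```

For instance, `resolve_start(0.9, None)` leaves a momentum of 0.9 untouched, whereas `resolve_start(0.9, (0.95, 0.85, 0.95))` starts from 0.95.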
Adds a OneCycle scheduler callback for learning rate and momentum; see https://arxiv.org/abs/1803.09820.