Open yxchng opened 2 years ago

In `param_schedulers`, we have `base_value` and `values`, both of which control the learning rate. Do I have to change both when I change the learning rate? What will happen if the first value in `values` and `base_value` are different? For example, if I want a learning rate of 0.005, do I have to sync them manually by changing both `base_value` to 0.005 and `values` to `[0.005, 0.0005, 0.00005, 0.000005]`?
Hi @yxchng, if you set `auto_scale=True`, then you only need to change the `base_value`, and the `values` will be automatically scaled to reflect the desired values.
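Conceptually, the scaling is just the usual linear rule against `base_lr_batch_size`; a minimal sketch of the idea (the helper below is illustrative, not VISSL's actual code):

```python
# Minimal sketch of linear LR scaling, assuming the standard linear rule
# (this helper is my own illustration, not VISSL's implementation).

def scaled_lr(base_value: float, batch_size: int, base_lr_batch_size: int = 256) -> float:
    """Scale the base LR linearly with the global batch size."""
    return base_value * batch_size / base_lr_batch_size

print(scaled_lr(0.01, 256))   # 0.01 -- batch size matches the reference, no change
print(scaled_lr(0.01, 1024))  # 0.04 -- 4x the batch size gives 4x the LR
```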
@prigoyal That means I will have to sync them manually?
Hi @yxchng, I think there's some confusion. You only need to set `auto_scale=True` and `base_value`. The `values` will then be automatically scaled and you don't need to do anything. Can you share what behavior you are observing and what you want to achieve? I can help figure out the settings.
@prigoyal Sorry, maybe I was not clear enough. What I wanted to clarify is what happens if `base_value` and the first number in `values` do not match. For example, in the config below, I have `base_value` being `0.01` and the first number in `values` being `0.02`. Which takes precedence?
```yaml
param_schedulers:
  lr:
    auto_lr_scaling:
      auto_scale: true
      base_value: 0.01
      base_lr_batch_size: 256
    name: multistep
    values: [0.02, 0.001, 0.0001, 0.00001]
    milestones: [24, 48, 72]
    update_interval: epoch
```
I am confused because there are two sets of learning rates: one being `base_value`, the other being the `values` in `multistep`.
My understanding is that `values` correspond to the learning rates at each of the `milestones`. So based on the config above, without `auto_lr_scaling`, the initial learning rate is `0.02`, then `0.001` at epoch 24, `0.0001` at epoch 48, and `0.00001` at epoch 72. Is that right?
However, when there is `auto_lr_scaling`, I am not sure what the behavior will be.
Ah, thank you @yxchng, that example is super helpful. In the above example, with `auto_lr_scaling` enabled, if your batch size is 256 then the LR will be scaled to `[0.01, 0.001, 0.0001, 0.00001]`, and the gamma for the steps is calculated based on `values[0]` and `values[1]` only: https://github.com/facebookresearch/vissl/blob/main/vissl/utils/hydra_config.py#L223-L224
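To spell out that gamma rule, here is a rough paraphrase of the behavior described above, assuming the scaled schedule is rebuilt as `base * gamma**i` (the linked `hydra_config.py` lines are authoritative; this helper and its exact output are my reconstruction):

```python
# Rough paraphrase of the described auto-scaling: gamma comes from the first
# two entries of `values` only, so the remaining entries are effectively ignored.

def auto_scale_multistep(base_value, values, batch_size, base_lr_batch_size=256):
    scaled_base = base_value * batch_size / base_lr_batch_size
    gamma = values[1] / values[0]  # only values[0] and values[1] matter
    return [scaled_base * gamma**i for i in range(len(values))]

print(auto_scale_multistep(0.01, [0.02, 0.001, 0.0001, 0.00001], batch_size=256))
# approx. [0.01, 0.0005, 2.5e-05, 1.25e-06] -- the tail follows the gamma rule,
# not the tail of the original `values` list
```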
It sounds like, from your example, you want the gamma to be different, in which case we should compose the LR differently, i.e. you might want 1) a constant LR schedule of 0.02 for 24 epochs, 2) then a multistep LR; see the sketch below. Some instructions on how to compose the LR are in https://github.com/facebookresearch/vissl/blob/main/vissl/optimizers/param_scheduler/README.md :)
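In plain Python, the composed schedule would behave like this (illustrative only; the README above shows how to express composition in an actual VISSL config):

```python
# Sketch of the suggested composition: a constant LR for the first 24 epochs,
# then a multistep schedule. Plain Python for illustration, not a VISSL config.
import bisect

def composed_lr(epoch: int) -> float:
    if epoch < 24:
        return 0.02  # 1) constant LR schedule for the first 24 epochs
    # 2) multistep LR afterwards
    values, milestones = [0.001, 0.0001, 0.00001], [48, 72]
    return values[bisect.bisect_right(milestones, epoch)]

for epoch in (0, 23, 24, 48, 72):
    print(epoch, composed_lr(epoch))
# 0 0.02 / 23 0.02 / 24 0.001 / 48 0.0001 / 72 0.00001
```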
Lmk if this helps and if I understood the problem right. It does sound like we should make our docs clearer and possibly add a few examples. :)
@prigoyal Thanks a lot, I think I get it now. That means the last two values in `values`, i.e. `0.0001, 0.00001` in the config, are not used. I am not sure how this can be changed, but the current config is rather confusing.