facebookresearch / vissl

VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
https://vissl.ai
MIT License

Do I have to sync both `base_value` and `values` manually in config? #446

Open | yxchng opened this issue 2 years ago

yxchng commented 2 years ago
param_schedulers:
  lr:
    auto_lr_scaling:
      auto_scale: true
      base_value: 0.01
      base_lr_batch_size: 256
    name: multistep
    values: [0.01, 0.001, 0.0001, 0.00001]
    milestones: [24, 48, 72]
    update_interval: epoch

In param_schedulers, both base_value and values control the learning rate. Do I have to change both when I change the learning rate? What happens if the first entry in values and base_value are different? For example, if I want a learning rate of 0.005, do I have to sync them manually by changing both base_value to 0.005 and values to [0.005, 0.0005, 0.00005, 0.000005]?
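In other words, would I need something like this (a hypothetical config, just to illustrate what I mean by syncing both fields):

param_schedulers:
  lr:
    auto_lr_scaling:
      auto_scale: true
      base_value: 0.005                         # changed
      base_lr_batch_size: 256
    name: multistep
    values: [0.005, 0.0005, 0.00005, 0.000005]  # changed to match base_value
    milestones: [24, 48, 72]
    update_interval: epoch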

prigoyal commented 2 years ago

Hi @yxchng, if you set auto_scale=True, then you only need to change base_value and the values will be automatically scaled to reflect the desired learning rates.
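For example, to get 0.005 at the reference batch size, something along these lines should be enough (a rough sketch, assuming you train at the base_lr_batch_size of 256):

param_schedulers:
  lr:
    auto_lr_scaling:
      auto_scale: true
      base_value: 0.005                       # the only field you need to edit
      base_lr_batch_size: 256
    name: multistep
    values: [0.01, 0.001, 0.0001, 0.00001]    # rescaled automatically when auto_scale is on
    milestones: [24, 48, 72]
    update_interval: epoch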

yxchng commented 2 years ago

@prigoyal Does that mean I still have to sync them manually?

prigoyal commented 2 years ago

Hi @yxchng, I think there's some confusion. You only need to set auto_scale=True and base_value; the values will then be scaled automatically and you don't need to do anything else. Can you share what behavior you are observing and what you want to achieve? I can help figure out the settings.

yxchng commented 2 years ago

@prigoyal Sorry, maybe I was not being clear. What I wanted to clarify is what happens if base_value and the first number in values do not match. For example, in the config below, base_value is 0.01 and the first number in values is 0.02. Which takes precedence?

param_schedulers:
  lr:
    auto_lr_scaling:
      auto_scale: true
      base_value: 0.01
      base_lr_batch_size: 256
    name: multistep
    values: [0.02, 0.001, 0.0001, 0.00001]
    milestones: [24, 48, 72]
    update_interval: epoch

I am confused because there are two sets of learning rates: one is base_value, the other is the values list in multistep.

My understanding is that values corresponds to the learning rate at each milestone. So based on the config above, without auto_lr_scaling, the initial learning rate is 0.02, then 0.001 at epoch 24, 0.0001 at epoch 48 and 0.00001 at epoch 72. Is that right?
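Spelled out, this is the schedule I would expect from that config without auto_lr_scaling (just my reading of it):

# epochs  0-23 -> lr 0.02
# epochs 24-47 -> lr 0.001
# epochs 48-71 -> lr 0.0001
# epochs 72+   -> lr 0.00001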

However, when there is auto_lr_scaling, I am not sure what the behavior will be.

prigoyal commented 2 years ago

ah, thank you @yxchng, that example is super helpful. In the above example, with auto_lr_scaling=True, if your batch size is 256, then the LR will be scaled to [0.01, 0.001, 0.0001, 0.00001], and the gamma for the step is calculated from values[0] and values[1] only: https://github.com/facebookresearch/vissl/blob/main/vissl/utils/hydra_config.py#L223-L224.
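To make the scaling part concrete (the numbers below are illustrative; the scaling is the usual linear rule based on base_lr_batch_size):

auto_lr_scaling:
  auto_scale: true
  base_value: 0.01
  base_lr_batch_size: 256
# effective base LR = base_value * (train batch size / base_lr_batch_size)
#   batch size 256 -> 0.01 * (256 / 256) = 0.01
#   batch size 512 -> 0.01 * (512 / 256) = 0.02
# the step-down factor (gamma) for multistep is then taken from values[0] and values[1] only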

From your example, it sounds like you want the gamma to be different, in which case we should compose the LR schedule differently, i.e. you might want 1) a constant LR schedule of 0.02 for 24 epochs, 2) then a multistep LR schedule, as in the sketch below. Instructions on how to compose LR schedules are at https://github.com/facebookresearch/vissl/blob/main/vissl/optimizers/param_scheduler/README.md :)
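A composed schedule along those lines might look roughly like this (field names follow the composite scheduler described in that README; the specific lengths, milestones and interval_scaling values here are illustrative and worth double-checking against the README):

param_schedulers:
  lr:
    name: composite
    schedulers:
      - name: constant
        value: 0.02
      - name: multistep
        values: [0.001, 0.0001, 0.00001]
        milestones: [48, 72]
    update_interval: epoch
    interval_scaling: [fixed, fixed]
    lengths: [0.25, 0.75]     # e.g. 24 constant epochs out of 96 total, then the multistep part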

Let me know if this helps and if I understood the problem right. It does sound like we should make our docs clearer and possibly add a few examples. :)

yxchng commented 2 years ago

@prigoyal Thanks a lot, I think I get it now. That means the last two numbers in values, i.e. 0.0001 and 0.00001 in the config, are not actually used. I am not sure how this can be changed, but the current config is rather confusing.