Which configs control learning rate warm-up, weight decay, and momentum for a 1-node, n-GPU setup (1 < n < 8)?

❓ How to do something using VISSL

Describe what you want to do, including:

what I am trying to do: I have read the ImageNet-in-1-hour paper ("Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour"). It discusses learning rate warm-up, weight decay, and momentum for distributed training, and I want to use these settings for distributed training on a single node with multiple GPUs. However, I could not find any documentation for the corresponding configs. How should I set them?
what outputs you are expecting: A config, with an explanation, for the learning rate warm-up strategy, weight decay, and momentum on a single machine with n GPUs.
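For context, this is my understanding of the schedule the paper describes: the linear scaling rule (reference LR 0.1 for a batch of 256, scaled by the global batch size), gradual warm-up over the first 5 epochs, step decay by 10x at epochs 30, 60, and 80, plus SGD with momentum 0.9 and weight decay 1e-4. It is a rough sketch in plain Python, not VISSL's API; the function and parameter names are my own, hypothetical ones. What I am looking for is how to express the same thing in a VISSL config.

```python
# Hypothetical sketch of my reading of the paper's recipe; not a VISSL API.
MOMENTUM = 0.9        # SGD momentum value used in the paper
WEIGHT_DECAY = 1e-4   # weight decay value used in the paper


def lr_at_epoch(epoch, num_gpus, batch_per_gpu, base_lr=0.1, base_batch=256,
                warmup_epochs=5, milestones=(30, 60, 80), gamma=0.1):
    """Learning rate at a (possibly fractional) epoch, using the linear
    scaling rule with gradual warm-up and step decay."""
    global_batch = num_gpus * batch_per_gpu
    # Linear scaling rule: scale the reference LR by global_batch / base_batch.
    target_lr = base_lr * global_batch / base_batch
    if epoch < warmup_epochs:
        # Gradual warm-up: move linearly from base_lr to target_lr.
        return base_lr + (target_lr - base_lr) * epoch / warmup_epochs
    # Step decay: multiply by gamma once for each milestone already reached.
    num_decays = sum(1 for m in milestones if epoch >= m)
    return target_lr * gamma ** num_decays
```

For example, with 4 GPUs and 128 images per GPU (global batch 512), the target LR is 0.2, reached at epoch 5 after warming up from 0.1.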

Please link to which API or documentation you're asking about from https://github.com/facebookresearch/vissl/tree/main/docs