What I am trying to do: I have read the "ImageNet in 1 Hour" paper. It describes how learning rate warm-up, weight decay, and momentum are set for distributed training, and I want to apply the same settings on 1 node with multiple GPUs. However, I could not find any VISSL documentation covering these configs. How should I set them properly?
What outputs I am expecting: A config and an explanation of the learning rate warm-up strategy, weight decay, and momentum for a machine with 1 node and n GPUs.
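For reference, here is how I currently understand the schedule from the paper, written as a framework-agnostic Python sketch rather than a VISSL config. The function name `warmup_lr` and all of its parameters are my own hypothetical names, not VISSL options. The default values (base learning rate 0.1 per 256 images, 5 warm-up epochs, momentum 0.9, weight decay 1e-4) are my reading of the paper and should be treated as assumptions. The step decay the paper applies after warm-up is left out.

```python
# Hypothetical helper, not part of VISSL: linear scaling rule plus
# gradual warm-up, as I understand them from the paper.
def warmup_lr(iteration, iters_per_epoch, num_gpus, batch_per_gpu=32,
              base_lr=0.1, base_batch=256, warmup_epochs=5):
    """Return the learning rate to use at a given training iteration."""
    # Linear scaling rule: scale the reference LR by (global batch / 256).
    target_lr = base_lr * (num_gpus * batch_per_gpu) / base_batch
    warmup_iters = warmup_epochs * iters_per_epoch
    if iteration >= warmup_iters:
        # Warm-up is over; later step decay is not modeled here.
        return target_lr
    # Gradual warm-up: ramp linearly from base_lr up to target_lr.
    alpha = iteration / warmup_iters
    return base_lr + alpha * (target_lr - base_lr)


# Assumed values, kept fixed regardless of the number of GPUs.
MOMENTUM = 0.9
WEIGHT_DECAY = 1e-4
```

With 8 GPUs and 64 images per GPU, the global batch is 512, so this sketch ramps the learning rate from 0.1 to 0.2 over the first 5 epochs. What I am missing is how to express the same thing in a VISSL config.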
❓ How to do something using VISSL