Closed — knn217 closed this issue 1 year ago
The GitHub page says that all models were trained with a batch size of 32, and the learning rate in the provided log is 2e-4. But the default values in main.py are: lr = 2e-4; batch size = 2. So if my device can only handle a batch size of 2, should I divide the learning rate by 16 or by 4?
Also, what about the other learning-rate settings: lr_backbone_names, lr_backbone, lr_linear_proj_names, lr_linear_proj_mult, lr_drop? Are they affected by batch size as well?
I'm running with the default settings in main.py, by the way. It's converging, but quite slowly compared to your provided log.
I think the batch size of 32 means 2 nodes × 8 GPUs × a per-GPU batch size of 2.
Yeah, I changed the learning rate to (2e-4)/16 and it did train faster.
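A minimal sketch of the linear-scaling rule being applied here, assuming the reference configuration from the thread (lr = 2e-4 at an effective batch size of 32 = 2 nodes × 8 GPUs × 2 per GPU). The function and variable names are illustrative, not from the repository; note that lr_linear_proj_mult is a relative multiplier and lr_drop is a schedule epoch, so under this rule only the absolute rates (e.g. lr, lr_backbone) would scale:

```python
# Illustrative linear learning-rate scaling; names are hypothetical.
REFERENCE_LR = 2e-4          # lr from the provided training log
REFERENCE_BATCH_SIZE = 32    # 2 nodes x 8 GPUs x batch size 2 per GPU

def scale_lr(per_gpu_batch_size: int, num_gpus: int = 1, num_nodes: int = 1) -> float:
    """Scale the reference lr linearly with the effective batch size."""
    effective_batch_size = per_gpu_batch_size * num_gpus * num_nodes
    return REFERENCE_LR * effective_batch_size / REFERENCE_BATCH_SIZE

# Single GPU with batch size 2 -> effective batch size 2,
# so lr becomes (2e-4) / 16 = 1.25e-5, matching the fix above.
print(scale_lr(per_gpu_batch_size=2))
```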