fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Apache License 2.0

Should I change the learning rate to (2e-4)/16 if I'm using a batch size of 2? #186

Closed · knn217 closed this 1 year ago

knn217 commented 1 year ago

The GitHub page says that all models were trained with a batch size of 32, and the learning rate in the provided log is 2e-4. But the defaults in main.py are lr = 2e-4 and batch_size = 2. So if my device can only handle a batch size of 2, should I divide the learning rate by 16 or by 4?

Also, what about the other learning-rate options: lr_backbone_names, lr_backbone, lr_linear_proj_names, lr_linear_proj_mult, and lr_drop? Are they affected by batch size as well?

I'm running with the default settings in main.py, by the way. It's converging, but quite slowly compared to your provided log.

order-a-lemonade commented 1 year ago

> The GitHub page says that all models were trained with a batch size of 32, and the learning rate in the provided log is 2e-4. But the defaults in main.py are lr = 2e-4 and batch_size = 2. So if my device can only handle a batch size of 2, should I divide the learning rate by 16 or by 4?
>
> Also, what about the other learning-rate options: lr_backbone_names, lr_backbone, lr_linear_proj_names, lr_linear_proj_mult, and lr_drop? Are they affected by batch size as well?
>
> I'm running with the default settings in main.py, by the way. It's converging, but quite slowly compared to your provided log.

I think the batch size of 32 means 2 nodes × 8 GPUs × 2 images per GPU.
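For concreteness, here is a minimal sketch of the linear scaling rule implied above, assuming the reference effective batch size is 2 × 8 × 2 = 32; the helper name `scaled_lr` is hypothetical, not part of the repo:

```python
# Minimal sketch of linear learning-rate scaling (helper name is hypothetical).
# Assumes the reference setup is 2 nodes x 8 GPUs x 2 images = 32 images per step.
REFERENCE_BATCH = 2 * 8 * 2   # 32
BASE_LR = 2e-4                # lr from the provided training log

def scaled_lr(my_batch: int, base_lr: float = BASE_LR, ref_batch: int = REFERENCE_BATCH) -> float:
    """Scale the base learning rate linearly with the effective batch size."""
    return base_lr * my_batch / ref_batch

print(scaled_lr(2))  # 1.25e-05, i.e. (2e-4) / 16 -- matching the fix reported below
```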

knn217 commented 1 year ago

Yeah, I changed the learning rate to (2e-4)/16 and it did train faster.
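As a hedged follow-up on the second question, this sketch applies the same 1/16 factor only to the options that are actual rates. The defaults are taken from main.py; treating lr_backbone the same way as lr is an assumption that the thread itself does not confirm:

```python
# Hedged sketch: which lr-related defaults from main.py plausibly scale with batch size.
# Scaling lr_backbone like lr is an assumption; the thread only confirms lr itself.
SCALE = 2 / 32  # my effective batch size / reference effective batch size

config = {
    "lr": 2e-4 * SCALE,            # main learning rate -> 1.25e-5 (confirmed to help above)
    "lr_backbone": 2e-5 * SCALE,   # backbone learning rate: also a rate, scaled here (assumption)
    "lr_linear_proj_mult": 0.1,    # relative multiplier on certain parameter groups, not an absolute rate
    "lr_drop": 40,                 # epoch at which the lr decays; a schedule point, not a rate
    # lr_backbone_names / lr_linear_proj_names just select parameter groups by name,
    # so batch size does not affect them.
}
print(config)
```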