PeizeSun / SparseR-CNN

[CVPR2021, PAMI2023] End-to-End Object Detection with Learnable Proposals
MIT License

Learning rate adjustment when training with one GPU or two GPUs #80

Open Young0111 opened 3 years ago

Young0111 commented 3 years ago

Thanks for your code.

When I trained this code with two GPUs (Tesla P4), changing IMS_PER_BATCH to 4 by running:

    python projects/SparseRCNN/train_net.py \
        --config-file projects/SparseRCNN/configs/sparsercnn.res50.100pro.3x.yaml \
        --num-gpus 2 SOLVER.IMS_PER_BATCH 4

at iteration 7319 it saved a checkpoint whose AP was 3.915, which differs from the 11.440 in your log.

Referring to Detectron, I adjusted the learning rate to 0.0025 by running:

    python projects/SparseRCNN/train_net.py \
        --config-file projects/SparseRCNN/configs/sparsercnn.res50.100pro.3x.yaml \
        --num-gpus 2 SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.0025

and also tried a single GPU by running:

    python projects/SparseRCNN/train_net.py \
        --config-file projects/SparseRCNN/configs/sparsercnn.res50.100pro.3x.yaml \
        --num-gpus 1 SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025

but the result was worse.

Later I found that Detectron uses SGD while you use AdamW, so maybe 0.0025 is not applicable. I would like to know whether I need to adjust some parameters, and how to adjust them. Thanks.
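For context, here is a minimal Python sketch of the linear-scaling rule of thumb applied to the released schedule. The defaults below (IMS_PER_BATCH 16, BASE_LR 0.000025, MAX_ITER 270000 for a 3x schedule) are assumptions taken from standard detectron2-style configs, not values confirmed in this thread, and with AdamW the linear rule is only a rough starting point:

```python
# Rough sketch (not from the authors): scale the lr down and the iteration
# count up in proportion to the smaller batch. Defaults are assumed from a
# standard 3x config; AdamW may not follow linear scaling exactly.
DEFAULT_IMS_PER_BATCH = 16   # assumed: 8 GPUs x 2 images each
DEFAULT_BASE_LR = 2.5e-5     # 0.000025
DEFAULT_MAX_ITER = 270_000   # assumed 3x schedule length

def scaled_solver(ims_per_batch):
    factor = ims_per_batch / DEFAULT_IMS_PER_BATCH
    return {
        "SOLVER.IMS_PER_BATCH": ims_per_batch,
        "SOLVER.BASE_LR": DEFAULT_BASE_LR * factor,         # 6.25e-06 for batch 4
        "SOLVER.MAX_ITER": int(DEFAULT_MAX_ITER / factor),  # keeps the epoch count similar
    }

print(scaled_solver(4))
# {'SOLVER.IMS_PER_BATCH': 4, 'SOLVER.BASE_LR': 6.25e-06, 'SOLVER.MAX_ITER': 1080000}
```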

iFighting commented 3 years ago

You should set BASE_LR to 0.000025 instead of 0.0025.

Young0111 commented 3 years ago

When I first ran it, the learning rate was the default, i.e. 0.000025, and I got a lower AP than the author's log; then I changed the learning rate.

iFighting commented 3 years ago

> When I first ran it, the learning rate was the default, i.e. 0.000025, and I got a lower AP than the author's log; then I changed the learning rate.

Maybe you need a smaller lr than 0.000025.
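One way to try a few smaller learning rates is to relaunch the training command from above in a loop; a minimal Python sketch follows, where the candidate values and the per-run OUTPUT_DIR layout are illustrative assumptions, not values suggested in this thread:

```python
# Sketch: sweep a few learning rates below 0.000025 by relaunching the
# existing training script. Candidate values are illustrative only.
import subprocess

for lr in (1e-5, 5e-6, 2.5e-6):
    subprocess.run([
        "python", "projects/SparseRCNN/train_net.py",
        "--config-file", "projects/SparseRCNN/configs/sparsercnn.res50.100pro.3x.yaml",
        "--num-gpus", "1",
        "SOLVER.IMS_PER_BATCH", "2",
        "SOLVER.BASE_LR", str(lr),
        "OUTPUT_DIR", f"./output/lr_{lr}",   # hypothetical per-run output dir
    ], check=True)
```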

Young0111 commented 3 years ago

OK, thanks for your reply, I will try it now.

lingl-space commented 2 years ago

So, what changes did you make in the end? Do I need to set the lr to 0.000025 * 1/8 and multiply the total number of iterations by 8? However, as the number of iterations increases, the total training time becomes extremely long.
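For reference, a hedged sketch of what that 1/8 scaling could look like as detectron2-style solver overrides; the base schedule values (MAX_ITER 270000, STEPS 210000/250000) are assumed from the standard 3x recipe rather than confirmed here, and the numbers also show why the scaled run becomes so long:

```python
# Sketch: scale an assumed 3x schedule for a single GPU with batch size 2
# (1/8 of the assumed default of 16). Base values are assumptions, not
# confirmed by the authors in this thread.
scale = 8

overrides = {
    "SOLVER.IMS_PER_BATCH": 2,
    "SOLVER.BASE_LR": 2.5e-5 / scale,                     # 3.125e-06
    "SOLVER.MAX_ITER": 270_000 * scale,                   # 2,160,000 iterations
    "SOLVER.STEPS": (210_000 * scale, 250_000 * scale),   # (1,680,000, 2,000,000)
}
print(overrides)
```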