chensnathan / YOLOF

You Only Look One-level Feature (YOLOF), CVPR2021, Detectron2
MIT License

Gap in mAP #19

Closed x-x110 closed 3 years ago

x-x110 commented 3 years ago

My setup is 3x Titan GPUs. Following Detectron2's scaling rule, I set the learning rate to 0.045. Without modifying any other parameters, the resulting mAP is about 35.6. Why?

x-x110 commented 3 years ago

Batch size is 16 per GPU.

chensnathan commented 3 years ago

Hi, could you post your training log?

x-x110 commented 3 years ago

```yaml
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 8
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
```

chensnathan commented 3 years ago

There are several key points about how to modify the settings:

  1. Batch size and learning rate. The default settings are for 8 GPUs with a total batch size of 64 (8 images per GPU). You have 3 GPUs and set the batch size to 48 (16 images per GPU). Thus, according to the linear scaling rule, your learning rate should be 0.12 * 48 / 64 = 0.09.
  2. Training iterations and learning rate steps. We train for a maximum of 22500 iterations with batch size 64; for batch size 48, you should also scale the maximum iteration count from 22500 to 22500 * 64 / 48 = 30000. The learning rate steps should be re-calculated by the same rule: [15000 * 64 / 48, 20000 * 64 / 48] = [20000, 26667].
  3. Warmup iterations and warmup factor. For batch size 64, we warm up the training for 1500 iterations. Thus, for batch size 48, you can change this from 1500 to 1500 * 64 / 48 = 2000 iterations, and the warmup factor becomes 1. / 2000. A config sketch putting these numbers together follows below.
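For concreteness, here is a minimal sketch of these adjustments as Detectron2-style config overrides, assuming the standard `SOLVER` keys (the actual key names and defaults in the YOLOF configs may differ):

```python
# Minimal sketch: linear-scaling adjustments for a total batch size of 48
# (3 GPUs x 16 images) instead of the default 64. Assumes the standard
# Detectron2 SOLVER keys; check the YOLOF configs for the actual names.
from detectron2.config import get_cfg

cfg = get_cfg()

BASE_BATCH = 64   # total batch size the default schedule was tuned for
NEW_BATCH = 48    # 3 GPUs x 16 images per GPU
scale = BASE_BATCH / NEW_BATCH

cfg.SOLVER.IMS_PER_BATCH = NEW_BATCH
cfg.SOLVER.BASE_LR = 0.12 * NEW_BATCH / BASE_BATCH           # 0.09
cfg.SOLVER.MAX_ITER = int(22500 * scale)                     # 30000
cfg.SOLVER.STEPS = (int(15000 * scale), int(20000 * scale))  # (20000, 26666)
cfg.SOLVER.WARMUP_ITERS = int(1500 * scale)                  # 2000
cfg.SOLVER.WARMUP_FACTOR = 1.0 / cfg.SOLVER.WARMUP_ITERS     # 1 / 2000
```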

x-x110 commented 3 years ago

I will modify these parameters and report the result. Thank you for your reply.

x-x110 commented 3 years ago

After modifying these parameters, I obtained an mAP of 37.39 at 30000 iterations.

chensnathan commented 3 years ago

This result is reasonable.

x-x110 commented 3 years ago

Thanks for your reply.

shenhaibb commented 2 years ago

> There are several key points about how to modify the settings: […]

Hi, I have only 1 GPU (8 GB), so I set the batch size to 8:

  - learning rate: 0.12 * 8 / 64 = 0.015
  - maximum iterations: 22500 * 64 / 8 = 180000
  - learning rate steps: [15000 * 64 / 8, 20000 * 64 / 8] = [120000, 160000]
  - warmup iterations: 1500 * 64 / 8 = 12000
  - warmup factor: 1. / 2000 = 0.0005

Is this the right way to calculate it?
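The same rule can be wrapped in a small helper for any GPU count. This is a hypothetical sketch (the function is not part of the repo) that just restates the arithmetic above:

```python
# Hypothetical helper (not part of the YOLOF repo): apply the linear
# scaling rule, starting from the defaults tuned for total batch size 64.
def scale_schedule(total_batch, base_batch=64, base_lr=0.12,
                   base_max_iter=22500, base_steps=(15000, 20000),
                   base_warmup=1500):
    s = base_batch / total_batch
    warmup_iters = int(base_warmup * s)
    return {
        "BASE_LR": base_lr * total_batch / base_batch,
        "MAX_ITER": int(base_max_iter * s),
        "STEPS": tuple(int(step * s) for step in base_steps),
        "WARMUP_ITERS": warmup_iters,
        "WARMUP_FACTOR": 1.0 / warmup_iters,
    }

print(scale_schedule(8))  # 1 GPU x 8 images
# BASE_LR 0.015, MAX_ITER 180000, STEPS (120000, 160000),
# WARMUP_ITERS 12000, WARMUP_FACTOR ~8.33e-05
```

Note that if the warmup factor follows the rule in point 3 above (one divided by the warmup iterations), it would be 1. / 12000 here rather than 1. / 2000.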

SelimSavas commented 1 year ago

> There are several key points about how to modify the settings: […]

Is this calculation valid for every dataset? Should we do the same calculation for datasets of different sizes? @chensnathan