Closed x-x110 closed 3 years ago
Batch size: 16 per GPU.
Hi, could you post your training log?
```yaml
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 8
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
```
I will modify these parameters and provide the result. Thank you for your reply.
After modifying these parameters, I obtained 37.39 AP at 30000 iterations.
This result is reasonable.
Thanks for your reply.
There are several key points about how to modify the settings:
- Batch size and learning rate. The default settings assume 8 GPUs with a total batch size of 64 (8 images per GPU). You have 3 GPUs and set the batch size to 48 (16 images per GPU). According to the linear scaling rule, your learning rate should be 0.12 * 48 / 64 = 0.09.
- Training iterations and learning rate steps. We train for a maximum of 22500 iterations with batch size 64; for batch size 48, you should scale the maximum iterations from 22500 to 22500 * 64 / 48 = 30000. The learning rate steps should be rescaled by the same rule: [15000 * 64 / 48, 20000 * 64 / 48] = [20000, 26667].
- Warmup iterations and warmup factor. For batch size 64, we warm up the training for 1500 iterations. Thus, for batch size 48, you should change it from 1500 to 1500 * 64 / 48 = 2000 iterations. The warmup factor is then obtained as 1. / 2000.
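The rescaling steps above can be sketched as a small helper. This is only an illustration of the arithmetic; the function and parameter names are mine, not detectron2 config keys:

```python
# Sketch of the linear scaling rule described above. Assumption: a
# detectron2-style schedule with base total batch size 64, base LR 0.12,
# 22500 iterations, LR steps at [15000, 20000], and 1500 warmup iterations.

def scale_hyperparams(base_lr=0.12, base_batch=64, base_max_iter=22500,
                      base_steps=(15000, 20000), base_warmup_iters=1500,
                      new_batch=48):
    """Rescale LR, iteration counts, and warmup for a new total batch size."""
    ratio = new_batch / base_batch           # e.g. 48 / 64 = 0.75
    lr = base_lr * ratio                     # LR scales linearly with batch size
    max_iter = round(base_max_iter / ratio)  # fewer images per iter -> more iters
    steps = [round(s / ratio) for s in base_steps]
    warmup_iters = round(base_warmup_iters / ratio)
    warmup_factor = 1.0 / warmup_iters       # training starts at lr * warmup_factor
    return lr, max_iter, steps, warmup_iters, warmup_factor

print(scale_hyperparams(new_batch=48))
# lr ≈ 0.09, max_iter = 30000, steps = [20000, 26667], warmup_iters = 2000
```

Note that iteration counts are divided by the ratio while the learning rate is multiplied by it: a smaller batch sees fewer images per iteration, so it needs more iterations and a proportionally smaller learning rate.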
Hi, I have only 1 GPU (8 GB), so I set the batch size to 8:
- learning rate: 0.12 * 8 / 64 = 0.015
- maximum iterations: 22500 * 64 / 8 = 180000
- learning rate steps: [15000 * 64 / 8, 20000 * 64 / 8] = [120000, 160000]
- warmup iterations: 1500 * 64 / 8 = 12000
- warmup factor: 1. / 2000 = 0.0005

Is this the way I should calculate it?
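For reference, plugging a total batch size of 8 into the same scaling rule gives the following values (a quick arithmetic check only, assuming the base settings quoted in the reply above):

```python
# Check of the 1-GPU (total batch size 8) numbers under the linear scaling rule.
base_batch, new_batch = 64, 8
ratio = new_batch / base_batch                     # 8 / 64 = 0.125

lr = 0.12 * ratio                                  # 0.015
max_iter = int(22500 / ratio)                      # 180000
steps = [int(15000 / ratio), int(20000 / ratio)]   # [120000, 160000]
warmup_iters = int(1500 / ratio)                   # 12000
warmup_factor = 1.0 / warmup_iters                 # ~8.3e-5 if the factor is 1/warmup_iters

print(lr, max_iter, steps, warmup_iters, warmup_factor)
```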
Is this calculation valid for every dataset? Do we do the same calculation for datasets of different sizes? @chensnathan
My experimental setup is 3x Titan GPUs, and following the Detectron2 scaling rule I set the learning rate to 0.045. Without modifying any other parameters, the resulting mAP is about 35.6. Why?