Closed x-x110 closed 3 years ago
Batch size: 16 per GPU.
Hi, could you post your training log?
```yaml
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 8
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
```
I will modify these parameters and provide the result. Thank you for your reply.
After modifying these parameters, I obtained 37.39 AP at 30000 iterations.
This result is reasonable.
Thanks for your reply.
There are several key points about how to modify the settings:
- Batch size and learning rate. The default settings assume 8 GPUs with a total batch size of 64 (8 images per GPU). You have 3 GPUs and set the batch size to 48 (16 images per GPU). According to the linear scaling rule, your learning rate should be 0.12 * 48 / 64 = 0.09.
- Training iterations and learning rate steps. We train for a maximum of 22500 iterations with batch size 64; for batch size 48, you should scale the maximum iterations from 22500 to 22500 * 64 / 48 = 30000. The learning rate steps should be rescaled by the same rule: [15000 * 64 / 48, 20000 * 64 / 48] = [20000, 26667].
- Warmup iterations and warmup factor. For batch size 64, we warm up the training for 1500 iterations. Thus, for batch size 48, you should change it from 1500 to 1500 * 64 / 48 = 2000 iterations. The warmup factor is then obtained as 1. / 2000.
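The rescaling steps above can be sketched as a small helper. This is only an illustration of the arithmetic; the function and parameter names are mine, not detectron2 config keys:

```python
# Sketch of the linear scaling rule described above. Assumption: a
# detectron2-style schedule with base total batch size 64, base LR 0.12,
# 22500 iterations, LR steps at [15000, 20000], and 1500 warmup iterations.

def scale_hyperparams(base_lr=0.12, base_batch=64, base_max_iter=22500,
                      base_steps=(15000, 20000), base_warmup_iters=1500,
                      new_batch=48):
    """Rescale LR, iteration counts, and warmup for a new total batch size."""
    ratio = new_batch / base_batch           # e.g. 48 / 64 = 0.75
    lr = base_lr * ratio                     # LR scales linearly with batch size
    max_iter = round(base_max_iter / ratio)  # fewer images per iter -> more iters
    steps = [round(s / ratio) for s in base_steps]
    warmup_iters = round(base_warmup_iters / ratio)
    warmup_factor = 1.0 / warmup_iters       # training starts at lr * warmup_factor
    return lr, max_iter, steps, warmup_iters, warmup_factor

print(scale_hyperparams(new_batch=48))
# lr ≈ 0.09, max_iter = 30000, steps = [20000, 26667], warmup_iters = 2000
```

Note that iteration counts are divided by the ratio while the learning rate is multiplied by it: a smaller batch sees fewer images per iteration, so it needs more iterations and a proportionally smaller learning rate.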
Hi, I have only 1 GPU (8 GB), so I set the batch size to 8:
- learning rate: 0.12 * 8 / 64 = 0.015
- maximum iterations: 22500 * 64 / 8 = 180000
- learning rate steps: [15000 * 64 / 8, 20000 * 64 / 8] = [120000, 160000]
- warmup iterations: 1500 * 64 / 8 = 12000
- warmup factor: 1. / 2000 = 0.0005

Is this the way I should calculate it?
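For reference, plugging a total batch size of 8 into the same scaling rule gives the following values (a quick arithmetic check only, assuming the base settings quoted in the reply above):

```python
# Check of the 1-GPU (total batch size 8) numbers under the linear scaling rule.
base_batch, new_batch = 64, 8
ratio = new_batch / base_batch                     # 8 / 64 = 0.125

lr = 0.12 * ratio                                  # 0.015
max_iter = int(22500 / ratio)                      # 180000
steps = [int(15000 / ratio), int(20000 / ratio)]   # [120000, 160000]
warmup_iters = int(1500 / ratio)                   # 12000
warmup_factor = 1.0 / warmup_iters                 # ~8.3e-5 if the factor is 1/warmup_iters

print(lr, max_iter, steps, warmup_iters, warmup_factor)
```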
Is this calculation valid for every dataset? Do we do the same calculation for datasets of different sizes? @chensnathan
My experimental setup is 3x Titan GPUs, and following the Detectron2 scaling rule I set the learning rate to 0.045. Without modifying any other parameters, the resulting mAP is about 35.6. Why?