Hello, I downloaded the R-50.pkl pre-training weight (red box) from MODEL_ZOO.md of detectron2, my Base-RCNN-FPN.yaml, RCNN-FPN-CROP.yaml, and the experimental results are as follows,
Base-RCNN-FPN:
MODEL:
META_ARCHITECTURE: "CropRCNN"
WEIGHTS: "/root/data/yhj/detectron2/pretrained_models/R-50.pkl"
BACKBONE:
NAME: "build_retinanet_resnet_fpn_backbone"
RESNETS:
OUT_FEATURES: ["res3", "res4", "res5"]
DEPTH: 50
FPN:
IN_FEATURES: ["res3", "res4", "res5"]
ANCHOR_GENERATOR:
SIZES: [[32], [64], [128], [256], [512]] # One size for each in feature map
ASPECT_RATIOS: [[0.5, 1.0, 2.0]] # Three aspect ratios (same for all in feature maps)
RPN:
IN_FEATURES: ["p3", "p4", "p5", "p6", "p7"]
PRE_NMS_TOPK_TRAIN: 2000 # Per FPN level
PRE_NMS_TOPK_TEST: 1000 # Per FPN level
Detectron1 uses 2000 proposals per-batch,
# (See "modeling/rpn/rpn_outputs.py" for details of this legacy issue)
# which is approximately 1000 proposals per-image since the default batch size for FPN is 2.
POST_NMS_TOPK_TRAIN: 1000
POST_NMS_TOPK_TEST: 1000
RCNN-FPN-CROP.yaml
MODEL:
META_ARCHITECTURE: "CropRCNN"
WEIGHTS: "/root/data/yhj/detectron2/pretrained_models/R-50.pkl"
BACKBONE:
NAME: "build_retinanet_resnet_fpn_backbone"
RESNETS:
OUT_FEATURES: ["res3", "res4", "res5"]
DEPTH: 50
FPN:
IN_FEATURES: ["res3", "res4", "res5"]
ANCHOR_GENERATOR:
SIZES: [[32], [64], [128], [256], [512]] # One size for each in feature map
ASPECT_RATIOS: [[0.5, 1.0, 2.0]] # Three aspect ratios (same for all in feature maps)
RPN:
IN_FEATURES: ["p3", "p4", "p5", "p6", "p7"]
PRE_NMS_TOPK_TRAIN: 2000 # Per FPN level
PRE_NMS_TOPK_TEST: 1000 # Per FPN level
Detectron1 uses 2000 proposals per-batch,
# (See "modeling/rpn/rpn_outputs.py" for details of this legacy issue)
# which is approximately 1000 proposals per-image since the default batch size for FPN is 2.
POST_NMS_TOPK_TRAIN: 1000
POST_NMS_TOPK_TEST: 1000
Hello, I downloaded the R-50.pkl pre-training weight (red box) from MODEL_ZOO.md of detectron2, my Base-RCNN-FPN.yaml, RCNN-FPN-CROP.yaml, and the experimental results are as follows,
Base-RCNN-FPN: MODEL: META_ARCHITECTURE: "CropRCNN" WEIGHTS: "/root/data/yhj/detectron2/pretrained_models/R-50.pkl" BACKBONE: NAME: "build_retinanet_resnet_fpn_backbone" RESNETS: OUT_FEATURES: ["res3", "res4", "res5"] DEPTH: 50 FPN: IN_FEATURES: ["res3", "res4", "res5"] ANCHOR_GENERATOR: SIZES: [[32], [64], [128], [256], [512]] # One size for each in feature map ASPECT_RATIOS: [[0.5, 1.0, 2.0]] # Three aspect ratios (same for all in feature maps) RPN: IN_FEATURES: ["p3", "p4", "p5", "p6", "p7"] PRE_NMS_TOPK_TRAIN: 2000 # Per FPN level PRE_NMS_TOPK_TEST: 1000 # Per FPN level
Detectron1 uses 2000 proposals per-batch,
ROI_HEADS: NAME: "StandardROIHeads" IN_FEATURES: ["p3", "p4", "p5", "p6", "p7"] NUM_CLASSES: 10 SCORE_THRESH_TEST: 0.001 NMS_THRESH_TEST: 0.5 IOU_THRESHOLDS: [0.5] ROI_BOX_HEAD: NAME: "FastRCNNConvFCHead" NUM_FC: 2 POOLER_RESOLUTION: 7 ROI_MASK_HEAD: NAME: "MaskRCNNConvUpsampleHead" NUM_CONV: 4 POOLER_RESOLUTION: 14
DATASETS: TRAIN: ("visdrone_2019_train",) TEST: ("visdrone_2019_val",) DATALOADER: NUM_WORKERS: 2 SOLVER: IMS_PER_BATCH: 8 BASE_LR: 0.01 STEPS: (50000, 70000) MAX_ITER: 90000 CHECKPOINT_PERIOD: 3000 CLIP_GRADIENTS: ENABLED: True CLIP_TYPE: "norm" CLIP_VALUE: 35.0
INPUT: MIN_SIZE_TRAIN: (800, 900, 1000, 1100, 1200) MAX_SIZE_TRAIN: 1999 MIN_SIZE_TEST: 1200 MAX_SIZE_TEST: 1999 VERSION: 2 TEST: EVAL_PERIOD: 3000 DETECTIONS_PER_IMAGE: 800 CROPTRAIN: USE_CROPS: True
RCNN-FPN-CROP.yaml MODEL: META_ARCHITECTURE: "CropRCNN" WEIGHTS: "/root/data/yhj/detectron2/pretrained_models/R-50.pkl" BACKBONE: NAME: "build_retinanet_resnet_fpn_backbone" RESNETS: OUT_FEATURES: ["res3", "res4", "res5"] DEPTH: 50 FPN: IN_FEATURES: ["res3", "res4", "res5"] ANCHOR_GENERATOR: SIZES: [[32], [64], [128], [256], [512]] # One size for each in feature map ASPECT_RATIOS: [[0.5, 1.0, 2.0]] # Three aspect ratios (same for all in feature maps) RPN: IN_FEATURES: ["p3", "p4", "p5", "p6", "p7"] PRE_NMS_TOPK_TRAIN: 2000 # Per FPN level PRE_NMS_TOPK_TEST: 1000 # Per FPN level
Detectron1 uses 2000 proposals per-batch,
ROI_HEADS: NAME: "StandardROIHeads" IN_FEATURES: ["p3", "p4", "p5", "p6", "p7"] NUM_CLASSES: 10 SCORE_THRESH_TEST: 0.001 NMS_THRESH_TEST: 0.5 IOU_THRESHOLDS: [0.5] ROI_BOX_HEAD: NAME: "FastRCNNConvFCHead" NUM_FC: 2 POOLER_RESOLUTION: 7 ROI_MASK_HEAD: NAME: "MaskRCNNConvUpsampleHead" NUM_CONV: 4 POOLER_RESOLUTION: 14
DATASETS: TRAIN: ("visdrone_2019_train",) TEST: ("visdrone_2019_val",) DATALOADER: NUM_WORKERS: 2 SOLVER: IMS_PER_BATCH: 8 BASE_LR: 0.01 STEPS: (50000, 70000) MAX_ITER: 90000 CHECKPOINT_PERIOD: 3000 CLIP_GRADIENTS: ENABLED: True CLIP_TYPE: "norm" CLIP_VALUE: 35.0
INPUT: MIN_SIZE_TRAIN: (800, 900, 1000, 1100, 1200) MAX_SIZE_TRAIN: 1999 MIN_SIZE_TEST: 1200 MAX_SIZE_TEST: 1999 VERSION: 2 TEST: EVAL_PERIOD: 3000 DETECTIONS_PER_IMAGE: 800 CROPTRAIN: USE_CROPS: True
experimental results:
My experimental results are still about 2 different from those given in the original paper. What is the reason for this?