akhilpm / DroneDetectron2

Pytorch code for our CVPRw 2023 paper "Cascaded Zoom-in Detector for High Resolution Aerial Images"
MIT License
52 stars 7 forks source link

result #22

Closed zuikeaideren closed 1 year ago

zuikeaideren commented 1 year ago

Hello, I downloaded the R-50.pkl pre-training weight (red box) from MODEL_ZOO.md of detectron2, my Base-RCNN-FPN.yaml, RCNN-FPN-CROP.yaml, and the experimental results are as follows, 1694054052987

Base-RCNN-FPN: MODEL: META_ARCHITECTURE: "CropRCNN" WEIGHTS: "/root/data/yhj/detectron2/pretrained_models/R-50.pkl" BACKBONE: NAME: "build_retinanet_resnet_fpn_backbone" RESNETS: OUT_FEATURES: ["res3", "res4", "res5"] DEPTH: 50 FPN: IN_FEATURES: ["res3", "res4", "res5"] ANCHOR_GENERATOR: SIZES: [[32], [64], [128], [256], [512]] # One size for each in feature map ASPECT_RATIOS: [[0.5, 1.0, 2.0]] # Three aspect ratios (same for all in feature maps) RPN: IN_FEATURES: ["p3", "p4", "p5", "p6", "p7"] PRE_NMS_TOPK_TRAIN: 2000 # Per FPN level PRE_NMS_TOPK_TEST: 1000 # Per FPN level

Detectron1 uses 2000 proposals per-batch,

# (See "modeling/rpn/rpn_outputs.py" for details of this legacy issue)
# which is approximately 1000 proposals per-image since the default batch size for FPN is 2.
POST_NMS_TOPK_TRAIN: 1000
POST_NMS_TOPK_TEST: 1000

ROI_HEADS: NAME: "StandardROIHeads" IN_FEATURES: ["p3", "p4", "p5", "p6", "p7"] NUM_CLASSES: 10 SCORE_THRESH_TEST: 0.001 NMS_THRESH_TEST: 0.5 IOU_THRESHOLDS: [0.5] ROI_BOX_HEAD: NAME: "FastRCNNConvFCHead" NUM_FC: 2 POOLER_RESOLUTION: 7 ROI_MASK_HEAD: NAME: "MaskRCNNConvUpsampleHead" NUM_CONV: 4 POOLER_RESOLUTION: 14

DATASETS: TRAIN: ("visdrone_2019_train",) TEST: ("visdrone_2019_val",) DATALOADER: NUM_WORKERS: 2 SOLVER: IMS_PER_BATCH: 8 BASE_LR: 0.01 STEPS: (50000, 70000) MAX_ITER: 90000 CHECKPOINT_PERIOD: 3000 CLIP_GRADIENTS: ENABLED: True CLIP_TYPE: "norm" CLIP_VALUE: 35.0
INPUT: MIN_SIZE_TRAIN: (800, 900, 1000, 1100, 1200) MAX_SIZE_TRAIN: 1999 MIN_SIZE_TEST: 1200 MAX_SIZE_TEST: 1999 VERSION: 2 TEST: EVAL_PERIOD: 3000 DETECTIONS_PER_IMAGE: 800 CROPTRAIN: USE_CROPS: True

RCNN-FPN-CROP.yaml MODEL: META_ARCHITECTURE: "CropRCNN" WEIGHTS: "/root/data/yhj/detectron2/pretrained_models/R-50.pkl" BACKBONE: NAME: "build_retinanet_resnet_fpn_backbone" RESNETS: OUT_FEATURES: ["res3", "res4", "res5"] DEPTH: 50 FPN: IN_FEATURES: ["res3", "res4", "res5"] ANCHOR_GENERATOR: SIZES: [[32], [64], [128], [256], [512]] # One size for each in feature map ASPECT_RATIOS: [[0.5, 1.0, 2.0]] # Three aspect ratios (same for all in feature maps) RPN: IN_FEATURES: ["p3", "p4", "p5", "p6", "p7"] PRE_NMS_TOPK_TRAIN: 2000 # Per FPN level PRE_NMS_TOPK_TEST: 1000 # Per FPN level

Detectron1 uses 2000 proposals per-batch,

# (See "modeling/rpn/rpn_outputs.py" for details of this legacy issue)
# which is approximately 1000 proposals per-image since the default batch size for FPN is 2.
POST_NMS_TOPK_TRAIN: 1000
POST_NMS_TOPK_TEST: 1000

ROI_HEADS: NAME: "StandardROIHeads" IN_FEATURES: ["p3", "p4", "p5", "p6", "p7"] NUM_CLASSES: 10 SCORE_THRESH_TEST: 0.001 NMS_THRESH_TEST: 0.5 IOU_THRESHOLDS: [0.5] ROI_BOX_HEAD: NAME: "FastRCNNConvFCHead" NUM_FC: 2 POOLER_RESOLUTION: 7 ROI_MASK_HEAD: NAME: "MaskRCNNConvUpsampleHead" NUM_CONV: 4 POOLER_RESOLUTION: 14

DATASETS: TRAIN: ("visdrone_2019_train",) TEST: ("visdrone_2019_val",) DATALOADER: NUM_WORKERS: 2 SOLVER: IMS_PER_BATCH: 8 BASE_LR: 0.01 STEPS: (50000, 70000) MAX_ITER: 90000 CHECKPOINT_PERIOD: 3000 CLIP_GRADIENTS: ENABLED: True CLIP_TYPE: "norm" CLIP_VALUE: 35.0
INPUT: MIN_SIZE_TRAIN: (800, 900, 1000, 1100, 1200) MAX_SIZE_TRAIN: 1999 MIN_SIZE_TEST: 1200 MAX_SIZE_TEST: 1999 VERSION: 2 TEST: EVAL_PERIOD: 3000 DETECTIONS_PER_IMAGE: 800 CROPTRAIN: USE_CROPS: True

experimental results: 1694053748800

My experimental results are still about 2 different from those given in the original paper. What is the reason for this?