Keypoint training accuracy

I was able to train the e2e ResNet101 model for keypoint and bounding box detection using COCO 2014 data, but the resulting accuracy is far below what is reported in the model zoo.

I used 2 GPUs; my config is included below. How can I achieve the results reported in the model zoo? Do I need more iterations? I also read over the paper for Detectron, and the authors take a lot of additional steps to increase the accuracy. Is there any way to incorporate the following?

- Test-time augmentation
- Train-time augmentation
- Data distillation
- Using ensembles

MODEL:
  TYPE: generalized_rcnn
  CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
  NUM_CLASSES: 2
  FASTER_RCNN: True
  KEYPOINTS_ON: True
NUM_GPUS: 2
SOLVER:
  WEIGHT_DECAY: 0.0001
  LR_POLICY: steps_with_decay
  BASE_LR: 0.005
  GAMMA: 0.1
  MAX_ITER: 30000
  STEPS: [0, 15000, 20000]
FPN:
  FPN_ON: True
  MULTILEVEL_ROIS: True
  MULTILEVEL_RPN: True
FAST_RCNN:
  ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
  ROI_XFORM_METHOD: RoIAlign
  ROI_XFORM_RESOLUTION: 7
  ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
  ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
  NUM_STACKED_CONVS: 8
  NUM_KEYPOINTS: 17
  USE_DECONV_OUTPUT: True
  CONV_INIT: MSRAFill
  CONV_HEAD_DIM: 512
  UP_SCALE: 2
  HEATMAP_SIZE: 56  # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * USE_DECONV_OUTPUT (2)
  ROI_XFORM_METHOD: RoIAlign
  ROI_XFORM_RESOLUTION: 14
  ROI_XFORM_SAMPLING_RATIO: 2
  KEYPOINT_CONFIDENCE: bbox
TRAIN:
  WEIGHTS: https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-101.pkl
  DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
  SCALES: (640, 672, 704, 736, 768, 800)
  MAX_SIZE: 1333
  BATCH_SIZE_PER_IM: 512
  RPN_PRE_NMS_TOP_N: 2000  # Per FPN level
TEST:
  DATASETS: ('keypoints_coco_2014_minival',)
  SCALE: 800
  MAX_SIZE: 1333
  NMS: 0.5
  RPN_PRE_NMS_TOP_N: 1000  # Per FPN level
  RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
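
On the "more iterations" question, one way to sanity-check the SOLVER block is to rescale a reference schedule by the linear scaling rule (learning rate divided by k, iteration counts multiplied by k when using k times fewer GPUs). The sketch below is only illustrative: the 8-GPU baseline values (BASE_LR 0.02, MAX_ITER 90000, STEPS [0, 60000, 80000]) are an assumption not stated in this issue, and `scale_schedule` is a hypothetical helper, not part of Detectron.

```python
def scale_schedule(base_lr, max_iter, steps, base_gpus, target_gpus):
    """Rescale a solver schedule with the linear scaling rule.

    Hypothetical helper: with k times fewer GPUs (and so a k times smaller
    minibatch), divide the learning rate by k and multiply every
    iteration count by k.
    """
    factor = base_gpus / target_gpus
    return {
        "BASE_LR": base_lr / factor,
        "MAX_ITER": int(round(max_iter * factor)),
        "STEPS": [int(round(s * factor)) for s in steps],
    }


# ASSUMPTION: the reference run used 8 GPUs with these solver values.
# They do not come from this issue and should be checked against the
# actual model zoo config before relying on the result.
scaled = scale_schedule(
    base_lr=0.02,
    max_iter=90000,
    steps=[0, 60000, 80000],
    base_gpus=8,
    target_gpus=2,
)

for key, value in scaled.items():
    print(f"{key}: {value}")
```

If that assumed baseline is right, the 2-GPU equivalent keeps BASE_LR at 0.005 (matching the config above) but needs MAX_ITER 360000 and STEPS [0, 240000, 320000], which is far longer than the 30000 iterations used here.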
