facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

How to conduct multi-task learning #839

Closed lhh17 closed 5 years ago

lhh17 commented 5 years ago

In your paper, adding a segmentation (mask) branch to keypoint R-CNN improves keypoint AP from 64.2 to 64.7. However, I didn't see any improvement after training the mask branch and the keypoint branch simultaneously. The results of my experiments are as follows:
| Setting | AP (bbox) | AP (keypoint) |
| --- | --- | --- |
| keypoint-only | 54.0 | 64.3 |
| keypoint & mask | 54.5 | 64.3 |

The config file used for multi-task learning:

```yaml
MODEL:
  TYPE: generalized_rcnn
  CONV_BODY: FPN.add_fpn_ResNet50_conv5_body
  NUM_CLASSES: 2
  FASTER_RCNN: True
  KEYPOINTS_ON: True
NUM_GPUS: 8
SOLVER:
  WEIGHT_DECAY: 0.0001
  LR_POLICY: steps_with_decay
  BASE_LR: 0.02
  GAMMA: 0.1
  MAX_ITER: 90000
  STEPS: [0, 60000, 80000]
FPN:
  FPN_ON: True
  MULTILEVEL_ROIS: True
  MULTILEVEL_RPN: True
FAST_RCNN:
  ROI_BOX_HEAD: head_builder.add_roi_2mlp_head
  ROI_XFORM_METHOD: RoIAlign
  ROI_XFORM_RESOLUTION: 7
  ROI_XFORM_SAMPLING_RATIO: 2
KRCNN:
  ROI_KEYPOINTS_HEAD: keypoint_rcnn_heads.add_roi_pose_head_v1convX
  NUM_STACKED_CONVS: 8
  NUM_KEYPOINTS: 17
  USE_DECONV_OUTPUT: True
  CONV_INIT: MSRAFill
  CONV_HEAD_DIM: 512
  UP_SCALE: 2
  HEATMAP_SIZE: 56  # ROI_XFORM_RESOLUTION (14) * UP_SCALE (2) * USE_DECONV_OUTPUT (2)
  ROI_XFORM_METHOD: RoIAlign
  ROI_XFORM_RESOLUTION: 14
  ROI_XFORM_SAMPLING_RATIO: 2
  KEYPOINT_CONFIDENCE: bbox
MRCNN:
  ROI_MASK_HEAD: mask_rcnn_heads.mask_rcnn_fcn_head_v1up4convs
  RESOLUTION: 28  # output mask resolution (default 14)
  ROI_XFORM_METHOD: RoIAlign
  ROI_XFORM_RESOLUTION: 14  # default 7
  ROI_XFORM_SAMPLING_RATIO: 2  # default 0
  DILATION: 1  # default 2
  CONV_INIT: MSRAFill  # default GaussianFill
TRAIN:
  WEIGHTS: https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/MSRA/R-50.pkl
  DATASETS: ('keypoints_coco_2014_train', 'keypoints_coco_2014_valminusminival')
  SCALES: (640, 672, 704, 736, 768, 800)
  MAX_SIZE: 1333
  BATCH_SIZE_PER_IM: 512
  RPN_PRE_NMS_TOP_N: 2000  # Per FPN level
TEST:
  DATASETS: ('keypoints_coco_2014_minival',)
  SCALE: 800
  MAX_SIZE: 1333
  NMS: 0.5
  RPN_PRE_NMS_TOP_N: 1000  # Per FPN level
  RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .
```
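As an aside, the relation noted in the `HEATMAP_SIZE` comment can be verified with a quick sketch (plain Python, not Detectron code; the variable names are illustrative):

```python
# Keypoint head output resolution in the config above:
# the RoIAlign output is upsampled once by UP_SCALE and once more by the
# deconv output layer, which contributes a factor of 2 when
# USE_DECONV_OUTPUT is True.
roi_xform_resolution = 14  # KRCNN.ROI_XFORM_RESOLUTION
up_scale = 2               # KRCNN.UP_SCALE
deconv_factor = 2          # from KRCNN.USE_DECONV_OUTPUT: True

heatmap_size = roi_xform_resolution * up_scale * deconv_factor
print(heatmap_size)  # 56, matching KRCNN.HEATMAP_SIZE in the config
```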

Could you give more details about the multi-task training? It would be great if you could provide a multi-task learning config file.

ir413 commented 5 years ago

Hi @lhh17, please note that the config you posted uses end-to-end training. Nevertheless, you should be able to reproduce the same trend as reported in the paper for 2-stage training. Here is the config I used to do that at some point in the past (box AP: 53.6 -> 54.3, mask AP: 45.4, kps AP: 64.2 -> 64.6). I also added two potential modifications in comments that you may want to experiment with to improve the keypoint results further: (1) longer warmup, and (2) keypoint multi-task loss weighting.
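For intuition, suggestion (2) amounts to scaling the keypoint term in the combined multi-task loss before the shared backbone is updated. A minimal sketch of the idea (the function and the `kps_loss_weight` name are illustrative, not Detectron's actual API or config key):

```python
def multi_task_loss(loss_box, loss_mask, loss_kps, kps_loss_weight=1.0):
    """Combine per-task losses into one scalar for backprop.

    Scaling the keypoint term changes how strongly keypoint gradients
    drive the shared FPN backbone relative to the box and mask tasks.
    """
    return loss_box + loss_mask + kps_loss_weight * loss_kps

# Example: halving the keypoint contribution relative to box and mask.
total = multi_task_loss(1.0, 0.5, 2.0, kps_loss_weight=0.5)
print(total)  # 2.5
```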