facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

unable to reproduce the results in detectron2 densepose model zoo #1883

Closed cool-xuan closed 4 years ago

cool-xuan commented 4 years ago

If you do not know the root cause of the problem, and would like someone to help you, please post according to this template:

## Instructions To Reproduce the Issue:

Check https://stackoverflow.com/help/minimal-reproducible-example for how to ask good questions. Simplify the steps to reproduce the issue using suggestions from the above link, and provide them below:

1. full code you wrote or full changes you made (`git diff`):

        _BASE_: "Base-DensePose-RCNN-FPN.yaml"
        MODEL:
          WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
          RESNETS:
            DEPTH: 50
        SOLVER:
          MAX_ITER: 130000
          STEPS: (100000, 120000)
          IMS_PER_BATCH: 15
        OUTPUT_DIR: "/raid/zyx/detectron2_torch1.5/output/densepose_rcnn_R_50_FPN_s1x"
        VERSION: 2

2. what exact command you run:

   I use 3 V100 GPUs to train the model. The command:

        CUDA_VISIBLE_DEVICES=0,1,2 python train_net.py --config-file configs/densepose_rcnn_R_50_FPN_s1x.yaml --num-gpus 3

3. __full logs__ you observed:

full config:

    BOOTSTRAP_DATASETS: []
    BOOTSTRAP_MODEL: DEVICE: cuda WEIGHTS:
    CUDNN_BENCHMARK: False
    DATALOADER: ASPECT_RATIO_GROUPING: True FILTER_EMPTY_ANNOTATIONS: True NUM_WORKERS: 4 REPEAT_THRESHOLD: 0.0 SAMPLER_TRAIN: TrainingSampler
    DATASETS: CATEGORY_MAPS: PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000 PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000 PROPOSAL_FILES_TEST: () PROPOSAL_FILES_TRAIN: () TEST: ('densepose_coco_2014_minival',) TRAIN: ('densepose_coco_2014_train', 'densepose_coco_2014_valminusminival') WHITELISTED_CATEGORIES:
    GLOBAL: HACK: 1.0
    INPUT: CROP: ENABLED: False SIZE: [0.9, 0.9] TYPE: relative_range FORMAT: BGR MASK_FORMAT: polygon MAX_SIZE_TEST: 1333 MAX_SIZE_TRAIN: 1333 MIN_SIZE_TEST: 800 MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800) MIN_SIZE_TRAIN_SAMPLING: choice ROTATION_ANGLES: [0]
    MODEL:
      ANCHOR_GENERATOR: ANGLES: [[-90, 0, 90]] ASPECT_RATIOS: [[0.5, 1.0, 2.0]] NAME: DefaultAnchorGenerator OFFSET: 0.0 SIZES: [[32], [64], [128], [256], [512]]
      BACKBONE: FREEZE_AT: 2 NAME: build_resnet_fpn_backbone
      DENSEPOSE_ON: True
      DEVICE: cuda
      FPN: FUSE_TYPE: sum IN_FEATURES: ['res2', 'res3', 'res4', 'res5'] NORM: OUT_CHANNELS: 256
      HRNET: HRFPN: OUT_CHANNELS: 256 STAGE2: BLOCK: BASIC FUSE_METHOD: SUM NUM_BLOCKS: [4, 4] NUM_BRANCHES: 2 NUM_CHANNELS: [32, 64] NUM_MODULES: 1 STAGE3: BLOCK: BASIC FUSE_METHOD: SUM NUM_BLOCKS: [4, 4, 4] NUM_BRANCHES: 3 NUM_CHANNELS: [32, 64, 128] NUM_MODULES: 4 STAGE4: BLOCK: BASIC FUSE_METHOD: SUM NUM_BLOCKS: [4, 4, 4, 4] NUM_BRANCHES: 4 NUM_CHANNELS: [32, 64, 128, 256] NUM_MODULES: 3 STEM_INPLANES: 64
      KEYPOINT_ON: False
      LOAD_PROPOSALS: False
      MASK_ON: False
      META_ARCHITECTURE: GeneralizedRCNN
      PANOPTIC_FPN: COMBINE: ENABLED: True INSTANCES_CONFIDENCE_THRESH: 0.5 OVERLAP_THRESH: 0.5 STUFF_AREA_LIMIT: 4096 INSTANCE_LOSS_WEIGHT: 1.0
      PIXEL_MEAN: [103.53, 116.28, 123.675]
      PIXEL_STD: [1.0, 1.0, 1.0]
      PROPOSAL_GENERATOR: MIN_SIZE: 0 NAME: RPN
      RESNETS: DEFORM_MODULATED: False DEFORM_NUM_GROUPS: 1 DEFORM_ON_PER_STAGE: [False, False, False, False] DEPTH: 50 NORM: FrozenBN NUM_GROUPS: 1 OUT_FEATURES: ['res2', 'res3', 'res4', 'res5'] RES2_OUT_CHANNELS: 256 RES5_DILATION: 1 STEM_OUT_CHANNELS: 64 STRIDE_IN_1X1: True WIDTH_PER_GROUP: 64
      RETINANET: BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0) FOCAL_LOSS_ALPHA: 0.25 FOCAL_LOSS_GAMMA: 2.0 IN_FEATURES: ['p3', 'p4', 'p5', 'p6', 'p7'] IOU_LABELS: [0, -1, 1] IOU_THRESHOLDS: [0.4, 0.5] NMS_THRESH_TEST: 0.5 NUM_CLASSES: 80 NUM_CONVS: 4 PRIOR_PROB: 0.01 SCORE_THRESH_TEST: 0.05 SMOOTH_L1_LOSS_BETA: 0.1 TOPK_CANDIDATES_TEST: 1000
      ROI_BOX_CASCADE_HEAD: BBOX_REG_WEIGHTS: ((10.0, 10.0, 5.0, 5.0), (20.0, 20.0, 10.0, 10.0), (30.0, 30.0, 15.0, 15.0)) IOUS: (0.5, 0.6, 0.7)
      ROI_BOX_HEAD: BBOX_REG_LOSS_TYPE: smooth_l1 BBOX_REG_LOSS_WEIGHT: 1.0 BBOX_REG_WEIGHTS: (10.0, 10.0, 5.0, 5.0) CLS_AGNOSTIC_BBOX_REG: False CONV_DIM: 256 FC_DIM: 1024 NAME: FastRCNNConvFCHead NORM: NUM_CONV: 0 NUM_FC: 2 POOLER_RESOLUTION: 7 POOLER_SAMPLING_RATIO: 2 POOLER_TYPE: ROIAlign SMOOTH_L1_BETA: 0.0 TRAIN_ON_PRED_BOXES: False
      ROI_DENSEPOSE_HEAD: COARSE_SEGM_TRAINED_BY_MASKS: False CONV_HEAD_DIM: 512 CONV_HEAD_KERNEL: 3 DECODER_COMMON_STRIDE: 4 DECODER_CONV_DIMS: 256 DECODER_NORM: DECODER_NUM_CLASSES: 256 DECODER_ON: True DECONV_KERNEL: 4 DEEPLAB: NONLOCAL_ON: 0 NORM: GN FG_IOU_THRESHOLD: 0.7 HEATMAP_SIZE: 112 INDEX_WEIGHTS: 5.0 NAME: DensePoseV1ConvXHead NUM_COARSE_SEGM_CHANNELS: 2 NUM_PATCHES: 24 NUM_STACKED_CONVS: 8 PART_WEIGHTS: 1.0 POINT_REGRESSION_WEIGHTS: 0.01 POOLER_RESOLUTION: 28 POOLER_SAMPLING_RATIO: 2 POOLER_TYPE: ROIAlign SEGM_CONFIDENCE: ENABLED: False EPSILON: 0.01 UP_SCALE: 2 UV_CONFIDENCE: ENABLED: False EPSILON: 0.01 TYPE: iid_iso
      ROI_HEADS: BATCH_SIZE_PER_IMAGE: 512 IN_FEATURES: ['p2', 'p3', 'p4', 'p5'] IOU_LABELS: [0, 1] IOU_THRESHOLDS: [0.5] NAME: DensePoseROIHeads NMS_THRESH_TEST: 0.5 NUM_CLASSES: 1 POSITIVE_FRACTION: 0.25 PROPOSAL_APPEND_GT: True SCORE_THRESH_TEST: 0.05
      ROI_KEYPOINT_HEAD: CONV_DIMS: (512, 512, 512, 512, 512, 512, 512, 512) LOSS_WEIGHT: 1.0 MIN_KEYPOINTS_PER_IMAGE: 1 NAME: KRCNNConvDeconvUpsampleHead NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: True NUM_KEYPOINTS: 17 POOLER_RESOLUTION: 14 POOLER_SAMPLING_RATIO: 0 POOLER_TYPE: ROIAlignV2
      ROI_MASK_HEAD: CLS_AGNOSTIC_MASK: False CONV_DIM: 256 NAME: MaskRCNNConvUpsampleHead NORM: NUM_CONV: 0 POOLER_RESOLUTION: 14 POOLER_SAMPLING_RATIO: 0 POOLER_TYPE: ROIAlignV2
      RPN: BATCH_SIZE_PER_IMAGE: 256 BBOX_REG_LOSS_TYPE: smooth_l1 BBOX_REG_LOSS_WEIGHT: 1.0 BBOX_REG_WEIGHTS: (1.0, 1.0, 1.0, 1.0) BOUNDARY_THRESH: -1 HEAD_NAME: StandardRPNHead IN_FEATURES: ['p2', 'p3', 'p4', 'p5', 'p6'] IOU_LABELS: [0, -1, 1] IOU_THRESHOLDS: [0.3, 0.7] LOSS_WEIGHT: 1.0 NMS_THRESH: 0.7 POSITIVE_FRACTION: 0.5 POST_NMS_TOPK_TEST: 1000 POST_NMS_TOPK_TRAIN: 1000 PRE_NMS_TOPK_TEST: 1000 PRE_NMS_TOPK_TRAIN: 2000 SMOOTH_L1_BETA: 0.0
      SEM_SEG_HEAD: COMMON_STRIDE: 4 CONVS_DIM: 128 IGNORE_VALUE: 255 IN_FEATURES: ['p2', 'p3', 'p4', 'p5'] LOSS_WEIGHT: 1.0 NAME: SemSegFPNHead NORM: GN NUM_CLASSES: 54
      WEIGHTS: pretrained_weights/ImageNetPretrained/R-50.pkl
    OUTPUT_DIR: /raid/zyx/detectron2/output/densepose_rcnn_R_50_FPN_s1x
    SEED: -1
    SOLVER: BASE_LR: 0.01 BIAS_LR_FACTOR: 1.0 CHECKPOINT_PERIOD: 10000 CLIP_GRADIENTS: CLIP_TYPE: value CLIP_VALUE: 1.0 ENABLED: False NORM_TYPE: 2.0 GAMMA: 0.1 IMS_PER_BATCH: 18 LR_SCHEDULER_NAME: WarmupMultiStepLR MAX_ITER: 130000 MOMENTUM: 0.9 NESTEROV: False REFERENCE_WORLD_SIZE: 0 STEPS: (100000, 120000) WARMUP_FACTOR: 0.1 WARMUP_ITERS: 1000 WARMUP_METHOD: linear WEIGHT_DECAY: 0.0001 WEIGHT_DECAY_BIAS: 0.0001 WEIGHT_DECAY_NORM: 0.0
    TEST: AUG: ENABLED: False FLIP: True MAX_SIZE: 4000 MIN_SIZES: (400, 500, 600, 700, 800, 900, 1000, 1100, 1200) ROTATION_ANGLES: () DETECTIONS_PER_IMAGE: 100 EVAL_PERIOD: 0 EXPECTED_RESULTS: [] KEYPOINT_OKS_SIGMAS: [] PRECISE_BN: ENABLED: False NUM_ITER: 200
    VERSION: 2
    VIS_PERIOD: 0


## Expected behavior:

If there are no obvious errors in "what you observed" provided above,
please tell us the expected behavior.

If you expect the model to converge / work better, note that we do not give suggestions
on how to train a new model.
We will only help with it in one of two conditions:
(1) You're unable to reproduce the results in the detectron2 model zoo.
(2) It indicates a detectron2 bug.

## Environment:

Provide your environment information using the following command:

    sys.platform            linux
    Python                  3.7.7 (default, May 7 2020, 21:25:33) [GCC 7.3.0]
    numpy                   1.19.1
    detectron2              0.2.1 @/home/zhouyixuan/detectron2_torch1.5/detectron2
    Compiler                GCC 7.4
    CUDA compiler           CUDA 10.1
    detectron2 arch flags   sm_70
    DETECTRON2_ENV_MODULE
    PyTorch                 1.5.0 @/home/zhouyixuan/anaconda3/envs/detectron2_torch1.5/lib/python3.7/site-packages/torch
    PyTorch debug build     False
    GPU available           True
    GPU 0,1,2,3,4,5,6,7     Tesla V100-SXM2-32GB
    CUDA_HOME               /home/zhouyixuan/cuda-10.1
    Pillow                  7.2.0
    torchvision             0.6.0a0+82fd1c8 @/home/zhouyixuan/anaconda3/envs/detectron2_torch1.5/lib/python3.7/site-packages/torchvision
    torchvision arch flags  sm_35, sm_50, sm_60, sm_70, sm_75
    fvcore                  0.1.1.post20200716
    cv2                     4.3.0


PyTorch built with:

When I train the DensePose model with just your provided config (batch size changed from 16 to 15), I get a pretty low precision.

[08/07 09:18:21] d2.evaluation.coco_evaluation INFO: Evaluation results for bbox: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 57.107 | 83.506 | 61.604 | 27.687 | 53.988 | 72.068 |
[08/07 09:24:30] densepose.evaluator INFO: Evaluation results for densepose, GPS metric: 
|   AP   |  AP50  |  AP75  |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|
| 53.454 | 87.022 | 56.642 | 51.664 | 54.903 |

Please give me some suggestions; I am really confused.

vkhalidov commented 4 years ago

@cool-xuan indeed, 10 AP is quite a big difference. Could you please try to compensate for the smaller batch size by increasing the total number of iterations, to see whether the results improve? You're running about 3 epochs fewer than the standard schedule. The results in the model zoo are expected to be reproducible with a standard deviation of about 0.1 in the DP AP GPSm metric.

cool-xuan commented 4 years ago

Thanks for your advice. I'll increase the total number of iterations and decrease the learning rate as you said. Could you tell me how many epochs are needed to reproduce the results?

vkhalidov commented 4 years ago

@cool-xuan it's hard to give an exact recipe on how to adjust your training schedule to match the one from the model zoo. In terms of pure image counts, you've got 130000 iterations with 15 images per batch (130000 × 15 = 1950000 images), which is 130000 images less than the standard schedule (130000 × 16 = 2080000 images). So you need to add about 130000 / 15 ≈ 8667 iterations to match in terms of the amount of data. Then you'll also need to adjust the schedule, notably the learning rate (SOLVER.BASE_LR) and warmup factor (SOLVER.WARMUP_FACTOR), as sketched below.
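
For concreteness, here is a minimal sketch of what such a rescaled schedule could look like under the linear scaling rule; the values below are derived only from the numbers above and are illustrative, not a validated recipe:

    # Hypothetical rescaled s1x schedule for a batch size of 15 (standard: 16).
    # Iteration counts are stretched by 16/15 so the total image count matches;
    # BASE_LR is scaled by 15/16 per the linear scaling rule.
    SOLVER:
      IMS_PER_BATCH: 15
      MAX_ITER: 138667         # 130000 * 16/15, i.e. the original 130000 + ~8667
      STEPS: (106667, 128000)  # 100000 and 120000 stretched by 16/15
      BASE_LR: 0.009375        # 0.01 * 15/16
      WARMUP_ITERS: 1067       # 1000 * 16/15, stretched with the rest of the schedule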

cool-xuan commented 4 years ago

Did you train the model in the model zoo with just the provided config (MAX_ITER: 130000, STEPS: (100000, 120000))?

I don't think the results can be reproduced just by adding 8667 iterations.

Are there any other suggestions about my problem? I just copied the conda env from another server to mine, because the download speed of my server is very, very slow. Does this cause any problems?

vkhalidov commented 4 years ago

@cool-xuan yes, I trained the model using 1 machine with 8 GPUs with the exact config from the model zoo. Changing the batch size requires readjusting the training schedule (e.g. the learning rate curve). Copying the conda env should not be an issue.

vkhalidov commented 4 years ago

@cool-xuan I tried relaunching the training on my side (8 GPUs, 16 images per batch) and got results similar to yours:

| bbox AP | dp AP GPS | dp AP GPSm |
|:-------:|:---------:|:----------:|
| 58.6741 |  54.7701  |  58.4207   |

This is obviously too low and unexpected. I'm going to investigate and post an update here. Thank you for flagging the issue!

cool-xuan commented 4 years ago

@vkhalidov Thank you for relaunching the training. I am a beginner in DensePose and not very familiar with detectron2, but I'm trying to find the bug too. Waiting for your update. It is really startling, and tracking it down is a massive piece of work. Thanks a lot again.

vkhalidov commented 4 years ago

@cool-xuan while I'm investigating the issue, you can use aef142769953de9b8c117138d15e146633949ea2, for which the scores correspond to the ones reported in the model zoo.
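
For example (a generic git workflow, not a step spelled out in this thread), you could check out that revision in your detectron2 clone with `git checkout aef142769953de9b8c117138d15e146633949ea2` and then rebuild/reinstall the package.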

vkhalidov commented 4 years ago

@cool-xuan 4921a51f10ca196fb9741e91878da4ccb20d511f should have fixed the issue; all baselines from the model zoo should now be reproducible.

cool-xuan commented 4 years ago

@vkhalidov Thanks again for your work.