IDEA-Research / MaskDINO

[CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"
Apache License 2.0

large gap for reproducing the semantic segmentation results #89

Open ZhengyuXia opened 1 year ago

ZhengyuXia commented 1 year ago

Hi, thanks for the excellent work. I'm trying to reproduce the semantic segmentation results (ResNet-50 backbone + ADE20K). However, I get 46.6% mIoU, which is 2.1% below the reported 48.7%. I've run the experiment three times, and the results were all around 46.6%.

The config file I used is configs/ade20k/semantic-segmentation/maskdino_R50_bs16_160ksteplr.yaml, which specifies 160K training iterations, the same as mentioned in the paper. However, the model available for download here is maskdino_r50_50ep_100q_celoss_hid1024_3s_semantic_ade20k48.7miou.pth, whose name indicates 50 training epochs. It seems the two were trained with different configs. Is there something I missed when training this model?
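For reference, the solver keys in question look like this in a detectron2-style yaml (key names are the standard detectron2 ones; the batch size and iteration count below come from the config filename, and whether the released checkpoint was actually trained this way is exactly what is unclear):

```yaml
SOLVER:
  IMS_PER_BATCH: 16   # "bs16" in the config filename
  MAX_ITER: 160000    # "160ksteplr" yaml; the checkpoint name suggests 50 epochs instead
```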

hhaAndroid commented 1 year ago

@ZhengyuXia I am seeing a similar gap:

[08/02 08:49:23 d2.engine.defaults]: Evaluation results for ade20k_sem_seg_val in csv format:
[08/02 08:49:23 d2.evaluation.testing]: copypaste: Task: sem_seg
[08/02 08:49:23 d2.evaluation.testing]: copypaste: mIoU,fwIoU,mACC,pACC
[08/02 08:49:23 d2.evaluation.testing]: copypaste: 45.5368,70.6117,59.3918,81.6061

I have run it three times, and the results are all similar.

ZhengyuXia commented 1 year ago

@hhaAndroid

I rolled back the Python version from 3.8 to 3.7, which improved performance by ~0.4% mIoU. I also enabled "SyncBN" in the config file, which gave an additional ~0.5% mIoU. So far, my best reproduction result is 47.6%, still ~1% below the paper's result.
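For concreteness, enabling SyncBN amounts to uncommenting the norm key in the backbone section of the released yaml (key path as in standard detectron2 configs):

```yaml
MODEL:
  RESNETS:
    NORM: "SyncBN"  # shipped commented out in the released yaml
```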

FengLi-ust commented 1 year ago

ade_48.7log.txt

Hi, attached above is my log file for the 48.7 result, for your reference. ADE20K is a small dataset, so the performance may not be very stable. I will also check the code to see if something is wrong.

ZhengyuXia commented 1 year ago

@FengLi-ust

Thanks for uploading the log file. I checked the settings in it against the released yaml and found several differences.

  1. NORM in the ResNet backbone

The NORM setting in the log file is FrozenBN:

  RESNETS:
    DEFORM_MODULATED: False
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE: [False, False, False, False]
    DEPTH: 50
    NORM: FrozenBN

But it is commented out in the released yaml file:

  RESNETS:
    DEPTH: 50
    STEM_TYPE: "basic"  # not used
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: False
    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
    # NORM: "SyncBN"

  2. CLASS_WEIGHT and DEC_LAYERS differ

The CLASS_WEIGHT is 2.0 and DEC_LAYERS is 10 in the log file:

  MASK_FORMER:
    BOX_LOSS: True
    BOX_WEIGHT: 5.0
    CLASS_WEIGHT: 2.0
    DEC_LAYERS: 10

But the CLASS_WEIGHT is 4.0 and DEC_LAYERS is 9 in the yaml file:

  MaskDINO:
    TRANSFORMER_DECODER_NAME: "MaskDINODecoder"
    DEEP_SUPERVISION: True
    NO_OBJECT_WEIGHT: 0.1
    CLASS_WEIGHT: 4.0
    MASK_WEIGHT: 5.0
    DICE_WEIGHT: 5.0
    HIDDEN_DIM: 256
    NUM_OBJECT_QUERIES: 100
    NHEADS: 8
    DROPOUT: 0.0
    DIM_FEEDFORWARD: 2048
    ENC_LAYERS: 0
    PRE_NORM: False
    ENFORCE_INPUT_PROJ: False
    SIZE_DIVISIBILITY: 32
    DEC_LAYERS: 9  # 9 decoder layers, add one for the loss on learnable query

I tried all of these settings, and various subsets of them, but my best result is ~47.1% mIoU, still ~1.6% below the reported 48.7%.
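Taken together, aligning the released yaml with the log file would amount to a fragment like the following (key paths copied from the configs quoted above; note the log uses the MASK_FORMER section name while the yaml uses MaskDINO, and it is untested whether these changes alone close the remaining gap):

```yaml
MODEL:
  RESNETS:
    NORM: "FrozenBN"   # log file setting; the released yaml leaves NORM commented out
  MaskDINO:
    CLASS_WEIGHT: 2.0  # released yaml default: 4.0
    DEC_LAYERS: 10     # released yaml default: 9
```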