IDEA-Research / MaskDINO

[CVPR 2023] Official implementation of the paper "Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation"
Apache License 2.0

SwinL config for ADE20K semantic segmentation. #77

Open peiwang062 opened 1 year ago

peiwang062 commented 1 year ago

Dear authors,

Is it possible to publish the config for Swin-L on ADE20K semantic segmentation? I would like to reproduce the results at the bottom of Table 7.

huxycn commented 1 year ago

maskdino/configs/ade20k/semantic-segmentation/swin/maskdino_R50_bs16_160k_steplr.yaml

```yaml
_BASE_: ../Base-ADE20K-SemanticSegmentation.yaml
MODEL:
  META_ARCHITECTURE: "MaskDINO"
  BACKBONE:
    NAME: "D2SwinTransformer"
  SWIN:
    EMBED_DIM: 192
    DEPTHS: [ 2, 2, 18, 2 ]
    NUM_HEADS: [ 6, 12, 24, 48 ]
    WINDOW_SIZE: 12
    APE: False
    DROP_PATH_RATE: 0.3
    PATCH_NORM: True
    PRETRAIN_IMG_SIZE: 384
  WEIGHTS: "swin_large_patch4_window12_384_22k.pkl"
  PIXEL_MEAN: [ 123.675, 116.280, 103.530 ]
  PIXEL_STD: [ 58.395, 57.120, 57.375 ]
  SEM_SEG_HEAD:
    NAME: "MaskDINOHead"
    IGNORE_VALUE: 255
    NUM_CLASSES: 150
    LOSS_WEIGHT: 1.0
    CONVS_DIM: 256
    MASK_DIM: 256
    NORM: "GN"
    # pixel decoder
    PIXEL_DECODER_NAME: "MaskDINOEncoder"
    DIM_FEEDFORWARD: 1024
    NUM_FEATURE_LEVELS: 4
    TOTAL_NUM_FEATURE_LEVELS: 5
    IN_FEATURES: ["res2", "res3", "res4", "res5"]
    DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res2", "res3", "res4", "res5"]
    COMMON_STRIDE: 4
    TRANSFORMER_ENC_LAYERS: 6
  MaskDINO:
    TRANSFORMER_DECODER_NAME: "MaskDINODecoder"
    DEEP_SUPERVISION: True
    NO_OBJECT_WEIGHT: 0.1
    CLASS_WEIGHT: 4.0
    MASK_WEIGHT: 5.0
    DICE_WEIGHT: 5.0
    HIDDEN_DIM: 256
    NUM_OBJECT_QUERIES: 100
    NHEADS: 8
    DROPOUT: 0.0
    DIM_FEEDFORWARD: 2048
    ENC_LAYERS: 0
    PRE_NORM: False
    ENFORCE_INPUT_PROJ: False
    SIZE_DIVISIBILITY: 32
    DEC_LAYERS: 9  # 9 decoder layers, add one for the loss on learnable query
    TRAIN_NUM_POINTS: 12544
    OVERSAMPLE_RATIO: 3.0
    IMPORTANCE_SAMPLE_RATIO: 0.75
    TWO_STAGE: False
    DN: "seg"
    DN_NUM: 100
    INITIALIZE_BOX_TYPE: "no"
    SEMANTIC_CE_LOSS: True
    TEST:
      SEMANTIC_ON: True
      INSTANCE_ON: False
      PANOPTIC_ON: False
      OVERLAP_THRESHOLD: 0.8
      OBJECT_MASK_THRESHOLD: 0.8
SOLVER:
  AMP:
    ENABLED: False
  BACKBONE_MULTIPLIER: 0.1
  BASE_LR: 0.0001
  BASE_LR_END: 0.0
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 5000
  IMS_PER_BATCH: 8
  LR_SCHEDULER_NAME: WarmupMultiStepLR
  MAX_ITER: 320000
  STEPS: (270000,300000)
  WARMUP_FACTOR: 1.0
  WARMUP_ITERS: 10
  WARMUP_METHOD: linear
```
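For anyone checking the schedule before launching a long run, here is a minimal Python sketch of the learning rates the SOLVER block above would likely produce. It is not code from the repository. It assumes Detectron2's usual `WarmupMultiStepLR` behaviour (linear warmup, then a step decay at each entry in `STEPS`), assumes the decay factor is Detectron2's default `SOLVER.GAMMA` of 0.1 since the posted config does not set it, and assumes `BACKBONE_MULTIPLIER` simply scales the backbone learning rate. The function names `lr_factor` and `learning_rates` are made up for illustration.

```python
from bisect import bisect_right

# Values copied from the SOLVER section of the posted config.
BASE_LR = 1e-4
BACKBONE_MULTIPLIER = 0.1
STEPS = (270000, 300000)
WARMUP_FACTOR = 1.0
WARMUP_ITERS = 10
IMS_PER_BATCH = 8
MAX_ITER = 320000

# Assumption: GAMMA is not in the posted config; 0.1 is Detectron2's default.
GAMMA = 0.1


def lr_factor(iteration: int) -> float:
    """Multiplier applied to BASE_LR at a given iteration (hypothetical helper)."""
    warmup = 1.0
    if iteration < WARMUP_ITERS:
        # Linear warmup: interpolate from WARMUP_FACTOR up to 1.0.
        alpha = iteration / WARMUP_ITERS
        warmup = WARMUP_FACTOR * (1 - alpha) + alpha
    # Multiply by GAMMA once for every milestone already reached.
    return warmup * GAMMA ** bisect_right(STEPS, iteration)


def learning_rates(iteration: int) -> tuple[float, float]:
    """Return (head_lr, backbone_lr), assuming the multiplier scales the backbone LR."""
    head_lr = BASE_LR * lr_factor(iteration)
    return head_lr, head_lr * BACKBONE_MULTIPLIER


if __name__ == "__main__":
    for it in (0, 5, 269999, 270000, 300000, MAX_ITER - 1):
        head, backbone = learning_rates(it)
        print(f"iter {it:>6}: head lr {head:.1e}, backbone lr {backbone:.1e}")
    print("images seen:", IMS_PER_BATCH * MAX_ITER)
```

Two things stand out under these assumptions. With `WARMUP_FACTOR: 1.0` the warmup interpolation leaves the learning rate unchanged, so the 10 warmup iterations have no effect. And a batch size of 8 for 320k iterations processes the same number of images as the batch size of 16 for 160k iterations suggested by the `bs16_160k` file name.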

peiwang062 commented 1 year ago

Thanks for the config. Just one quick question: why was two-stage disabled for semantic segmentation?