facebookresearch / detr

End-to-End Object Detection with Transformers

detectron2 trained mAP only 40.4 #461

Open luohao123 opened 3 years ago

luohao123 commented 3 years ago

Here is my config, the same as the d2 settings (a loading sketch follows the YAML below):

MODEL:
  META_ARCHITECTURE: "Detr"
  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  MASK_ON: False
  RESNETS:
    DEPTH: 50
    STRIDE_IN_1X1: False
    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
  DETR:
    GIOU_WEIGHT: 2.0
    L1_WEIGHT: 5.0
    NUM_OBJECT_QUERIES: 100
    ENC_LAYERS: 6
    DEC_LAYERS: 6
    HIDDEN_DIM: 256

DATASETS:
  TRAIN: ("coco_2017_train",)
  TEST: ("coco_2017_val",)

SOLVER:
  IMS_PER_BATCH: 56
  BASE_LR: 0.0001
  STEPS: (369600,)
  MAX_ITER: 554400
  WARMUP_FACTOR: 1.0
  WARMUP_ITERS: 10
  WEIGHT_DECAY: 0.0001
  OPTIMIZER: "ADAMW"
  BACKBONE_MULTIPLIER: 0.1
  CLIP_GRADIENTS:
    ENABLED: True
    CLIP_TYPE: "full_model"
    # CLIP_TYPE: "norm"
    CLIP_VALUE: 0.01
    NORM_TYPE: 2.0
INPUT:
  MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
  CROP:
    ENABLED: True
    TYPE: "absolute_range"
    SIZE: (384, 600)
  FORMAT: "RGB"
TEST:
  EVAL_PERIOD: 4000
DATALOADER:
  FILTER_EMPTY_ANNOTATIONS: False
  NUM_WORKERS: 2
VERSION: 2

OUTPUT_DIR: "output/coco_detr"
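
For reference, here is a minimal sketch of how a YAML config like this is typically loaded for the detectron2 wrapper of DETR. The `d2.detr` import path, the `add_detr_config` helper, and the config file path are assumptions based on the repo's d2/ folder, not the exact script used for this run:

```python
# Minimal loading sketch (not the author's exact script).  `add_detr_config`
# and the `d2.detr` import path are assumptions based on this repo's d2/ folder.
from detectron2.config import get_cfg
from detectron2.modeling import build_model

from d2.detr import add_detr_config  # assumed: adds the MODEL.DETR.* and extra SOLVER keys

cfg = get_cfg()
add_detr_config(cfg)                              # register DETR-specific config keys
cfg.merge_from_file("path/to/coco_detr.yaml")     # hypothetical path to the YAML above
cfg.freeze()

model = build_model(cfg)  # builds the "Detr" META_ARCHITECTURE registered by the d2 wrapper
```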

This matches the d2 settings, but the final result is:

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.404
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.613
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.423
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.187
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.439
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.596
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.323
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.517
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.551
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.279
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.602
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.779
[10/25 14:25:27 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 40.376 | 61.295 | 42.253 | 18.724 | 43.920 | 59.636 |
[10/25 14:25:27 d2.evaluation.coco_evaluation]: Per-category bbox AP: 
| category      | AP     | category     | AP     | category       | AP     |
|:--------------|:-------|:-------------|:-------|:---------------|:-------|
| person        | 50.701 | bicycle      | 30.469 | car            | 37.387 |
| motorcycle    | 44.267 | airplane     | 67.917 | bus            | 65.528 |
| train         | 65.090 | truck        | 32.143 | boat           | 22.165 |
| traffic light | 19.022 | fire hydrant | 65.936 | stop sign      | 58.840 |
| parking meter | 42.180 | bench        | 26.143 | bird           | 30.890 |
| cat           | 73.216 | dog          | 68.355 | horse          | 60.755 |
| sheep         | 51.302 | cow          | 54.661 | elephant       | 64.619 |
| bear          | 72.397 | zebra        | 69.112 | giraffe        | 69.636 |
| backpack      | 12.881 | umbrella     | 38.726 | handbag        | 13.226 |
| tie           | 29.277 | suitcase     | 38.705 | frisbee        | 55.926 |
| skis          | 23.012 | snowboard    | 36.953 | sports ball    | 34.675 |
| kite          | 36.643 | baseball bat | 34.237 | baseball glove | 31.970 |
| skateboard    | 49.857 | surfboard    | 36.456 | tennis racket  | 46.072 |
| bottle        | 28.812 | wine glass   | 29.662 | cup            | 36.442 |
| fork          | 30.848 | knife        | 15.650 | spoon          | 16.332 |
| bowl          | 38.622 | banana       | 23.552 | apple          | 19.805 |
| sandwich      | 38.018 | orange       | 31.823 | broccoli       | 24.402 |
| carrot        | 17.321 | hot dog      | 39.350 | pizza          | 52.068 |
| donut         | 41.069 | cake         | 36.364 | chair          | 26.023 |
| couch         | 46.775 | potted plant | 25.253 | bed            | 48.952 |
| dining table  | 31.275 | toilet       | 63.526 | tv             | 55.941 |
| laptop        | 62.972 | mouse        | 53.099 | remote         | 24.020 |
| keyboard      | 50.445 | cell phone   | 26.526 | microwave      | 54.395 |
| oven          | 37.117 | toaster      | 33.269 | sink           | 36.691 |
| refrigerator  | 58.727 | book         | 9.994  | clock          | 45.220 |
| vase          | 32.496 | scissors     | 36.930 | teddy bear     | 49.291 |
| hair drier    | 18.775 | toothbrush   | 20.854 |                |        |
[10/25 14:25:27 d2.engine.defaults]: Evaluation results for coco_2017_val in csv format:
[10/25 14:25:27 d2.evaluation.testing]: copypaste: Task: bbox
[10/25 14:25:27 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[10/25 14:25:27 d2.evaluation.testing]: copypaste: 40.3756,61.2946,42.2532,18.7242,43.9202,59.6358
[10/25 14:25:27 d2.utils.events]:  iter: 554401    lr: N/A  max_mem: 623M
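
For context on what this schedule means in epochs, the SOLVER values above can be converted with a quick back-of-the-envelope calculation (a sketch; 118,287 is the number of images in coco_2017_train):

```python
# Rough conversion of the SOLVER schedule above into epochs.
ims_per_batch = 56        # SOLVER.IMS_PER_BATCH
max_iter = 554_400        # SOLVER.MAX_ITER
lr_drop_iter = 369_600    # SOLVER.STEPS
train_images = 118_287    # size of coco_2017_train

epochs_total = max_iter * ims_per_batch / train_images        # ~262 epochs
epochs_at_drop = lr_drop_iter * ims_per_batch / train_images  # ~175 epochs
print(f"trains ~{epochs_total:.0f} epochs, LR drop at ~{epochs_at_drop:.0f} epochs")
```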
imadgohar commented 2 years ago

I have a few questions about how you got these results on your data. How many GPUs did you use (is it a single one)? How many classes do you have, 99 or 100? What command did you use during training? And lastly, how did you adjust the learning rate?

Thanks