facebookresearch / detr

End-to-End Object Detection with Transformers

detectron2 trained mAP only 40.4 #461

Open luohao123 opened 3 years ago

luohao123 commented 3 years ago

Here is my config, the same as the d2 settings (a loading sketch follows the YAML below):

MODEL:
  META_ARCHITECTURE: "Detr"
  WEIGHTS: "detectron2://ImageNetPretrained/torchvision/R-50.pkl"
  PIXEL_MEAN: [123.675, 116.280, 103.530]
  PIXEL_STD: [58.395, 57.120, 57.375]
  MASK_ON: False
  RESNETS:
    DEPTH: 50
    STRIDE_IN_1X1: False
    OUT_FEATURES: ["res2", "res3", "res4", "res5"]
  DETR:
    GIOU_WEIGHT: 2.0
    L1_WEIGHT: 5.0
    NUM_OBJECT_QUERIES: 100
    ENC_LAYERS: 6
    DEC_LAYERS: 6
    HIDDEN_DIM: 256

DATASETS:
  TRAIN: ("coco_2017_train",)
  TEST: ("coco_2017_val",)

SOLVER:
  IMS_PER_BATCH: 56
  BASE_LR: 0.0001
  STEPS: (369600,)
  MAX_ITER: 554400
  WARMUP_FACTOR: 1.0
  WARMUP_ITERS: 10
  WEIGHT_DECAY: 0.0001
  OPTIMIZER: "ADAMW"
  BACKBONE_MULTIPLIER: 0.1
  CLIP_GRADIENTS:
    ENABLED: True
    CLIP_TYPE: "full_model"
    # CLIP_TYPE: "norm"
    CLIP_VALUE: 0.01
    NORM_TYPE: 2.0
INPUT:
  MIN_SIZE_TRAIN: (480, 512, 544, 576, 608, 640, 672, 704, 736, 768, 800)
  CROP:
    ENABLED: True
    TYPE: "absolute_range"
    SIZE: (384, 600)
  FORMAT: "RGB"
TEST:
  EVAL_PERIOD: 4000
DATALOADER:
  FILTER_EMPTY_ANNOTATIONS: False
  NUM_WORKERS: 2
VERSION: 2

OUTPUT_DIR: "output/coco_detr"
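
For reference, here is a minimal sketch of how a YAML config like this is typically loaded for the detectron2 wrapper of DETR. The `d2.detr` import path, the `add_detr_config` helper, and the config file path are assumptions based on the repo's d2/ folder, not the exact script used for this run:

```python
# Minimal loading sketch (not the author's exact script).  `add_detr_config`
# and the `d2.detr` import path are assumptions based on this repo's d2/ folder.
from detectron2.config import get_cfg
from detectron2.modeling import build_model

from d2.detr import add_detr_config  # assumed: adds the MODEL.DETR.* and extra SOLVER keys

cfg = get_cfg()
add_detr_config(cfg)                              # register DETR-specific config keys
cfg.merge_from_file("path/to/coco_detr.yaml")     # hypothetical path to the YAML above
cfg.freeze()

model = build_model(cfg)  # builds the "Detr" META_ARCHITECTURE registered by the d2 wrapper
```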

This matches the d2 settings, but the final result is:

Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.404
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.613
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.423
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.187
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.439
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.596
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.323
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.517
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.551
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.279
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.602
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.779
[10/25 14:25:27 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 40.376 | 61.295 | 42.253 | 18.724 | 43.920 | 59.636 |
[10/25 14:25:27 d2.evaluation.coco_evaluation]: Per-category bbox AP: 
| category      | AP     | category     | AP     | category       | AP     |
|:--------------|:-------|:-------------|:-------|:---------------|:-------|
| person        | 50.701 | bicycle      | 30.469 | car            | 37.387 |
| motorcycle    | 44.267 | airplane     | 67.917 | bus            | 65.528 |
| train         | 65.090 | truck        | 32.143 | boat           | 22.165 |
| traffic light | 19.022 | fire hydrant | 65.936 | stop sign      | 58.840 |
| parking meter | 42.180 | bench        | 26.143 | bird           | 30.890 |
| cat           | 73.216 | dog          | 68.355 | horse          | 60.755 |
| sheep         | 51.302 | cow          | 54.661 | elephant       | 64.619 |
| bear          | 72.397 | zebra        | 69.112 | giraffe        | 69.636 |
| backpack      | 12.881 | umbrella     | 38.726 | handbag        | 13.226 |
| tie           | 29.277 | suitcase     | 38.705 | frisbee        | 55.926 |
| skis          | 23.012 | snowboard    | 36.953 | sports ball    | 34.675 |
| kite          | 36.643 | baseball bat | 34.237 | baseball glove | 31.970 |
| skateboard    | 49.857 | surfboard    | 36.456 | tennis racket  | 46.072 |
| bottle        | 28.812 | wine glass   | 29.662 | cup            | 36.442 |
| fork          | 30.848 | knife        | 15.650 | spoon          | 16.332 |
| bowl          | 38.622 | banana       | 23.552 | apple          | 19.805 |
| sandwich      | 38.018 | orange       | 31.823 | broccoli       | 24.402 |
| carrot        | 17.321 | hot dog      | 39.350 | pizza          | 52.068 |
| donut         | 41.069 | cake         | 36.364 | chair          | 26.023 |
| couch         | 46.775 | potted plant | 25.253 | bed            | 48.952 |
| dining table  | 31.275 | toilet       | 63.526 | tv             | 55.941 |
| laptop        | 62.972 | mouse        | 53.099 | remote         | 24.020 |
| keyboard      | 50.445 | cell phone   | 26.526 | microwave      | 54.395 |
| oven          | 37.117 | toaster      | 33.269 | sink           | 36.691 |
| refrigerator  | 58.727 | book         | 9.994  | clock          | 45.220 |
| vase          | 32.496 | scissors     | 36.930 | teddy bear     | 49.291 |
| hair drier    | 18.775 | toothbrush   | 20.854 |                |        |
[10/25 14:25:27 d2.engine.defaults]: Evaluation results for coco_2017_val in csv format:
[10/25 14:25:27 d2.evaluation.testing]: copypaste: Task: bbox
[10/25 14:25:27 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[10/25 14:25:27 d2.evaluation.testing]: copypaste: 40.3756,61.2946,42.2532,18.7242,43.9202,59.6358
[10/25 14:25:27 d2.utils.events]:  iter: 554401    lr: N/A  max_mem: 623M
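
For context on what this schedule means in epochs, the SOLVER values above can be converted with a quick back-of-the-envelope calculation (a sketch; 118,287 is the number of images in coco_2017_train):

```python
# Rough conversion of the SOLVER schedule above into epochs.
ims_per_batch = 56        # SOLVER.IMS_PER_BATCH
max_iter = 554_400        # SOLVER.MAX_ITER
lr_drop_iter = 369_600    # SOLVER.STEPS
train_images = 118_287    # size of coco_2017_train

epochs_total = max_iter * ims_per_batch / train_images        # ~262 epochs
epochs_at_drop = lr_drop_iter * ims_per_batch / train_images  # ~175 epochs
print(f"trains ~{epochs_total:.0f} epochs, LR drop at ~{epochs_at_drop:.0f} epochs")
```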
imadgohar commented 2 years ago

I have a few questions about how you got these results on your data. How many GPUs did you use (is it a single one)? How many classes do you have, 99 or 100? What command did you use during training? And lastly, how did you adjust the learning rate?

Thanks