facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Different box AP result between Detectron2 COCOEvaluator and cocoapi COCOeval #2548

Closed kid134679 closed 3 years ago

kid134679 commented 3 years ago

If you do not know the root cause of the problem, please post according to this template:

Instructions To Reproduce the Issue:

Check https://stackoverflow.com/help/minimal-reproducible-example for how to ask good questions. Simplify the steps to reproduce the issue using suggestions from the above link, and provide them below:

  1. Full runnable code or full changes you made: modified code from the Detectron2 tutorial colab.
    
    from detectron2.evaluation import COCOEvaluator, inference_on_dataset
    from detectron2.data import build_detection_test_loader
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor
    from detectron2 import model_zoo

    cfg_c4 = get_cfg()
    cfg_c4.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_C4_1x.yaml"))
    cfg_c4.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_C4_1x.yaml")
    predictor_c4 = DefaultPredictor(cfg_c4)

    evaluator_c4 = COCOEvaluator("coco_2017_valid", ("bbox",), False, output_dir="./drive/MyDrive/output2/R50-c4")
    val_loader_c4 = build_detection_test_loader(cfg_c4, "coco_2017_valid")
    print(inference_on_dataset(predictor_c4.model, val_loader_c4, evaluator_c4))

This produced the following results:
[01/26 13:26:12 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API...
Loading and preparing results...
DONE (t=0.24s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
COCOeval_opt.evaluate() finished in 8.72 seconds.
Accumulating evaluation results...
COCOeval_opt.accumulate() finished in 1.10 seconds.
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.357
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.561
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.380
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.192
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.409
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.487
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.311
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.485
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.506
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.310
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.563
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.664
[01/26 13:26:22 d2.evaluation.coco_evaluation]: Evaluation results for bbox:
   AP     AP50    AP75    APs     APm     APl
 35.681  56.098  38.027  19.234  40.860  48.712
[01/26 13:26:22 d2.evaluation.coco_evaluation]: Per-category bbox AP:
category        AP      category        AP      category        AP
person 50.350 bicycle 27.523 car 37.185
motorcycle 38.172 airplane 59.584 bus 61.440
train 56.396 truck 28.257 boat 22.444
traffic light 22.127 fire hydrant 62.067 stop sign 60.377
parking meter 42.841 bench 19.896 bird 30.032
cat 60.329 dog 52.680 horse 52.337
sheep 44.474 cow 47.850 elephant 55.965
bear 60.552 zebra 64.517 giraffe 63.984
backpack 11.795 umbrella 33.448 handbag 10.412
tie 24.751 suitcase 27.130 frisbee 58.991
skis 17.071 snowboard 30.937 sports ball 38.860
kite 35.119 baseball bat 20.442 baseball glove 30.233
skateboard 44.891 surfboard 31.239 tennis racket 42.120
bottle 32.248 wine glass 28.630 cup 36.748
fork 26.514 knife 9.067 spoon 10.924
bowl 36.522 banana 20.818 apple 16.500
sandwich 29.789 orange 27.876 broccoli 21.829
carrot 18.538 hot dog 29.126 pizza 50.045
donut 39.001 cake 29.378 chair 23.011
couch 36.161 potted plant 21.234 bed 36.842
dining table 24.268 toilet 55.259 tv 51.192
laptop 54.792 mouse 51.603 remote 20.563
keyboard 46.830 cell phone 28.652 microwave 51.092
oven 29.833 toaster 33.304 sink 31.475
refrigerator 47.891 book 10.685 clock 45.312
vase 31.155 scissors 22.562 teddy bear 42.953
hair drier 4.239 toothbrush 11.217
2. What exact command you run:
Re-calculate the AP from the model's output file (./drive/MyDrive/output2/R50-c4/coco_instances_results.json) using the cocoapi (https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb):

    cocoEval = COCOeval(cocoGt, cocoDt_c4, annType)
    cocoEval.params.imgIds = imgIds
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()
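
The snippet above does not show how cocoGt, cocoDt_c4, annType, and imgIds were constructed. For context, here is a minimal sketch of the usual pycocotools setup; the ground-truth annotation path is an assumption for illustration, not taken from the report:

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    annType = "bbox"
    # The ground-truth annotation path below is an assumption; substitute your own file.
    cocoGt = COCO("annotations/instances_val2017.json")
    # Load the detections that detectron2 saved in COCO results format.
    cocoDt_c4 = cocoGt.loadRes("./drive/MyDrive/output2/R50-c4/coco_instances_results.json")
    # Evaluate over every image present in the ground truth.
    imgIds = sorted(cocoGt.getImgIds())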


3. __Full logs__ or other relevant observations:
The results from the cocoapi (https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb):

Running per image evaluation...
Evaluate annotation type bbox
DONE (t=0.66s).
Accumulating evaluation results...
DONE (t=0.37s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.448
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.662
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.506
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.284
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.471
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.659
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.386
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.543
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.551
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.340
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.564
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.750


## Expected behavior:

As the output file (coco_instances_results.json) is generated by the model,
the results re-computed from this file should be exactly the same as detectron2's own evaluation result.
However, the values differ: 35.681 vs. 44.8 for AP @ IoU=0.50:0.95, and the other metrics differ as well.
Additionally: running the evaluator with use_fast_impl=False reports the same result, 35.681 AP.

Why does this happen?

P.S. Is it possible to reproduce exactly the same 35.681 AP from the coco_instances_results.json file within Detectron2?
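
Regarding the P.S., one way to try reproducing the detectron2-side number from the saved file is to re-score coco_instances_results.json with the same fast evaluation class that appears in the log above (COCOeval_opt). A minimal sketch, assuming COCOeval_opt is importable from detectron2.evaluation.fast_eval_api and assuming a local path to the COCO val2017 ground-truth annotations:

    from pycocotools.coco import COCO
    from detectron2.evaluation.fast_eval_api import COCOeval_opt  # assumed import path

    # The ground-truth path is an assumption for illustration; substitute your own file.
    cocoGt = COCO("annotations/instances_val2017.json")
    cocoDt = cocoGt.loadRes("./drive/MyDrive/output2/R50-c4/coco_instances_results.json")

    # COCOeval_opt subclasses pycocotools' COCOeval, so the usual
    # evaluate/accumulate/summarize sequence applies unchanged.
    cocoEval = COCOeval_opt(cocoGt, cocoDt, "bbox")
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()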

## Environment:

Provide your environment information using the following command:

Detectron2 tutorial colab
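
For completeness, a minimal sketch of how detectron2's environment report can be collected (on a standard install; the reporter's runs were in the tutorial colab):

    # Print detectron2's environment report (PyTorch / CUDA versions, detectron2 build, etc.).
    from detectron2.utils.collect_env import collect_env_info
    print(collect_env_info())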

github-actions[bot] commented 3 years ago

You've chosen to report an unexpected problem or bug. Unless you already know the root cause of it, please include details about it by filling the issue template. The following information is missing: "Your Environment";

ppwwyyxx commented 3 years ago

Could you provide full runnable code? Currently many variables (e.g. cocoGt, cocoDt_c4) are undefined.

ppwwyyxx commented 3 years ago

The following code prints the same results for both evaluations, so I believe the issue does not exist:

    from detectron2.evaluation import COCOEvaluator, inference_on_dataset
    from detectron2.data import build_detection_test_loader
    from detectron2.config import get_cfg
    from detectron2 import model_zoo
    from detectron2.engine import DefaultPredictor
    from detectron2.utils.logger import setup_logger
    from detectron2.data import MetadataCatalog

    setup_logger()
    cfg_c4 = get_cfg()
    cfg_c4.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_C4_1x.yaml"))
    cfg_c4.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_C4_1x.yaml")
    predictor_c4 = DefaultPredictor(cfg_c4)

    evaluator_c4 = COCOEvaluator("coco_2017_val", ("bbox",), False, output_dir="./output")
    val_loader_c4 = build_detection_test_loader(cfg_c4, "coco_2017_val")
    print(inference_on_dataset(predictor_c4.model, val_loader_c4, evaluator_c4))

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval
    cocoGt = COCO(MetadataCatalog.get("coco_2017_val").json_file)
    cocoDt = cocoGt.loadRes("./output/coco_instances_results.json")
    cocoEval = COCOeval(cocoGt, cocoDt, "bbox")
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()