facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Different box AP results between Detectron2 COCOEvaluator and cocoapi COCOeval. #2548

Closed kid134679 closed 3 years ago

kid134679 commented 3 years ago

If you do not know the root cause of the problem, please post according to this template:

Instructions To Reproduce the Issue:

Check https://stackoverflow.com/help/minimal-reproducible-example for how to ask good questions. Simplify the steps to reproduce the issue using suggestions from the above link, and provide them below:

  1. Full runnable code or full changes you made: modified code from the Detectron2 tutorial colab.

    from detectron2.evaluation import COCOEvaluator, inference_on_dataset
    from detectron2.data import build_detection_test_loader
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor
    from detectron2 import model_zoo

    cfg_c4 = get_cfg()
    cfg_c4.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_C4_1x.yaml"))
    cfg_c4.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_C4_1x.yaml")
    predictor_c4 = DefaultPredictor(cfg_c4)

    evaluator_c4 = COCOEvaluator("coco_2017_valid", ("bbox",), False, output_dir="./drive/MyDrive/output2/R50-c4")
    val_loader_c4 = build_detection_test_loader(cfg_c4, "coco_2017_valid")
    print(inference_on_dataset(predictor_c4.model, val_loader_c4, evaluator_c4))

following results:
[01/26 13:26:12 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API...
Loading and preparing results...
DONE (t=0.24s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
COCOeval_opt.evaluate() finished in 8.72 seconds.
Accumulating evaluation results...
COCOeval_opt.accumulate() finished in 1.10 seconds.
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.357
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.561
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.380
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.192
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.409
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.487
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.311
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.485
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.506
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.310
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.563
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.664
[01/26 13:26:22 d2.evaluation.coco_evaluation]: Evaluation results for bbox:
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 35.681 | 56.098 | 38.027 | 19.234 | 40.860 | 48.712 |
[01/26 13:26:22 d2.evaluation.coco_evaluation]: Per-category bbox AP: category AP category AP category AP
person 50.350 bicycle 27.523 car 37.185
motorcycle 38.172 airplane 59.584 bus 61.440
train 56.396 truck 28.257 boat 22.444
traffic light 22.127 fire hydrant 62.067 stop sign 60.377
parking meter 42.841 bench 19.896 bird 30.032
cat 60.329 dog 52.680 horse 52.337
sheep 44.474 cow 47.850 elephant 55.965
bear 60.552 zebra 64.517 giraffe 63.984
backpack 11.795 umbrella 33.448 handbag 10.412
tie 24.751 suitcase 27.130 frisbee 58.991
skis 17.071 snowboard 30.937 sports ball 38.860
kite 35.119 baseball bat 20.442 baseball glove 30.233
skateboard 44.891 surfboard 31.239 tennis racket 42.120
bottle 32.248 wine glass 28.630 cup 36.748
fork 26.514 knife 9.067 spoon 10.924
bowl 36.522 banana 20.818 apple 16.500
sandwich 29.789 orange 27.876 broccoli 21.829
carrot 18.538 hot dog 29.126 pizza 50.045
donut 39.001 cake 29.378 chair 23.011
couch 36.161 potted plant 21.234 bed 36.842
dining table 24.268 toilet 55.259 tv 51.192
laptop 54.792 mouse 51.603 remote 20.563
keyboard 46.830 cell phone 28.652 microwave 51.092
oven 29.833 toaster 33.304 sink 31.475
refrigerator 47.891 book 10.685 clock 45.312
vase 31.155 scissors 22.562 teddy bear 42.953
hair drier 4.239 toothbrush 11.217
2. What exact command you run:
Re-calculate the AP from the model's output file ('./drive/MyDrive/output2/R50-c4/coco_instances_results.json') using the cocoapi (https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb):

    cocoEval = COCOeval(cocoGt, cocoDt_c4, annType)
    cocoEval.params.imgIds = imgIds
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()
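
(The variables above are not defined in this snippet. A minimal sketch of how they are typically set up, following the cocoapi demo notebook, is below; the ground-truth annotation path and the choice of image IDs are assumptions, not the exact code that was run.)

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    annType = "bbox"
    # Assumed path to the COCO val2017 ground-truth annotations.
    cocoGt = COCO("annotations/instances_val2017.json")
    # Detections written by COCOEvaluator in step 1.
    cocoDt_c4 = cocoGt.loadRes("./drive/MyDrive/output2/R50-c4/coco_instances_results.json")
    # Evaluate on all images in the ground truth (the demo notebook uses only a subset).
    imgIds = sorted(cocoGt.getImgIds())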


3. __Full logs__ or other relevant observations:
The results from the cocoapi (https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb):

Running per image evaluation...
Evaluate annotation type bbox
DONE (t=0.66s).
Accumulating evaluation results...
DONE (t=0.37s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.448
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.662
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.506
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.284
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.471
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.659
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.386
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.543
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.551
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.340
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.564
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.750


## Expected behavior:

Since the output file (coco_instances_results.json) is generated by the model,
the results computed from this file should be exactly the same as Detectron2's evaluation results.
But the values differ: 35.681 vs. 44.8 for AP[IoU=0.50:0.95], and the other metrics differ as well.
(a) Even passing the argument use_fast_impl=False gives the same 35.681 AP (see the sketch at the end of this section for how that argument is passed).

Why does this behavior happen?

P.S. Is it possible to reproduce exactly the same 35.681 AP with Detectron2 from the coco_instances_results.json file?
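
For reference, here is a minimal sketch of how the use_fast_impl=False run mentioned above was set up (the name evaluator_c4_slow is just illustrative; the other arguments match the code in step 1):

    # Re-run the same evaluation, but with use_fast_impl=False so COCOEvaluator
    # uses the official pycocotools COCOeval instead of the faster COCOeval_opt.
    evaluator_c4_slow = COCOEvaluator(
        "coco_2017_valid", ("bbox",), False,
        output_dir="./drive/MyDrive/output2/R50-c4",
        use_fast_impl=False,
    )
    print(inference_on_dataset(predictor_c4.model, val_loader_c4, evaluator_c4_slow))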

## Environment:

Provide your environment information using the following command:

Detectron2 tutorial colab
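
(For completeness, the environment info the template asks for could be printed from the colab with detectron2's bundled helper; a sketch, assuming the collect_env_info function in detectron2.utils.collect_env:)

    from detectron2.utils.collect_env import collect_env_info

    # Prints PyTorch, CUDA, and detectron2 version information for the bug report.
    print(collect_env_info())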

github-actions[bot] commented 3 years ago

You've chosen to report an unexpected problem or bug. Unless you already know the root cause of it, please include details about it by filling the issue template. The following information is missing: "Your Environment";

ppwwyyxx commented 3 years ago

Could you provide full runnable code? Currently many variables (e.g. cocoGt, cocoDt_c4) are undefined.

ppwwyyxx commented 3 years ago

The following code prints the same results for both evaluations, so I believe the issue does not exist:

from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader
from detectron2.config import get_cfg
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.utils.logger import setup_logger
from detectron2.data import MetadataCatalog

setup_logger()
cfg_c4 = get_cfg()
cfg_c4.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_C4_1x.yaml"))
cfg_c4.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_C4_1x.yaml")
predictor_c4 = DefaultPredictor(cfg_c4)

evaluator_c4 = COCOEvaluator("coco_2017_val", ("bbox",), False, output_dir="./output")
val_loader_c4 = build_detection_test_loader(cfg_c4, "coco_2017_val")
print(inference_on_dataset(predictor_c4.model, val_loader_c4, evaluator_c4))

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Re-evaluate the saved predictions with the official pycocotools COCOeval.
cocoGt = COCO(MetadataCatalog.get("coco_2017_val").json_file)    # ground-truth annotations
cocoDt = cocoGt.loadRes("./output/coco_instances_results.json")  # detections written by COCOEvaluator
cocoEval = COCOeval(cocoGt, cocoDt, "bbox")
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()