dbolya / tide

A General Toolbox for Identifying Object Detection Errors
https://dbolya.github.io/tide
MIT License

Is it correct to evaluate performance on PASCAL VOC with the COCO metric? #24

Closed machengcheng2016 closed 3 years ago

machengcheng2016 commented 3 years ago

The title states my concern. Looking at dataset.py, it seems that TIDE uses the COCO metric to compute mAP on the PASCAL VOC dataset. However, I've compared the official VOC evaluation code with TIDE's (which is exactly the COCO evaluation code), and the protocols for assigning TP/FP labels to predicted boxes differ. Given the same scores and bboxes, VOC and COCO output different mAPs. I think that will be a problem. What do you think? @dbolya

dbolya commented 3 years ago

Yeah, I agree that it would be best if we were able to use the VOC version of mAP for PASCAL VOC, but I'm not well versed enough in the differences to implement it myself.

Out of curiosity, what's the difference in mAP that you observe? If it's not a huge difference, then I think it's fine. And TIDE is meant as a way to find places you can improve your model, not necessarily to replace the official evaluation numbers. So as long as the numbers correlate, there shouldn't be any issue.

machengcheng2016 commented 3 years ago

Well, I got a ~10% higher mAP with the PASCAL VOC metric than with the COCO metric on the VOC2007 test set. Some people say that PASCAL VOC greedily finds the best match (by IoU) for the current predicted box and, if that ground truth is already matched, marks the prediction as a false positive, while COCO continues the search to the next-best ground truth when the best match is already taken.

Anyway, I agree with you that TIDE's role is to show how one can improve the detection model, regardless of how mAP is evaluated. After all, the COCO metric is a reasonable one.
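For anyone curious, the matching difference described above can be sketched as below. This is a minimal illustration, not the official VOC or COCO evaluator code; it assumes detections are pre-sorted by descending confidence and that `iou[d][g]` holds the IoU between detection `d` and ground-truth box `g`.

```python
def match_voc(iou, n_dets, n_gts, thr=0.5):
    """VOC-style sketch: each detection is judged only against its single
    highest-IoU ground truth; if that GT is already taken, the detection
    becomes a false positive."""
    matched = [False] * n_gts
    tp = []
    for d in range(n_dets):
        best_g = max(range(n_gts), key=lambda g: iou[d][g], default=-1)
        if best_g >= 0 and iou[d][best_g] >= thr and not matched[best_g]:
            matched[best_g] = True
            tp.append(True)
        else:
            tp.append(False)
    return tp


def match_coco(iou, n_dets, n_gts, thr=0.5):
    """COCO-style sketch: the search skips already-matched ground truths,
    so a detection can fall back to the next-best free GT."""
    matched = [False] * n_gts
    tp = []
    for d in range(n_dets):
        best_g, best_iou = -1, thr
        for g in range(n_gts):
            if not matched[g] and iou[d][g] >= best_iou:
                best_g, best_iou = g, iou[d][g]
        if best_g >= 0:
            matched[best_g] = True
            tp.append(True)
        else:
            tp.append(False)
    return tp


# Two detections whose best overlap is GT 0; detection 1 also overlaps GT 1.
iou = [[0.9, 0.0],
       [0.8, 0.6]]
print(match_voc(iou, 2, 2))   # detection 1 is an FP: GT 0 already taken
print(match_coco(iou, 2, 2))  # detection 1 falls back to GT 1 and is a TP
```

On the same inputs the two protocols disagree on detection 1, which is exactly the kind of divergence that shows up as a mAP gap between the two metrics.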

LiewFeng commented 3 years ago

> Well, I got a ~10% higher mAP with the PASCAL VOC metric than with the COCO metric on the VOC2007 test set. Some people say that PASCAL VOC greedily finds the best match (by IoU) for the current predicted box and, if that ground truth is already matched, marks the prediction as a false positive, while COCO continues the search to the next-best ground truth when the best match is already taken.
>
> Anyway, I agree with you that TIDE's role is to show how one can improve the detection model, regardless of how mAP is evaluated. After all, the COCO metric is a reasonable one.

Hi @machengcheng2016, could you share the code to convert PASCAL detection results to COCO json style? My conversion gets a much lower mAP in TIDE than in mmdetection.
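For reference, here is a hedged sketch of such a conversion (not the conversion used in this thread). The function name and the `detections` tuple layout are assumptions for illustration; the COCO result-json fields themselves (`image_id`, `category_id`, `bbox`, `score`) are standard. A common cause of an unexpectedly low mAP is passing VOC-style `[xmin, ymin, xmax, ymax]` corners where COCO expects `[x, y, width, height]`.

```python
import json

def voc_dets_to_coco_results(detections):
    """Convert VOC-style detections, given here as
    (image_id, class_idx, score, xmin, ymin, xmax, ymax) tuples,
    into COCO result-json entries."""
    results = []
    for image_id, class_idx, score, xmin, ymin, xmax, ymax in detections:
        results.append({
            "image_id": image_id,
            "category_id": class_idx,  # must match the ids in the GT json
            # COCO bboxes are [x, y, width, height], not corner coordinates
            "bbox": [xmin, ymin, xmax - xmin, ymax - ymin],
            "score": score,
        })
    return results

dets = [(1, 7, 0.95, 10.0, 20.0, 110.0, 220.0)]
print(json.dumps(voc_dets_to_coco_results(dets)))
```

It is also worth double-checking that `category_id` uses the same id space as the ground-truth annotations, since an off-by-one class index silently zeroes out per-class AP.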

LiewFeng commented 3 years ago

Details of my conversion are in this issue

LiewFeng commented 3 years ago

I've solved it.