
discrepancy between two ways of measuring the map@IoU=0.5 #5643

Open · ZhengRui opened 4 years ago

ZhengRui commented 4 years ago

@AlexeyAB Thanks for the great work! I followed https://github.com/AlexeyAB/darknet/issues/2145#issuecomment-451908890 to get the map@IoU=0.5 of the yolov4.weights model on the COCO2017 validation set.

Set the -points flag:
- `-points 101` for MS COCO
- `-points 11` for PascalVOC 2007 (uncomment `difficult` in `voc.data`)
- `-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, and your custom dataset
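For reference, the 101-point setting is meant to match pycocotools' default recall grid; a quick sanity check (a minimal sketch using pycocotools' public `Params` class):

```python
# pycocotools evaluates AP on a 101-point recall grid by default,
# which is what darknet's -points 101 is intended to match for MS COCO.
import numpy as np
from pycocotools.cocoeval import Params

p = Params(iouType='bbox')
assert np.allclose(p.recThrs, np.linspace(0.0, 1.0, 101))
print(len(p.recThrs))  # -> 101
```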


- Method 1: `./darknet detector map ~/Work/Datasets/yolo_data/coco2017/coco.data cfg/yolov4.cfg weights/yolov4.weights -points 101`, which reports a map@IoU=0.5 that differs from the pycocotools result below
- Method 2:
  1. `./darknet detector valid ~/Work/Datasets/yolo_data/coco2017/coco.data cfg/yolov4.cfg weights/yolov4.weights` to generate `coco_results.json` inside `results` folder
  2. I then run this evaluation script, `coco_eval.py`:
```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import argparse

def coco_eval(args):
    # Load the ground truth, attach the predictions, and run the
    # standard COCO evaluation for the requested annotation type.
    cocoGt = COCO(args.gt_json)
    cocoDt = cocoGt.loadRes(args.pred_json)
    cocoEval = COCOeval(cocoGt, cocoDt, args.eval_type)
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Evaluate segm/bbox/keypoints in COCO format.')
    parser.add_argument('gt_json', type=str, help="COCO format segmentation/detection/keypoints ground truth json file")
    parser.add_argument('pred_json', type=str, help="COCO format segmentation/detection/keypoints prediction json file")
    parser.add_argument('eval_type', type=str, choices=['segm', 'bbox', 'keypoints'], help="Evaluation type")
    args = parser.parse_args()
    coco_eval(args)
```
`python coco2017_data/coco_eval.py ../../Datasets/coco/annotations/instances_val2017.json ./results/coco_results.json bbox` gives map@IoU=0.5 = 74.9, and the log is:

```
loading annotations into memory...
Done (t=0.37s)
creating index...
index created!
Loading and preparing results...
DONE (t=3.28s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=56.92s).
Accumulating evaluation results...
DONE (t=7.39s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.505
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.749
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.557
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.357
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.559
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.614
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.368
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.598
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.633
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.680
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.757
```
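For context, `loadRes` expects `coco_results.json` to be a flat list of detections in the COCO results format; a minimal hypothetical entry (all values below are made up for illustration):

```python
# One hypothetical entry of coco_results.json, in the format cocoGt.loadRes()
# expects: one dict per detection, bbox as [x, y, width, height] in absolute
# pixels. The ids and values here are illustrative, not from the actual run.
example_detection = {
    "image_id": 397133,      # an image id from instances_val2017.json
    "category_id": 18,       # COCO category id (18 = dog)
    "bbox": [258.2, 41.3, 348.3, 243.6],
    "score": 0.87,
}
```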

Do you know why these two methods give different map@IoU=0.5? Maybe I misunderstood something.

AlexeyAB commented 4 years ago

I don't know the reason. If you find a mistake in my code or in pycocotools, let me know )

To check the accuracy of MS COCO models, I use pycocotools or the CodaLab evaluation server.

ZhengRui commented 4 years ago

I thought of some potential reasons:

On the pycocotools side:

- ground-truth boxes marked `iscrowd` are ignored during matching
- detections are capped per image by `maxDets=100`

On the darknet side, there are some discrepancies between using `./darknet detector valid` and `./darknet detector map` (e.g. the detection threshold and the `get_network_boxes` parameters, such as letter-box resizing, can differ).

I have tried not ignoring iscrowd boxes and not filtering with maxDets on the pycocotools side, while using the same thresh=0.001 and the same get_network_boxes parameters for valid and map on the darknet side (with the -letter_box option in the command), but I am still not able to get the same map@IoU=0.5. I haven't checked the downstream logic that calculates the PR curve and the mAP; so far I have focused on making detections_count equal to the number of boxes sent to pycocotools, and could not get even those to match.
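For reference, a sketch of the pycocotools side of that experiment (paths are placeholders for the same files used by `coco_eval.py`):

```python
# Sketch: disable the two pycocotools behaviors mentioned above.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

cocoGt = COCO('instances_val2017.json')

# 1. Don't ignore crowd regions: clear the iscrowd flag on every gt box,
#    so COCOeval treats them as ordinary ground truth.
for ann in cocoGt.dataset['annotations']:
    ann['iscrowd'] = 0

cocoDt = cocoGt.loadRes('coco_results.json')
cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')

# 2. Don't cap detections per image: raise the last maxDets entry
#    (the pycocotools default is [1, 10, 100]).
cocoEval.params.maxDets = [1, 10, 10000]

cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
```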

Do you have any further thoughts @AlexeyAB ? Thanks.

AlexeyAB commented 4 years ago

@ZhengRui I don't know. Try to un-comment this line and recompile: https://github.com/AlexeyAB/darknet/blob/0ef5052ee51e82b2862fab5e9135b7bae060354f/src/detector.c#L1281

Try to use the same thresh=0.001 in both cases.

Also try to set 11 PR points instead of 101 in both Darknet and pycocotools, for easier debugging.

Then compare precision and recall for one of the classes between Darknet and pycocotools (but not the person class, to avoid the crowd issue).
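For example, a sketch of that setup on the pycocotools side (the class is arbitrary; `cocoGt`/`cocoDt` come from the `coco_eval.py` setup above):

```python
# Sketch: 11 recall points (to mirror darknet's -points 11) and a single
# non-person class, so per-class precision/recall can be compared directly.
import numpy as np
from pycocotools.cocoeval import COCOeval

cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')
cocoEval.params.recThrs = np.linspace(0.0, 1.0, 11)        # default: 101 points
cocoEval.params.catIds = cocoGt.getCatIds(catNms=['car'])  # any class but person
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()

# Precision is stored as [TxRxKxAxM] = (iouThrs, recThrs, catIds, areaRng,
# maxDets); take the IoU=0.50 slice for comparison with darknet.
precision_at_50 = cocoEval.eval['precision'][0]
```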

tand826 commented 4 years ago

@ZhengRui Is there any progress?

ZhengRui commented 4 years ago

@tand826 Unfortunately I haven't had time to look into this further.