The difference between my code and pycocotools is in how the recall thresholds are computed. In my repeated tests on the ActEV 200k validation set, the actual score difference is within 1%, and it is smaller on larger datasets. And obviously, the two implementations are linearly correlated given the same detection outputs. I used pycocotools before but switched to the current code for efficiency.
You are welcome to send a pull request if you replace that part of the code with the pycocotools numpy computation.
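Roughly, the two conventions look like this (a simplified numpy sketch, not the exact code in utils.py or pycocotools): pycocotools interpolates precision at 101 fixed recall thresholds, while an information-retrieval-style AP averages precision at each true positive. Here `tp` is a hypothetical 0/1 match array for detections already sorted by descending confidence, and `num_gt` is the number of ground-truth boxes.

```python
import numpy as np

def ap_coco_style(tp, num_gt):
    """AP averaged over 101 recall thresholds (0, 0.01, ..., 1), as in pycocotools."""
    cum_tp = np.cumsum(tp)
    recall = cum_tp / num_gt
    precision = cum_tp / np.arange(1, len(tp) + 1)
    # make the precision envelope monotonically decreasing from right to left
    for i in range(len(precision) - 1, 0, -1):
        precision[i - 1] = max(precision[i - 1], precision[i])
    rec_thrs = np.linspace(0.0, 1.0, 101)
    inds = np.searchsorted(recall, rec_thrs, side="left")
    q = np.zeros(len(rec_thrs))
    valid = inds < len(precision)
    q[valid] = precision[inds[valid]]
    return q.mean()

def ap_ir_style(tp, num_gt):
    """Average precision at each correct detection (information-retrieval style)."""
    cum_tp = np.cumsum(tp)
    precision = cum_tp / np.arange(1, len(tp) + 1)
    return precision[tp.astype(bool)].sum() / num_gt

# toy example: 7 detections, 5 ground-truth boxes
tp = np.array([1, 1, 0, 1, 0, 0, 1])
print(ap_coco_style(tp, num_gt=5), ap_ir_style(tp, num_gt=5))
```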
Hi, JunweiLiang: Thanks for your quick response and clear explanation. I was just quite confused about the difference between your code and the COCO evaluation code. Yeah, I have not tried to compare the actual results of the two. :sweat_smile: Sorry for the distraction. I didn't mean to challenge you.
That's OK. I guess it is because I'm having a bad day. Actually, you do remind me to make the evaluation code exactly the same as COCO's. I will do this in September, basically replacing the compute_AP function with a more efficient version of the pycocotools computation.
Cheers, Junwei
Hi, JunweiLiang: I have one more thing to consult you on. As your readme.md puts it, Cascade RCNN doesn't help (IoU=0.5). I'm using IoU=0.5 in my evaluation since the original annotations are not "tight" bounding boxes. I wonder whether it is necessary to use COCO AP at IoU=.50:.05:.95 as the primary challenge metric, as COCO does, or whether AP at IoU=.50 (the PASCAL VOC metric) would be a better primary metric for this dataset. What do you think?
I'm using IOU=.5 in all evaluations in ActEV. You can set IOU=.5 in pycocotools as well. So this is basically the VOC metric.
As I stated, you can't really use IoU=.95 on this dataset since the ground-truth bounding boxes are not accurate/tight. From a making-a-real-application-that-works standpoint, there is no real difference between IoU=.5 and IoU=.75, as humans can hardly tell IoU=.3 from IoU=.5, as noted in the YOLOv3 paper.
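For reference, here is a minimal sketch (not from this repo) of running pycocotools with only the 0.5 IoU threshold, which makes its bbox AP essentially the VOC-style metric mentioned above; the annotation and detection file names are placeholders:

```python
import numpy as np
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations.json")             # hypothetical COCO-format ground truth
coco_dt = coco_gt.loadRes("detections.json")   # hypothetical COCO-format detections

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.params.iouThrs = np.array([0.5])     # evaluate only at IoU=0.5
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # rows that expect IoU=0.75 will report -1 since it is not evaluated
```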
Hi, JunweiLiang: If I am right, your AP evaluation code is here: https://github.com/JunweiLiang/Object_Detection_Tracking/blob/8af1178fe2743c9954fa85a22a14fa6ca46776b0/utils.py#L553-L569 However, it is not the standard AP evaluation for object detection adopted by the PASCAL VOC or COCO datasets; I think your implementation is the standard AP evaluation used in information retrieval.
The results are not comparable with your reported numbers if I use the PASCAL VOC evaluation, so I think you could clarify this in your readme file.
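For concreteness, this is a hedged sketch of the PASCAL VOC (2010+, all-point) AP I am referring to, which integrates the interpolated precision-recall curve instead of averaging precision at each retrieved positive; the recall and precision arrays are assumed to come from detections sorted by descending confidence:

```python
import numpy as np

def ap_voc_all_points(recall, precision):
    """Area under the interpolated precision-recall curve (VOC 2010+ style)."""
    # pad so the curve starts at recall 0 and ends at recall 1
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # make precision monotonically decreasing from right to left
    for i in range(len(mpre) - 1, 0, -1):
        mpre[i - 1] = max(mpre[i - 1], mpre[i])
    # sum the rectangles where recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1])

# toy usage with made-up recall/precision points
print(ap_voc_all_points(np.array([0.2, 0.4, 0.6]), np.array([1.0, 0.67, 0.5])))
```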