allanzelener / YAD2K

YAD2K: Yet Another Darknet 2 Keras

Performance: mAP #22

Closed AceCoooool closed 7 years ago

AceCoooool commented 7 years ago

Convert the "office weight"(yolov2) from yolo website, I find the mAP nearly 0.57+, worse than the office give '0.768',do you test the mAP in your model on the voc dataset? Could you tell me the mAP of your test? (If you test the mAP). thank you~!

AceCoooool commented 7 years ago

After changing the score_threshold from 0.6 to 0.5, the mAP goes up to 0.6252. However, it's still worse than the official result.

allanzelener commented 7 years ago

Sorry, I haven't tested mAP on Pascal VOC myself. Changing the iou_threshold may also help.

Could you share the code you're using to calculate mAP? Is it the same as what's in the Pascal VOC toolkit?

AceCoooool commented 7 years ago

Thank you for your reply. I will try tuning the iou_threshold and write a training phase to get new weights for evaluating the mAP.

I use the evaluation code from Faster R-CNN to calculate the mAP.

allanzelener commented 7 years ago

I think that Python script should be equivalent to the official Pascal VOC MATLAB evaluation code, but it isn't necessarily identical. I would also try running the original model with Darknet to see if you get the same result there. I've tried to replicate Darknet as closely as possible, but there might still be things I've missed.

AceCoooool commented 7 years ago

I found that the main problem is in the yolo_filter_boxes function: the line scores = tf.boolean_mask(box_class_scores, prediction_mask) should use box_confidence rather than box_class_scores (it influences the non_max_suppression results). After changing that, and setting nms_threshold=0.45 and score_threshold=0.005 (I am not sure why the score_threshold has to be so small), the mAP reaches 0.74+.
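
For context, here is a minimal sketch of the filtering step under discussion, assuming YAD2K-style inputs (box_confidence, box_class_probs); it is reconstructed from the names in this thread, not copied verbatim from the repo:

```python
import tensorflow as tf

def yolo_filter_boxes(boxes, box_confidence, box_class_probs, threshold=0.6):
    # Per-box, per-class score: objectness times class probability.
    box_scores = box_confidence * box_class_probs
    # Best class and its score for each box.
    box_classes = tf.argmax(box_scores, axis=-1)
    box_class_scores = tf.reduce_max(box_scores, axis=-1)
    # Keep only boxes whose best class score clears the threshold; this is
    # the scores = tf.boolean_mask(box_class_scores, prediction_mask) line
    # discussed above, and these scores feed non_max_suppression later.
    prediction_mask = box_class_scores >= threshold
    boxes = tf.boolean_mask(boxes, prediction_mask)
    scores = tf.boolean_mask(box_class_scores, prediction_mask)
    classes = tf.boolean_mask(box_classes, prediction_mask)
    return boxes, scores, classes
```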

AceCoooool commented 7 years ago

I am sorry!!! It was me who had changed box_class_scores... You are right!!! I am so sorry for making this stupid mistake.
Sorry!

mmderakhshani commented 7 years ago

@AceCoooool Did you validate the YAD2K model? I need to know how well this model works.

AceCoooool commented 7 years ago

Yes, I implemented the same network as YAD2K in PyTorch (I verified that the whole output matches this version) and used the official trained weights. The mAP is nearly the same as the official one (a little lower, by about 1 point). However, I am not sure why nms_threshold=0.45 and score_threshold=0.005 work (the score_threshold has to be very small when evaluating mAP, but higher in the test or demo stage... I've seen several implementations use a small score_threshold to evaluate mAP; if you use the same score_threshold as in the test or demo stage, the mAP decreases by nearly 10%). I didn't run many experiments on this. Sorry.

allanzelener commented 7 years ago

To clarify, average precision (AP) is a measure of average performance across all possible score thresholds for a single classifier/class. You should just set the score threshold to 0 when computing AP, anything greater makes this number smaller. Mean AP (mAP) is the average of the APs for each class.
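
As a concrete illustration (a sketch, not code from this repo), here is a non-interpolated AP computation over a ranked detection list; the official VOC metric additionally interpolates precision:

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """Non-interpolated AP for one class (hypothetical helper)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    recall = cum_tp / num_gt                         # fraction of GT found
    precision = cum_tp / (np.arange(len(tp)) + 1.0)  # fraction correct so far
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recall, precision):
        ap += (r - prev_recall) * p                  # area under the PR curve
        prev_recall = r
    return ap

# Three detections, two ground-truth objects; filtering out the 0.7-score
# detection with a high score_threshold would truncate the recall axis
# and lower AP from 0.833 to 0.5.
print(average_precision([0.9, 0.8, 0.7], [1, 0, 1], num_gt=2))  # 0.8333...
```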

Also note that while the model may be the same as the paper, the training procedure may be different and can lead to different performance.


AceCoooool commented 7 years ago

Thank you very much, allanzelener!!!

mmderakhshani commented 7 years ago

@AceCoooool Could you please tell me how you validated the PyTorch model? I mean, did you use the MS COCO API for validation?

AceCoooool commented 7 years ago

I used the VOC dataset. And I used a very crude method to verify the difference between the two models: the same image must produce the same output (my model's output matches YAD2K's, not only the intermediate features but also the final predicted boxes and scores). Then I evaluated the mAP with the Faster R-CNN eval code.
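
A minimal sketch of that kind of cross-check, with hypothetical wrappers around the two models:

```python
import numpy as np

def outputs_match(out_a, out_b, atol=1e-4):
    """Compare (boxes, scores, classes) tuples produced by two
    implementations on the same preprocessed image."""
    return all(np.allclose(a, b, atol=atol) for a, b in zip(out_a, out_b))

# out_keras = yad2k_predict(image)    # assumed wrapper around YAD2K
# out_torch = pytorch_predict(image)  # assumed wrapper around the port
# assert outputs_match(out_keras, out_torch)
```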

Maybe I misunderstood your question.

mmderakhshani commented 7 years ago

@AceCoooool As far as I understood, you validated your model only with Pascal VOC test data. My question was about the MS COCO detection dataset and the mAP on it (because in the paper the author claims: "We also train on COCO and compare to other methods in Table 5. On the VOC metric (IOU = .5) YOLOv2 gets 44.0 mAP, comparable to SSD and Faster R-CNN"). I would like to reproduce this number. Anyway, could you please share your evaluation code for the Pascal VOC test data?
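
For reference, a hedged usage sketch of the voc_eval helper that ships with py-faster-rcnn (lib/datasets/voc_eval.py); the paths here are assumptions and the signature may vary by fork:

```python
from voc_eval import voc_eval  # from py-faster-rcnn's lib/datasets

classes = ['aeroplane', 'bicycle', 'bird']  # ...all 20 VOC classes
aps = []
for cls in classes:
    rec, prec, ap = voc_eval(
        'results/det_test_{:s}.txt',                  # detections, one file per class
        'VOCdevkit/VOC2007/Annotations/{:s}.xml',     # annotation template
        'VOCdevkit/VOC2007/ImageSets/Main/test.txt',  # image id list
        cls,
        'annotation_cache',
        ovthresh=0.5,
        use_07_metric=True)
    aps.append(ap)
print('mAP: {:.4f}'.format(sum(aps) / len(aps)))
```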

jks88990041 commented 2 years ago

Yes, I also found this issue. To tell the truth, the original author didn't mention either threshold in his paper. I also tried other YOLOv2 implementations, and they have the same problem. By the way, I don't know how to train with this code: I can't convert VOC2007+2012 into npz format, and for the HDF5 format file, maybe the training code has some issues? Your test code is really good, but the training code is the bottleneck. Can someone tell me how to improve the training code? Or how can I convert the dataset into an npz file? I tried the naive method, but when I merged the data into the npz file, Linux told me it was too big to merge.
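
One way around the "too big to merge" problem (a suggestion, not something from this repo) is to stream examples into an HDF5 file with h5py instead of materializing one giant array for np.savez; a minimal sketch with assumed shapes and paths:

```python
import h5py

# Assumed: an iterator yielding (image, boxes) pairs, with images already
# resized to 416x416x3 and boxes padded to a fixed per-image maximum.
# 16551 is the VOC2007+2012 trainval image count.
def write_hdf5(pairs, path='voc_trainval.h5', n=16551, max_boxes=20):
    with h5py.File(path, 'w') as f:
        images = f.create_dataset('images', (n, 416, 416, 3), dtype='uint8')
        boxes = f.create_dataset('boxes', (n, max_boxes, 5), dtype='float32')
        for i, (img, bxs) in enumerate(pairs):
            images[i] = img                # written straight to disk
            boxes[i, :len(bxs)] = bxs      # remaining rows stay zero
```

Each image is written to disk as it is produced, so the conversion never needs to hold the whole dataset in memory.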