david8862 / keras-YOLOv3-model-set

end-to-end YOLOv4/v3/v2 object detection pipeline, implemented on tf.keras with different technologies
MIT License
639 stars · 221 forks

Something's wrong with the weights you provided #10

Open · johnjunjun7 opened 4 years ago

johnjunjun7 commented 4 years ago

This is about the result of Tiny YOLOv3 Lite-Mobilenet that you provide in the README.md. I used the weights you provide, then tested the model on VOC2007 with the command below:

```
python eval.py --model_path=tiny_yolo3_mobilnet_lite_416_voc.h5 --anchors_path=configs/tiny_yolo3_anchors.txt --classes_path=configs/voc_classes.txt --model_image_size=416x416 --eval_type=VOC --iou_threshold=0.5 --conf_threshold=0.1 --annotation_file=tools/2007_test.txt --save_result
```

Then I get the result below, which is much lower than the value you gave (72.60%):

```
Pascal VOC AP evaluation
aeroplane:   AP 0.7189, precision 0.7566, recall 0.7395
bicycle:     AP 0.6749, precision 0.7128, recall 0.7018
bird:        AP 0.5710, precision 0.7094, recall 0.6146
boat:        AP 0.4352, precision 0.5123, recall 0.5293
bottle:      AP 0.3755, precision 0.4835, recall 0.4460
bus:         AP 0.6701, precision 0.7243, recall 0.6929
car:         AP 0.6665, precision 0.6896, recall 0.6963
cat:         AP 0.7911, precision 0.7103, recall 0.8216
chair:       AP 0.3475, precision 0.5051, recall 0.4367
cow:         AP 0.5558, precision 0.6398, recall 0.6261
diningtable: AP 0.5163, precision 0.5737, recall 0.5987
dog:         AP 0.7250, precision 0.6420, recall 0.7849
horse:       AP 0.7407, precision 0.7348, recall 0.7646
motorbike:   AP 0.6854, precision 0.7082, recall 0.7236
person:      AP 0.6756, precision 0.7062, recall 0.7128
pottedplant: AP 0.3647, precision 0.4815, recall 0.4848
sheep:       AP 0.5517, precision 0.6169, recall 0.6109
sofa:        AP 0.5112, precision 0.6150, recall 0.5808
train:       AP 0.7596, precision 0.6649, recall 0.8146
tvmonitor:   AP 0.6470, precision 0.6005, recall 0.6953

mAP@IoU=0.50 result: 59.918650
mPrec@IoU=0.50 result: 63.936128
mRec@IoU=0.50 result: 65.378780
```

Is there a problem with the weights you provided, is my configuration wrong, or do I need to continue training from the weights you provided? Looking forward to your answer :)

david8862 commented 4 years ago

You get the point :D Actually the mAP values in the list are calculated with the (currently default) conf_threshold=0.001, which yields a higher recall rate but a lower precision rate. This is a common trick for object detectors to get a higher mAP result. But if you want to use the model in practice, a reasonable conf_threshold like 0.1 is necessary.
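
To make this concrete, here is a toy sketch (made-up numbers, not the exact eval.py logic): AP is computed from detections ranked by confidence, so low-confidence true positives at the tail of the ranking extend the precision-recall curve and add area under it. Raising conf_threshold truncates that tail and can only lower AP:

```python
import numpy as np

def toy_ap(scores, is_tp, n_gt):
    """Simplified ranked-detection AP: rectangle-sum area under the PR curve
    (no precision envelope; enough to show the thresholding effect)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    recall = np.cumsum(tp) / n_gt
    precision = np.cumsum(tp) / np.arange(1, len(tp) + 1)
    return float(np.sum(np.diff(np.concatenate(([0.0], recall))) * precision))

# 6 detections over 5 ground-truth boxes; the last three are low-confidence hits
scores = [0.9, 0.8, 0.3, 0.09, 0.05, 0.01]
is_tp  = [1,   1,   0,   1,    1,    1]

print(toy_ap(scores, is_tp, n_gt=5))   # ~0.88: conf_threshold=0.001 keeps the tail
kept = [(s, t) for s, t in zip(scores, is_tp) if s >= 0.1]
print(toy_ap(*zip(*kept), n_gt=5))     # 0.40: conf_threshold=0.1 drops the tail TPs
```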

johnjunjun7 commented 4 years ago

Thank you for your answer, but there are still a few questions:

  1. I used the 0.001 threshold for testing, but the mAP I get still does not reach the value in your list: just 66%.
  2. What conf_threshold do current papers generally use? For example, YOLO Nano does not give a specific value. What threshold should I use for comparison?
david8862 commented 4 years ago

  1. I've found the root cause. It's caused by a recent bug in voc_annotation.py which may incorrectly include VOC difficult objects in the annotation file. Such objects should not be considered in the PascalVOC mAP eval. I've committed a fix for that; you can refresh the code and regenerate 2007_test.txt to verify (see the sketch after this list). Many thanks for the info.
  2. Maybe you can refer to this blog post: https://zhuanlan.zhihu.com/p/63780820. Generally, for mAP score comparison, a low conf_threshold is allowed.
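
For reference, the fix boils down to something like the minimal sketch below (assuming the standard PASCAL VOC XML layout; this is not the exact code in voc_annotation.py): objects flagged `<difficult>1</difficult>` are skipped when generating the eval annotation file.

```python
import xml.etree.ElementTree as ET

def parse_voc_objects(xml_path, class_names, keep_difficult=False):
    """Collect (class, box) pairs from one PASCAL VOC annotation file.
    Difficult objects are excluded from PascalVOC mAP eval, so they are
    skipped by default here."""
    objects = []
    root = ET.parse(xml_path).getroot()
    for obj in root.iter('object'):
        difficult = int(obj.findtext('difficult', default='0'))
        if difficult and not keep_difficult:
            continue  # drop difficult objects from the eval annotation
        name = obj.findtext('name')
        if name not in class_names:
            continue
        bndbox = obj.find('bndbox')
        box = tuple(int(float(bndbox.findtext(tag)))
                    for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        objects.append((name, box))
    return objects
```
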
johnjunjun7 commented 4 years ago

After the modification, it really works. But I trained using the training set generated by the previous voc_annotation.py. Will there be problems? Do I need to retrain?

david8862 commented 4 years ago

For the training dataset, either including or excluding difficult objects is OK. Maybe you can try both and check which gets the better result.

david8862 commented 4 years ago

A quick update: I've tried training the YOLO Nano model with an ImageNet-pretrained backbone and have currently got 64.95 mAP. Training is still going on.

johnjunjun7 commented 4 years ago

I got results similar to yours: mAP=67.8% with conf_threshold=0.001. I am trying to retrain using data without difficult objects :)

david8862 commented 4 years ago

Awesome! Can you provide the trained model & ImageNet-pretrained backbone? I can publish them in the next release.

johnjunjun7 commented 4 years ago

Can you give me your email? I will send them to you.

david8862 commented 4 years ago

david8862@gmail.com. Thanks a lot :)

johnjunjun7 commented 4 years ago

I am very sorry, but the data in my laboratory cannot be copied out now. :( Still, the results of our current training are similar, and the top-5 accuracy of the ImageNet pretrained weights I use is only 74%. If you use weights as accurate as mine, you should get a similar result.

There is still some gap between the accuracy we get now and that in the paper, and considering that the model in the paper is quantized, we should be able to get higher accuracy without quantization. (I see that the test module in your code uses the VOC2012 method of calculating mAP, which is different from VOC2007's. Will that also have some influence?)

david8862 commented 4 years ago

It doesn't matter :) Actually I finally got a checkpoint with mAP=69.40 in my training, but I haven't found a good way to convert it to a TFLite UINT8 quantized model due to some op support issues, so the quantized mAP is not verified yet. I'm still working on it.
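
For context, the conversion path in question is TF 2.x post-training full-integer quantization, roughly like the sketch below (a hypothetical helper, with random calibration data for illustration only; real preprocessed training images should be fed instead, and unsupported ops in the YOLO graph are exactly where such a conversion fails):

```python
import numpy as np
import tensorflow as tf

def convert_to_uint8_tflite(keras_model_path, out_path, image_size=416):
    """Post-training full-integer quantization of a tf.keras model."""
    model = tf.keras.models.load_model(keras_model_path, compile=False)

    def representative_dataset():
        # calibration samples; use real preprocessed training images in practice
        for _ in range(100):
            yield [np.random.rand(1, image_size, image_size, 3).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # force int8 builtin kernels; any op without an int8 kernel aborts conversion
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    with open(out_path, 'wb') as f:
        f.write(converter.convert())
```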

The VOC12 metric uses continuous recall values to compute precision along the precision-recall curve, which is different from the 11-point interpolation in VOC07. You can refer to the blog post above for details. I think nowadays all PascalVOC mAP metrics should have followed the new standard.
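
For the curious, the two metrics differ roughly as in the sketch below (standard reference implementations, not the exact eval.py code); `recall` and `precision` are numpy arrays computed over detections sorted by descending confidence:

```python
import numpy as np

def ap_voc07(recall, precision):
    """VOC2007 metric: mean of max precision at 11 recall points 0.0, 0.1, ..., 1.0."""
    ap = 0.0
    for t in np.arange(0.0, 1.1, 0.1):
        mask = recall >= t
        ap += (precision[mask].max() if mask.any() else 0.0) / 11.0
    return ap

def ap_voc12(recall, precision):
    """VOC2010+ metric: exact area under the interpolated PR curve."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # make precision monotonically non-increasing (the interpolation envelope)
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]  # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```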