Cartucho / mAP

mean Average Precision - This code evaluates the performance of your neural net for object recognition.

Threshold for detection #54

Open Ownmarc opened 5 years ago

Ownmarc commented 5 years ago

Hello,

I'm training YOLOv2 on 248 of my own classes using Darkflow.

I found your great repo to evaluate my training (thanks a lot)!

I am using a set of 100k images to train and a set of 1k images to evaluate the model during training (training on one GPU while evaluating at the same time on another GPU). My goal is to find the sweet spot where I should decrease my learning rate and continue training.

I have been trying a few things out and here is what I got:

At step 47 750:

- detection threshold 0.5 gives mAP 18.95
- detection threshold 0.2 gives mAP 39.68
- detection threshold 0.1 gives mAP 44.71
- detection threshold 0.01 gives mAP 46.34

At step 52 500:

- detection threshold 0.5 gives mAP 31.73
- detection threshold 0.2 gives mAP 52.56
- detection threshold 0.1 gives mAP 56.18
- detection threshold 0.01 gives mAP 57.42
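(For anyone reproducing this: one way to run such a sweep without re-running the detector is to filter this repo's detection-results files by confidence. A minimal sketch, assuming the `<class_name> <confidence> <left> <top> <right> <bottom>` line format; the `filter_detections` helper and the folder names are hypothetical:)

```python
import glob
import os

# Hypothetical helper: copy the detection files into a new folder, keeping only
# the detections whose confidence is >= threshold, then run the mAP script on it.
# Assumes one detection per line: <class_name> <confidence> <left> <top> <right> <bottom>
def filter_detections(src_dir, dst_dir, threshold):
    os.makedirs(dst_dir, exist_ok=True)
    for path in glob.glob(os.path.join(src_dir, "*.txt")):
        with open(path) as f:
            kept = [line for line in f
                    if len(line.split()) >= 6 and float(line.split()[1]) >= threshold]
        with open(os.path.join(dst_dir, os.path.basename(path)), "w") as f:
            f.writelines(kept)

for thr in (0.01, 0.1, 0.2, 0.5):
    filter_detections("input/detection-results", "input/detection-results-%s" % thr, thr)
    # swap each filtered folder in for input/detection-results and record the mAP
```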

My conclusion here is that the model is still learning and I should keep going.

My question: what detection threshold should I use? Does it make a difference if my goal is only to compare one step to another? Should I keep computing several of them like this?

Thanks

Cartucho commented 5 years ago

Hello,

Firstly, I would use a larger validation/test set (you are currently using 1%; I would use at least 10%). Also make sure that the test and training sets are completely independent and that the pictures were taken on different days and in different circumstances.

Secondly, since mAP is a ranking-based metric, decreasing the threshold will usually give you a larger mAP score, but at the cost of more False Positives (objects detected that do not match any ground truth). Conversely, if your threshold is too high you get more False Negatives (objects in the ground truth that are not detected). So it really depends on whether you want to maximize Precision (what proportion of positive identifications was actually correct?) or Recall (what proportion of actual positives was identified correctly?).
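To make that trade-off concrete, here is a minimal sketch with made-up toy numbers (the `confidences` and `is_true_positive` arrays are hypothetical per-detection data, not output of this repo): lowering the threshold raises Recall and lowers Precision.

```python
import numpy as np

# Toy sketch: Precision and Recall as a function of the detection threshold.
# `confidences` holds each detection's score; `is_true_positive` says whether
# that detection matched a ground-truth box; n_gt counts the ground-truth boxes.
def precision_recall(confidences, is_true_positive, n_gt, threshold):
    keep = confidences >= threshold
    tp = int(np.sum(is_true_positive[keep]))   # correct detections kept
    fp = int(np.sum(~is_true_positive[keep]))  # spurious detections kept
    fn = n_gt - tp                             # ground-truth boxes missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

confidences = np.array([0.9, 0.8, 0.6, 0.4, 0.3, 0.1])
is_true_positive = np.array([True, True, False, True, False, False])
for thr in (0.5, 0.2, 0.1):
    p, r = precision_recall(confidences, is_true_positive, n_gt=4, threshold=thr)
    print("threshold %.2f -> precision %.2f, recall %.2f" % (thr, p, r))
```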

I am currently not plotting the False Negatives, but I will add that feature so that people can visualise this tradeoff.

Also, have a look at the log-average miss rate score (you should see a plot in the results/ folder); many people use it.
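For reference, a minimal sketch of how the log-average miss rate is commonly computed (following the Dollar et al. definition; this assumes you already have matching `miss_rate` and `fppi` curves, which is not necessarily this repo's exact implementation):

```python
import numpy as np

# Sketch of the log-average miss rate: sample the miss rate at 9 FPPI reference
# points spaced evenly in log-space over [1e-2, 1e0] and take their geometric mean.
# Assumes `miss_rate` (1 - recall) and `fppi` (false positives per image) are
# matching curves with `fppi` sorted in ascending order.
def log_average_miss_rate(miss_rate, fppi):
    refs = np.logspace(-2.0, 0.0, num=9)
    samples = []
    for ref in refs:
        idx = np.where(fppi <= ref)[0]
        # if the curve never reaches this FPPI, fall back to a miss rate of 1.0
        samples.append(miss_rate[idx[-1]] if idx.size else 1.0)
    return np.exp(np.mean(np.log(np.maximum(samples, 1e-10))))
```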