AntonMu / TrainYourOwnYOLO

Train a state-of-the-art YOLOv3 object detector from scratch!

What's the purpose of pre-trained weights in YOLO? #232

Open anjanaouseph opened 3 years ago

anjanaouseph commented 3 years ago

"Before getting started download the pre-trained YOLOv3 weights and convert them to the keras format" — I want to understand why we need to use pre-trained weights in YOLO.

AntonMu commented 3 years ago

Hi @Voldy1998

On a high level, the YOLO network is a deep neural net with millions of parameters, and tuning all of those parameters requires millions of labeled images.

Because many people don't have the resources to label that many images, we use a technique called transfer learning: we reuse the knowledge the network has already acquired by training on millions of similar images. That is why we need the pre-trained weights, and why we can get good results with only a few hundred images.

Hope that helps. Google also has a lot of resources about YOLO and transfer learning in general.

anjanaouseph commented 3 years ago

Hi @AntonMu , Thanks for the quick reply!

anjanaouseph commented 3 years ago

I plotted a precision vs. recall curve for the class 'car' and computed the area under the curve (Average Precision), as shown below.

[precision-recall plot for class 'car']

I got the Average Precision to be 88.25 %. Do you find anything wrong with the graph?

AntonMu commented 3 years ago

You probably need to be a bit more specific. What is your IoU constraint here? A common metric in object detection is mAP.

anjanaouseph commented 3 years ago

@AntonMu The IoU threshold is 0.5 here.

AntonMu commented 3 years ago

I see. I think it looks fine. It is a little unusual to use this type of curve, though. How do you deal with multiple cars in one picture? In your graph, do you vary the confidence threshold to decide car/no car?

anjanaouseph commented 3 years ago

The trained model was tested on a dataset of 120 images of cars. The 120 images contained 272 instances of the class 'car'; 220 detections were counted as True Positives, 26 as False Positives, and 26 as False Negatives.
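From the counts reported above, the precision and recall at this operating point follow directly from the standard definitions (this is just the arithmetic, using the numbers as stated):

```python
# Counts for the 'car' test set, as reported above
TP, FP, FN = 220, 26, 26

precision = TP / (TP + FP)  # fraction of detections that are correct
recall = TP / (TP + FN)     # fraction of ground-truth objects that were found

print(round(precision, 4))  # → 0.8943
print(round(recall, 4))     # → 0.8943
```

These single-point values sit slightly above the reported 88.25% AP, which is expected since AP averages precision over the whole recall range.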

For each class (here there is just one class, 'car'), the network's detections were sorted by decreasing confidence score and assigned to ground-truth objects. Each detection was judged to be a true or false positive by measuring bounding-box overlap: to be considered a correct detection, the overlap between the predicted bounding box and the ground-truth bounding box must exceed 50%. Detections were assigned to ground-truth annotations satisfying the overlap criterion in order of decreasing confidence. A detection matched to a ground-truth object counts as a True Positive (TP); a detection that matches no ground truth is a False Positive (FP); a ground-truth object with no matching detection is a False Negative (FN). When the same object is detected multiple times, only one detection is counted as correct and the duplicates are counted as false detections.
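The overlap measure described here is the intersection-over-union (IoU) of the two boxes. A minimal sketch of the check, assuming boxes are given as `(xmin, ymin, xmax, ymax)` (the function and variable names are my own, not from the repo):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Coordinates of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 10x10 boxes overlapping over half their width share 50/150 of their union
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # → 0.3333...
```

A detection then counts as a true positive only if its IoU with an unmatched ground-truth box exceeds the 0.5 threshold.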

Using the above criteria, the precision-recall curve was plotted. Then a version of the curve with monotonically decreasing precision was computed by setting the precision at recall r to the maximum precision obtained at any recall r' ≥ r. Finally, the Average Precision (AP) was computed as the area under this precision-recall curve (shown in light blue in the figure above).
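The AP computation described here (monotone precision envelope, then area under the curve) can be sketched as follows. This is the VOC-style all-point interpolation; the function name and the toy input are my own:

```python
def average_precision(recalls, precisions):
    """Area under the PR curve after making precision monotonically
    non-increasing, as described above."""
    # Add sentinel points so the curve spans recall 0..1
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Precision envelope: precision at recall r becomes the maximum
    # precision at any recall r' >= r (walk backwards)
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas over each recall step
    ap = 0.0
    for i in range(1, len(mrec)):
        ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap

# Toy example: a detector whose precision stays at 1.0 up to full recall
print(average_precision([0.5, 1.0], [1.0, 1.0]))  # → 1.0
```

In practice the recall/precision lists come from sweeping the confidence threshold over the sorted detections, exactly as in the matching procedure above.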

AntonMu commented 3 years ago

Ok. That sounds reasonable for your use case. You are basically treating an object detection problem as a classification problem: your metric captures one aspect of the model (the classification) but ignores how accurately the objects are localized. Hope that makes sense.

anjanaouseph commented 3 years ago

Hi @AntonMu, I actually followed your steps and trained the detector to detect cars and draw bounding boxes around them in an image, which is classification + localization. The detection results for every car in an image (the bounding-box coordinates and confidence scores) were saved to a text file, and the ground-truth coordinates of every car in each image were saved to another text file. To evaluate the performance of YOLO, I computed the Average Precision / Mean Average Precision following https://github.com/Cartucho/mAP, which you suggested in https://github.com/AntonMu/TrainYourOwnYOLO/issues/63.

AntonMu commented 3 years ago

Ah Ok - cool. Yes, you are right. I was a bit thrown off by the single class. But it makes sense - if you only have a single class, there is no mean to calculate. If you feel comfortable, feel free to open a pull request to get that feature added to the repo. Thank you so much!

anjanaouseph commented 3 years ago

Yes, @AntonMu, I hope the graph is not incorrect or anything. How can I add this as a feature? I went to that repo, followed the guidelines, and ran the code.

AntonMu commented 3 years ago

Cool - yes. I think there are several options. One could be to add a description of the steps you did, maybe under the 3_Inference section.

There would also be an option to add some code that handles the computation. Basically, you describe what one needs to do to calculate mAP and provide a script that does it.

anjanaouseph commented 3 years ago

Yes @AntonMu, sure, I will add it. The issue is that I am not 100% sure what I did is correct.

AntonMu commented 3 years ago

I see - the best thing is to start a PR and then I can check. But if you followed the tutorial it should be fine. To add it here, I would like it to also work for multiple classes.

anjanaouseph commented 3 years ago

Okay @AntonMu, thanks! Will do it!

Pei648783116 commented 3 years ago

Hi Anjanaouseph, do you mind uploading your code for converting the csv to the file format required for the mAP calculation?

anjanaouseph commented 3 years ago

Hi @Pei648783116

This converts the .csv to the format required by https://github.com/Cartucho/mAP:

```python
from csv import DictReader

INPUT_FILE = 'Detection_Results.csv'

with open(INPUT_FILE, 'rt') as csvfile:
    reader = DictReader(csvfile)
    for row in reader:
        # One output .txt file per image, named after the image
        filename = "{}.txt".format(row["image"])
        # Only keep rows where all required fields are present
        if row["label"] and row["confidence"] and row["xmin"] \
                and row["ymin"] and row["xmax"] and row["ymax"]:
            line = " ".join([row["label"], row["confidence"], row["xmin"],
                             row["ymin"], row["xmax"], row["ymax"]])
        else:
            print("Missing fields for image {}. Skipping.".format(row["image"]))
            continue
        # Append one detection per line: <label> <confidence> <xmin> <ymin> <xmax> <ymax>
        with open(filename, 'a') as output:
            output.write(line + "\n")
```

The above is the code that I used.