ayooshkathuria / YOLO_v3_tutorial_from_scratch

Accompanying code for Paperspace tutorial series "How to Implement YOLO v3 Object Detector from Scratch"
https://blog.paperspace.com/how-to-implement-a-yolo-object-detector-in-pytorch/

Different anchors #5

Closed moshanATucsd closed 6 years ago

moshanATucsd commented 6 years ago

Hi, thanks so much for the great tutorial! I am wondering whether anyone has tried a different number of anchors, other than the three anchors 6,7,8 given in the cfg file? There's a RuntimeError raised by this line if a different number of anchors is used.

prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)

I am wondering how to solve this?

Update

Based on https://github.com/pjreddie/darknet/issues/561, the author says

If you use a different number of anchors you have to figure out which layer you want to predict which anchors and the number of filters will depend on that distribution

ayooshkathuria commented 6 years ago

Could you please provide details on how many anchors you are trying to predict? A screenshot of the RuntimeError details would also help.

This line reshapes the prediction feature map into a form where the operations can be vectorised over all bounding boxes. It's bound to break if you switch to a different number of anchors, since the code is written with the assumption that you have three anchors per cell; otherwise the reshaping calculations won't work, and it'll throw an error. I've described it in Part 3 of the tutorial.
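To see the failure mode concretely, here's a minimal sketch (assuming COCO's 80 classes, so bbox_attrs = 85) of why the view works for 3 anchors and fails for any other count:

```python
import torch

batch_size, grid_size, num_anchors = 1, 13, 3
bbox_attrs = 5 + 80  # x, y, w, h, objectness + 80 class scores (COCO)

# The conv layer feeding a detection layer outputs B*(5+C) channels,
# here 3 * 85 = 255:
prediction = torch.randn(batch_size, bbox_attrs * num_anchors, grid_size, grid_size)

# This view succeeds because the element counts match (255 channels):
out = prediction.view(batch_size, bbox_attrs * num_anchors, grid_size * grid_size)
print(out.shape)  # torch.Size([1, 255, 169])

# With num_anchors = 4, the requested shape implies 340 channels, which
# doesn't match the 255-channel tensor, so .view() raises a RuntimeError:
try:
    prediction.view(batch_size, bbox_attrs * 4, grid_size * grid_size)
except RuntimeError as e:
    print("RuntimeError:", e)
```

So the error isn't in the view itself; it's the mismatch between the conv layer's channel count (fixed by the cfg and weights) and the anchor count the reshape assumes.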

Why don't you gimme a detailed description, and we can figure something out :).

ayooshkathuria commented 6 years ago

I don't think we can do much about predicting more than three anchors, or using different anchors at all, without retraining the network. We use the official weights file, whose weights only work for the anchors given in the cfg file.

Secondly, changing the number of anchors changes the number of bounding boxes a cell can predict. This changes the very depth of the prediction feature map, which is (B * (5 + C)), where B is the number of anchors (bounding boxes) each cell predicts and C is the number of classes. In that case, we can't load the official weights file, since we've changed the architecture and therefore the number of weights in the network.
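As a quick sanity check of the depth formula, here's a tiny sketch (the helper name is mine, not from the repo) showing why the official cfg uses filters=255 for COCO, and why changing B breaks weight loading:

```python
# Depth of the prediction feature map: B * (5 + C).
def detection_filters(num_anchors: int, num_classes: int) -> int:
    return num_anchors * (5 + num_classes)

# COCO: C = 80, B = 3 anchors per scale -> the filters=255 seen in yolov3.cfg:
print(detection_filters(3, 80))  # 255

# With B = 4, the conv layer before each detection layer would need 340
# filters, so the official weights file no longer matches the architecture:
print(detection_filters(4, 80))  # 340
```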

I'm working on the training code, though I don't get an awful lot of time owing to my undergraduate thesis, which is up for presentation in May. I can consider making training with a different number of anchors an option in the training module. However, the anchors are generated with the dimension-clustering method, so you'll probably have to run k-means clustering on the ground-truth boxes in your dataset to generate a different number of anchors.
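A rough sketch of what that dimension clustering looks like (this is my own illustrative implementation, not code from this repo): k-means over ground-truth (width, height) pairs, using 1 - IoU as the distance so large and small boxes are weighted fairly, as in the YOLO v2/v3 papers:

```python
import numpy as np

def iou_wh(boxes, centroids):
    # IoU between (w, h) pairs, treating all boxes as anchored at the origin.
    w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
          + (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # Assign each box to the centroid with max IoU (i.e. min 1 - IoU):
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids.prod(axis=1))]  # smallest area first

# Usage: boxes is an (N, 2) array of ground-truth (width, height) pairs,
# e.g. kmeans_anchors(boxes, k=9) for YOLO v3's nine anchors.
```

The sorted output mirrors how yolov3.cfg lists its anchors, smallest to largest, so they can then be split across the three detection layers via the mask fields.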

Do let me know if you can't grasp any part. I'll link up resources where you can read further about YOLO.

moshanATucsd commented 6 years ago

Hi, thanks for the quick reply! The reason for using more than three anchors is that the default 6,7,8 seem to correspond to the coarsest level, so if we want to detect small objects it may be better to use anchors like 0,1,2 (correct me if I am wrong). And if we want to detect a wide range of objects, maybe using more than 3 anchors will help.

Your explanation above is very clear; I can see that for now it's best to use 3 anchors. Running k-means is a good idea if we want to use it on a specific dataset. Again, thanks so much for your tutorial!

ayooshkathuria commented 6 years ago

@moshanATucsd If you look at the cfg file, there are three detection layers with progressively larger feature-map sizes. The anchors 0,1,2 are used at the 3rd detection layer (as defined by its mask), which is upsampled and concatenated with layer 36, and that helps detect smaller objects. To give you an idea, this is what the final architecture looks like.

[image: YOLO v3 architecture diagram]
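The anchor-to-layer assignment can be sketched like this; the anchor pairs and mask indices below are taken from the official yolov3.cfg, while the layer labels are my own annotation:

```python
# All nine anchors from yolov3.cfg, smallest to largest (width, height):
anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]

# mask values from the three [yolo] sections, in network order:
masks = {
    "1st detection layer (stride 32, large objects)": [6, 7, 8],
    "2nd detection layer (stride 16, medium objects)": [3, 4, 5],
    "3rd detection layer (stride 8, small objects)": [0, 1, 2],
}

for layer, mask in masks.items():
    print(layer, "->", [anchors[i] for i in mask])
```

So anchors 0,1,2 aren't unused; they're already assigned to the finest-grained detection layer, which is the one responsible for small objects.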

Here's a blog post I wrote over at Medium explaining the changes in YOLO v3: https://towardsdatascience.com/yolo-v3-object-detection-53fb7d3bfe6b.

In that post, I've compared what the different detection layers detect. Also, this repo exists only to accompany the tutorial, so I've kept it short, and it won't be updated with training code. The ever-evolving code for YOLO v3 lives in my other repo. In its README, you can find a --scales flag for detect.py, with which you can choose which detection layers to use, and perhaps isolate what each layer predicts.

moshanATucsd commented 6 years ago

@ayooshkathuria Thanks for your detailed explanation and the blog post! It makes things much clearer now. I really appreciate your help!