aladdinpersson / Machine-Learning-Collection

A resource for learning about Machine learning & Deep Learning
https://www.youtube.com/c/AladdinPersson
MIT License

YOLO ground truth width and length are not relative to image size but to S #140

Open oonisim opened 1 year ago

oonisim commented 1 year ago

Code

dataset.py calculates the width_cell and height_cell values that are written into the label_matrix tensor.

"""
...
Then to find the width relative to the cell is simply:
width_pixels/cell_pixels, simplification leads to the
formulas below.
"""
width_cell, height_cell = (
    width * self.S,
    height * self.S,
)

Question

Please help me understand why width_cell and height_cell are expressed in units of cells, that is, relative to S instead of the image size.

In my understanding, width and height come from the YOLO Darknet annotation, where they are relative to the image size and take values between 0 and 1. Suppose width=0.7 and S=7; then width_cell will be 4.9 cells.
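To make the arithmetic concrete, here is a minimal sketch of the scaling done in the quoted snippet. The helper name to_cell_units is hypothetical (it is not in dataset.py), S = 7 is assumed from the YOLO v1 paper's grid size, and height = 0.5 is an arbitrary example value.

```python
import math


def to_cell_units(width, height, S):
    """Scale an image-relative box size (values in 0..1) to grid-cell units.

    Mirrors the quoted snippet: width * S and height * S.
    Hypothetical helper for illustration only.
    """
    return width * S, height * S


# Assumptions: S = 7 (YOLO v1 grid size), height = 0.5 chosen arbitrarily.
width_cell, height_cell = to_cell_units(0.7, 0.5, S=7)

# A box covering 70% of the image width spans about 4.9 of the 7 cells.
print(width_cell, height_cell)
```

The printed values are subject to floating-point rounding, so compare them with math.isclose rather than ==.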

If width_cell and height_cell are used as the ground truth for YOLO v1 training, I would expect them to be relative to the image size, as stated in the YOLO v1 paper:

Each bounding box consists of 5 predictions: x, y, w, h, and confidence. The (x, y) coordinates represent the center of the box relative to the bounds of the grid cell. The width and height are predicted relative to the whole image.
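For comparison, the two conventions differ only by the constant factor S, so cell-unit values can be mapped back to the image-relative form described in the paper. The sketch below assumes S = 7 and uses a hypothetical helper, cell_to_image_units; it says nothing about which convention the repository's loss function expects.

```python
import math


def cell_to_image_units(width_cell, height_cell, S):
    """Convert a box size in grid-cell units back to image-relative units (0..1).

    Hypothetical helper for illustration only; the inverse of multiplying by S.
    """
    return width_cell / S, height_cell / S


S = 7  # assumed grid size, as in the YOLO v1 paper
width, height = cell_to_image_units(4.9, 3.5, S)

# Dividing by S recovers image-relative values, up to floating-point rounding.
print(math.isclose(width, 0.7), math.isclose(height, 0.5))
```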