AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Number of classes that can be recognized for each grid? #953

Open ghost opened 6 years ago

ghost commented 6 years ago

I have skimmed the YOLO v1 and v2 papers, and I have two questions.

Question 1: Do YOLO v1, v2, and v3 recognize only one object per grid cell in the output image?

Question 2: Input image size = [4160, 4160], YOLO image size = [416, 416], anchors = 10, 10.

In the example above, is the rectangular area always square? I heard that the anchors are aspect ratios.

AlexeyAB commented 6 years ago
  1. Each anchor in each cell can detect one object. Also, Yolo v3 can detect a multi-labeled object, for example dog, sofa, and tv in one box, if the dog, sofa, and tv are placed close to each other.

  2. Anchors are the initial sizes of objects (size + aspect ratio). With anchors = 10, 10 the initial size will be square, but the output bounding boxes (after logistic regression) may not be square.
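
To illustrate point 2, here is a minimal sketch of how a raw prediction could be decoded into a box. It assumes the Yolo v3 convention where anchors are given in pixels of the network input. The function name `decode_box`, its parameters, and the example values are hypothetical and are not taken from the Darknet source.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Decode raw outputs into (center_x, center_y, width, height) in network-input pixels.

    Assumptions (not from the thread): anchors are in network-input pixels,
    and `stride` is the size of one grid cell in pixels (e.g. 416 / 13 = 32).
    """
    # The sigmoid keeps the predicted center inside the responsible cell.
    bx = (cell_x + sigmoid(tx)) * stride
    by = (cell_y + sigmoid(ty)) * stride
    # The anchor is only the starting size; width and height are scaled independently.
    bw = anchor_w * math.exp(tw)
    bh = anchor_h * math.exp(th)
    return bx, by, bw, bh

# Square 10x10 anchor, zero offsets: the box stays square.
square = decode_box(0.0, 0.0, 0.0, 0.0, 6, 6, 10, 10, 32)

# Same anchor, but the network asks for double the width: the box is no longer square.
stretched = decode_box(0.0, 0.0, math.log(2.0), 0.0, 6, 6, 10, 10, 32)
```

Because `tw` and `th` are predicted separately, a square anchor constrains only the starting shape, not the final one.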

ghost commented 6 years ago

Thanks for the reply.

> Each anchor in each cell can detect one object. Also, Yolo v3 can detect a multi-labeled object, for example dog, sofa, and tv in one box, if the dog, sofa, and tv are placed close to each other.

So, in predictions.png, only YOLO v3 can draw bounding boxes of several classes in one cell?

AlexeyAB commented 6 years ago

> So, in predictions.png, only YOLO v3 can draw bounding boxes of several classes in one cell?

Yolo v3 just does it much better.

In the new repository, Yolo v2 can theoretically draw bounding boxes of several classes in one cell, but this will be very rare because it uses softmax. It can also happen if a low threshold such as -thresh 0.001 is used for Yolo v2 - look at diningtable, dog: predictions
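
A small sketch of why softmax makes several classes in one box rare. It compares softmax class scores (as in Yolo v2) with independent logistic scores (as in Yolo v3) for the same made-up logits. The logits, the 0.25 threshold, and the helper `classes_above` are illustrative assumptions, and the objectness score that a real detector also multiplies in is left out for simplicity.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the scores always sum to 1.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def classes_above(scores, names, thresh):
    """Return the class names whose score exceeds the threshold."""
    return [n for n, s in zip(names, scores) if s > thresh]

names = ["dog", "diningtable", "sofa"]
logits = [3.0, 1.0, -2.0]  # made-up raw class outputs for one box

soft = softmax(logits)                 # classes compete for a total of 1.0
indep = [sigmoid(v) for v in logits]   # each class is scored on its own

v2_normal = classes_above(soft, names, 0.25)    # only the winning class survives
v2_low = classes_above(soft, names, 0.001)      # a very low threshold lets the others through
v3_normal = classes_above(indep, names, 0.25)   # two classes pass without lowering the threshold
```

With softmax, a strong class pushes the others toward zero, so a second label appears only at a very low threshold. Independent logistic scores do not compete, so two labels can both be high.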

ghost commented 6 years ago

Thanks very much!