AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

what does index & entry_index() in yolo_layer.c do? #1532

Closed JohnWnx closed 5 years ago

JohnWnx commented 6 years ago

Hi, may I know what needs to be changed to train with 4-point coordinate labels rather than xywh?

I have been trying to edit the current version of YOLO to train on labels in the format x1,y1,x2,y2,x3,y3,x4,y4 rather than the current xywh format.
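For concreteness, a standard darknet label line is `class x_center y_center width height`, all normalized to [0,1]:

```
0 0.48 0.53 0.20 0.35
```

versus the proposed 4-point format carrying four corner points (values in both lines are purely illustrative):

```
0 0.38 0.36 0.58 0.35 0.59 0.71 0.37 0.70
```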

1) What do index and entry_index() in yolo_layer.c do? I understand that the values i and j are used in this function, where i is related to truth.x and j is related to truth.y. In the case of x1-x4 and y1-y4, will I need i1-i4 and j1-j4?

2) Replacing instances of (4+1) with (8+1) in yolo_layer.c. I have replaced instances of

```c
int class_id = state.truth[t*(4 + 1) + b*l.truths + 4];
```

with:

```c
int class_id = state.truth[t*(8 + 1) + b*l.truths + 8];
```

I replaced 4 with 8 because there are 8 parameters (excluding the class id) for each bounding box instead of the original 4 (xywh). I have also made the same change to:

```c
box truth = float_to_box_stride(state.truth + t*(8 + 1) + b*l.truths, 1); //UPDATED
```

Would I also need to replace 4 with 8 in the following function?

```c
static int entry_index(layer l, int batch, int location, int entry)
{
    int n = location / (l.w*l.h);
    int loc = location % (l.w*l.h);
    return batch*l.outputs + n*l.w*l.h*(4 + l.classes + 1) + entry*l.w*l.h + loc;
}
```

I have also tried changing the following line:

```c
//l.outputs = h*w*n*(classes + 4 + 1);
l.outputs = h*w*n*(classes + 8 + 1);
```
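For reference, entry_index() converts (batch, anchor, grid cell, channel) into a flat offset into the layer's output, which is laid out as [batch][anchor][channel][h*w]; channels 0-3 are the box parameters, channel 4 is objectness, and channels 5 onward are the class scores. A minimal sketch of an 8-coordinate variant (a paraphrase under that assumption, not code from the repo):

```c
/* Sketch of an 8-coordinate entry_index() (an assumption, not the repo's code).
 * The channel axis now runs over x1,y1,x2,y2,x3,y3,x4,y4, objectness,
 * class_0..class_{classes-1}. */
static int entry_index(layer l, int batch, int location, int entry)
{
    int n   = location / (l.w*l.h);   /* anchor index within this layer */
    int loc = location % (l.w*l.h);   /* cell index within the h*w grid */
    return batch*l.outputs + n*l.w*l.h*(8 + l.classes + 1) + entry*l.w*l.h + loc;
}
```

Note that every call site that hard-codes the objectness channel as 4 (e.g. entry_index(l, b, n*l.w*l.h + j*l.w + i, 4)) would then need to pass 8, and the class channels would start at 9 instead of 5.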

However, I receive the following error when attempting to run:

```
Error: l.outputs == params.inputs filters= in the [convolutional]-layer doesn't correspond to classes= or mask= in [yolo]-layer
```
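That check fails because l.outputs of the [yolo] layer (h\*w\*n\*(classes + coords + 1)) no longer matches the output size of the preceding [convolutional] layer. Assuming 8 coordinates, each [convolutional] layer directly before a [yolo] layer needs its filters= recomputed; a hypothetical cfg fragment for a 5-class model with 3 masked anchors (illustrative values only, not from the repo):

```
[convolutional]
size=1
stride=1
pad=1
# with 3 masked anchors and 5 classes: filters = 3*(5+8+1) = 42
# (instead of 3*(5+4+1) = 30 for the stock xywh head)
filters=42
activation=linear

[yolo]
mask = 0,1,2
classes=5
num=9
```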

3) Is this the correct method to predict the 4 corner coordinates of the bounding boxes? (I don't see how the prediction equations in Figure 2 of the YOLOv3 paper relate to the calculations performed in get_yolo_box() or delta_yolo_box().)

[Figure 2 of the YOLOv3 paper - bounding-box prediction: bx = σ(tx) + cx, by = σ(ty) + cy, bw = pw·e^tw, bh = ph·e^th]

In get_yolo_box() of yolo_layer.c, I'm no longer using this:

```c
b.w = exp(x[index + 2*stride]) * biases[2*n] / w;
```

Instead, I predict the 8 values of the 4 coordinates (an excerpt of my code is shown below; the predictions are stored in x[]):

```c
b.x1 = (i + x[index + 0*stride]) / lw;
b.y1 = (j + x[index + 1*stride]) / lh;
b.x2 = (i + x[index + 2*stride]) / lw;
b.y2 = (j + x[index + 3*stride]) / lh;
```
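On question 3, the likely missing link is that forward_yolo_layer() applies the logistic activation to the x,y (and objectness/class) channels before get_yolo_box() reads them, so x[index + 0*stride] already holds σ(tx). A paraphrase of the relevant step (names follow the repo; revisions differ in details):

```c
/* In forward_yolo_layer() (paraphrased), each anchor's x,y channels are
 * squashed in place, so downstream code reads sigma(tx), sigma(ty): */
int index = entry_index(l, b, n*l.w*l.h, 0);
activate_array(l.output + index, 2*l.w*l.h, LOGISTIC);

/* get_yolo_box() then matches Figure 2, normalized to [0,1] by the grid size:
 *   b.x = (i + sigma(tx)) / lw        <=>  bx = sigma(tx) + cx, with cx = i
 *   b.w = exp(tw) * biases[2n] / w    <=>  bw = pw * e^tw,      with pw = biases[2n]
 */
```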

Also, in delta_yolo_box() of yolo_layer.c, I'm no longer using this:

```c
float tw = log(truth.w*w / biases[2*n]);
```

Instead, I compute the targets for the 8 values of the 4 coordinates (an excerpt of my code is shown below):

```c
float tx1 = (truth.x1*lw - i);
float ty1 = (truth.y1*lh - j);
float tx2 = (truth.x2*lw - i);
float ty2 = (truth.y2*lh - j);

delta[index + 0*stride] = scale * (tx1 - x[index + 0*stride]);
delta[index + 1*stride] = scale * (ty1 - x[index + 1*stride]);
delta[index + 2*stride] = scale * (tx2 - x[index + 2*stride]);
delta[index + 3*stride] = scale * (ty2 - x[index + 3*stride]);
```
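Extending that pattern to all four corners, a compact sketch of the approach described above (boxq, its fields, and delta_yolo_box_quad are hypothetical names, not darknet types or functions):

```c
/* Hypothetical quadrilateral box type and delta - a sketch of the approach
 * described above, not darknet code. All eight offsets are predicted directly
 * relative to the responsible cell (i, j), so unlike the original w/h channels
 * there is no log/exp anchor term. */
typedef struct { float x1, y1, x2, y2, x3, y3, x4, y4; } boxq;

static void delta_yolo_box_quad(boxq truth, float *x, int index, int i, int j,
                                int lw, int lh, float *delta, float scale, int stride)
{
    const float t[8] = {
        truth.x1*lw - i, truth.y1*lh - j,
        truth.x2*lw - i, truth.y2*lh - j,
        truth.x3*lw - i, truth.y3*lh - j,
        truth.x4*lw - i, truth.y4*lh - j,
    };
    for (int k = 0; k < 8; ++k)
        delta[index + k*stride] = scale * (t[k] - x[index + k*stride]);
}
```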

Thank you.

*Thus far, I have made changes mainly to data.c (handling the reading of the new label format), yolo_layer.c (for predictions), and box.c (for the computation of IoU).

AlexeyAB commented 5 years ago

@i-chaochen I am talking about the red section (1st section): [image: the forward_yolo_layer() code with the red and green sections highlighted]

> But still, why do you need to calculate the deltas for non-objects? It seems those deltas could be calculated in the object section as well? It's a "non-object" anyway during training.

There are two parts of code in the Red-section:

  1. Only this part of the red section is related to non-objects, and it calculates only the objectness(T0) delta - we calculate it to decrease objectness(T0), i.e. to say that there is no object: https://github.com/AlexeyAB/darknet/blob/cce34712f6928495f1fbc5d69332162fc23491b9/src/yolo_layer.c#L253-L258
  2. This part of the red section is related to objects; we use it only if we want many final activations to produce the same single bounding box - which will then be fused by NMS: https://github.com/AlexeyAB/darknet/blob/cce34712f6928495f1fbc5d69332162fc23491b9/src/yolo_layer.c#L259-L268
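For readers without the links open, a paraphrase of the two linked parts (not verbatim; names follow yolo_layer.c and details differ between revisions):

```c
/* For each prediction (cell i,j, anchor n) the red section first finds
 * best_iou, the best overlap with any ground truth, then: */
int obj_index = entry_index(l, b, n*l.w*l.h + j*l.w + i, 4);

/* Part 1 - non-objects: push objectness toward 0 ... */
l.delta[obj_index] = 0 - l.output[obj_index];
if (best_iou > l.ignore_thresh) {
    l.delta[obj_index] = 0;   /* ...unless the prediction already fits a truth well */
}

/* Part 2 - objects: with a very strong overlap, train this prediction as a
 * positive too, so several activations can converge on one box (NMS fuses
 * them at test time). */
if (best_iou > l.truth_thresh) {
    l.delta[obj_index] = 1 - l.output[obj_index];
    /* ...then delta_yolo_class() and delta_yolo_box() for the matched truth */
}
```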
i-chaochen commented 5 years ago

Thanks for this explanation! Helps a lot!

So does that mean the green part is for NMS on detected objects?

AlexeyAB commented 5 years ago

@i-chaochen

i-chaochen commented 5 years ago

Thanks a lot, your explanation of the red section is really crystal clear!

Sorry for asking this again, but since you have already calculated the deltas for non-objects and objects in the red section, why do you still need the green section for single-object detection? How does the green section help the bounding-box results?

AlexeyAB commented 5 years ago

@i-chaochen

There are two cases:

  1. If truth_thresh=1, then this part of the red section will not be used (delta_yolo_class and delta_yolo_box will not be calculated, since best_iou can never exceed 1) - so we must use the green section.

  2. If truth_thresh<1, then this part of the red section will be used (delta_yolo_class and delta_yolo_box will be calculated), but for any prediction where best_iou < l.truth_thresh this code still will not calculate delta_yolo_class and delta_yolo_box - so we must use the green section.
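A paraphrase of the green section for contrast (not verbatim; names follow forward_yolo_layer() and signatures vary between revisions): it loops over ground truths rather than predictions, so every labeled object trains at least one prediction no matter what truth_thresh is set to:

```c
for (t = 0; t < l.max_boxes; ++t) {
    box truth = float_to_box_stride(state.truth + t*(4 + 1) + b*l.truths, 1);
    if (!truth.x) break;                  /* no more labels for this image */

    int i = (int)(truth.x * l.w);         /* the cell responsible for this object */
    int j = (int)(truth.y * l.h);
    int best_n = 0;                       /* the masked anchor whose w/h shape gives
                                             the best IoU with the truth box
                                             (selection loop omitted here) */

    int box_index = entry_index(l, b, best_n*l.w*l.h + j*l.w + i, 0);
    delta_yolo_box(truth, l.output, l.biases, best_n, box_index, i, j,
                   l.w, l.h, state.net.w, state.net.h,
                   l.delta, (2 - truth.w*truth.h), l.w*l.h);

    int obj_index = entry_index(l, b, best_n*l.w*l.h + j*l.w + i, 4);
    l.delta[obj_index] = 1 - l.output[obj_index];   /* push objectness toward 1 */

    int class_index = entry_index(l, b, best_n*l.w*l.h + j*l.w + i, 4 + 1);
    int class_id = state.truth[t*(4 + 1) + b*l.truths + 4];
    delta_yolo_class(l.output, l.delta, class_index, class_id,
                     l.classes, l.w*l.h, 0);
}
```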

i-chaochen commented 5 years ago

@AlexeyAB I see. Thanks again for the kind explanation!

vb123er951 commented 4 years ago

Hi @AlexeyAB, I am studying the YOLOv4 paper, and I see that a new idea in the paper is to "use multiple anchors for a single ground truth if IoU(gt, anchor) > IoU threshold". Can you explain more about this part? How does it work, and what is the motivation behind this idea?
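For context, the AlexeyAB fork exposes this as the iou_thresh= parameter of the [yolo] section. A paraphrase of the idea (not verbatim repo code; names follow the repo): besides the single best-matching anchor, any other anchor assigned to the layer whose shape-IoU with the ground truth exceeds iou_thresh is also trained as a positive, giving more positive samples per object:

```c
/* Paraphrase of multi-anchor assignment (not verbatim). truth_shift is the
 * ground-truth box moved to the origin so only the shapes (w,h) are compared. */
box truth_shift = truth;
truth_shift.x = truth_shift.y = 0;

for (n = 0; n < l.total; ++n) {                   /* over all anchors */
    int mask_n = int_index(l.mask, n, l.n);       /* is anchor n used by this layer? */
    if (mask_n >= 0 && n != best_n && l.iou_thresh < 1.0f) {
        box pred = {0};
        pred.w = l.biases[2*n]     / state.net.w; /* anchor shape, normalized */
        pred.h = l.biases[2*n + 1] / state.net.h;
        if (box_iou(pred, truth_shift) > l.iou_thresh) {
            /* train this anchor as a positive too: the same box, objectness
             * and class deltas as for best_n */
        }
    }
}
```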

I saw your explanation of the figure: [image: the same forward_yolo_layer() screenshot with the red and green sections highlighted]

> Red section - search for final activations where there are no objects
> Green section - search for final activations where there are objects

But do you still count the object loss in the no-object region (the red region)? Thanks.