Closed. JohnWnx closed this issue 5 years ago.
@i-chaochen I am asking about the red section (1st section):
But still, why do you need to calculate the deltas for non-objects? It seems those deltas could be calculated in the object section as well? It's "non-object" anyway during the training.
There are two parts of code in the Red-section:
1) Non-objects: it calculates only the objectness(T0) delta. We calculate it to decrease objectness(T0), to say that there is no object: https://github.com/AlexeyAB/darknet/blob/cce34712f6928495f1fbc5d69332162fc23491b9/src/yolo_layer.c#L253-L258
2) Objects: we use it only if we want many final activations to produce the same single bounding box, which will then be fused by NMS: https://github.com/AlexeyAB/darknet/blob/cce34712f6928495f1fbc5d69332162fc23491b9/src/yolo_layer.c#L259-L268

Thanks for this explanation! It helps a lot!
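The non-object part of the red section can be sketched roughly as follows. This is a simplified illustration of the logic, not the actual yolo_layer.c code; the function name and parameters are my own:

```c
#include <assert.h>

/* Sketch: for a prediction whose best IoU with every ground truth is below
 * ignore_thresh, only the objectness delta is set, pushing the predicted
 * objectness toward 0 ("there is no object here"). Class and box deltas
 * are not touched for such cells. */
static float nonobject_objectness_delta(float predicted_objectness,
                                        float best_iou,
                                        float ignore_thresh) {
    if (best_iou > ignore_thresh) {
        return 0.0f;  /* prediction overlaps an object enough: ignored */
    }
    return 0.0f - predicted_objectness;  /* gradient toward objectness = 0 */
}
```

If I recall correctly, ignore_thresh defaults to 0.7 in yolov3.cfg, so predictions overlapping a ground truth by more than that are simply left alone by this part.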
So it means the green part is for NMS of detected objects?
@i-chaochen This part is used only if truth_thresh < 1 (truth_thresh=1 in yolov3.cfg, so it is disabled by default): https://github.com/AlexeyAB/darknet/blob/cce34712f6928495f1fbc5d69332162fc23491b9/src/yolo_layer.c#L259-L268

Thanks a lot, your explanation of the red section is really crystal clear!
Sorry for my dumbness in asking this again: since you have already calculated the deltas for non-objects and objects in the red section, why do you still need the green section for single-object detection? How does the green section for single-object detection help the bounding-box results?
@i-chaochen
There are two cases:

1) If truth_thresh = 1, then this part of the Red-section will not be used (delta_yolo_class and delta_yolo_box will not be calculated), so we must use the Green-section.

2) If truth_thresh < 1, then this part of the Red-section will be used (delta_yolo_class and delta_yolo_box will be calculated), but if for some predictions best_iou < l.truth_thresh, then this code still will not calculate delta_yolo_class and delta_yolo_box, so we must use the Green-section.
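The two cases above can be condensed into one predicate. A minimal sketch (my own simplification, not the darknet source) of when the Red-section produces class and box deltas:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: the Red-section computes delta_yolo_class and delta_yolo_box only
 * when truth_thresh < 1 and the prediction's best_iou exceeds it; in every
 * other case the Green-section must supply those deltas. */
static bool red_section_sets_class_and_box(float truth_thresh, float best_iou) {
    if (truth_thresh >= 1.0f) return false;  /* case 1: feature disabled */
    return best_iou > truth_thresh;          /* case 2: per-prediction test */
}
```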
@AlexeyAB I see... Thanks again for the kind explanation!
Hi @AlexeyAB, I am studying the YOLOv4 paper, and I see a new idea in the paper: "use multiple anchors for a single ground truth if IoU(gt, anchor) > IoU threshold." Can you explain more about this part? How does it work, and what is the motivation behind this idea?
I also saw your explanation of the figure:

Red section - search for final activations where there are no objects
Green section - search for final activations where there are objects

but do you still count the object loss in the no-object region (the red region)? Thanks.
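The multi-anchor idea quoted above ("use multiple anchors for a single ground truth if IoU(gt, anchor) > IoU threshold") can be sketched as follows. This is my own illustration of the assignment rule, not code from the paper or the repository:

```c
#include <assert.h>

/* Sketch: instead of matching a ground truth only to its single best anchor,
 * every anchor whose IoU with the gt box exceeds iou_thresh is also trained
 * on that gt; if none qualifies, the best anchor alone is used. */
static int count_assigned_anchors(const float *ious, int n_anchors,
                                  float iou_thresh) {
    int count = 0;
    for (int a = 0; a < n_anchors; ++a) {
        if (ious[a] > iou_thresh) ++count;  /* anchor a also gets this gt */
    }
    return count > 0 ? count : 1;  /* fall back to the single best anchor */
}
```

As I understand it, the motivation is to get more positive samples per ground truth, since plain best-anchor matching leaves most predictions with only a negative objectness signal.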
Hi, may I know what needs to be changed to train with 4-point coordinate labels rather than xywh?
I have been trying to modify the current version of YOLO to train on labels in this format: x1,y1,x2,y2,x3,y3,x4,y4 rather than the current xywh format.
1) What do index and entry_index() in yolo_layer.c do? I understand that the values i and j are used in this function, where i is related to truth.x and j is related to truth.y. In the case of x1-x4 and y1-y4, will I need i1-i4 and j1-j4?
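For reference, entry_index() maps a (batch, anchor, channel, cell) tuple to a flat offset into the layer output. A sketch of that layout (the parameter names are my own; the real function takes the layer struct instead):

```c
#include <assert.h>

/* Sketch of the darknet output layout: per batch, per anchor n, the channels
 * are stored planarly: entries 0..coords-1 are the box values, entry coords
 * is objectness, and entries coords+1 onward are class scores. The cell
 * (i, j) selects the position inside each w*h plane. With 8 coordinates,
 * coords becomes 8, so entries 0..7 address x1,y1,...,x4,y4. */
static int entry_index_sketch(int w, int h, int coords, int classes,
                              int outputs, int batch, int n, int entry,
                              int i, int j) {
    return batch*outputs
         + n*w*h*(coords + classes + 1)
         + entry*w*h
         + (j*w + i);
}
```

So, if this layout is kept, you would not need i1-i4 and j1-j4: the cell stays a single (i, j), and the extra coordinates are addressed by using more entry values (0..7 instead of 0..3).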
2) Replacing instances of (4+1) with (8+1) in yolo_layer.c. I have replaced instances of "int class_id = state.truth[t*(4 + 1) + b*l.truths + 4];" with "int class_id = state.truth[t*(8 + 1) + b*l.truths + 8];". I replaced 4 with 8 because there are 8 parameters (excluding the class id) for each bounding box instead of the original 4 (xywh). I have also made the same change to: "box truth = float_to_box_stride(state.truth + t*(8 + 1) + b*l.truths, 1); //UPDATED"
Would I also need to replace 4 with 8 in the following function?

    static int entry_index(layer l, int batch, int location, int entry)
    {
        int n = location / (l.w*l.h);
        int loc = location % (l.w*l.h);
        return batch*l.outputs + n*l.w*l.h*(4+l.classes+1) + entry*l.w*l.h + loc;
    }

I have also tried changing the following line: //l.outputs = h*w*n*(classes + 4 + 1); to l.outputs = h*w*n*(classes + 8 + 1);
However, I receive the following error when attempting to run: "Error: l.outputs == params.inputs filters= in the [convolutional]-layer doesn't correspond to classes= or mask= in [yolo]-layer "
3) Is this the correct method to predict the 4 corner coordinates of the bounding boxes? (I don't see how the prediction equations in figure 2 of the YOLOv3 paper relate to the calculations performed in get_yolo_box() or delta_yolo_box().)
In get_yolo_box() of yolo_layer.c, I am no longer using this: b.w = exp(x[index + 2*stride]) * biases[2*n] / w; Instead, I predict the 8 values of the 4 coordinates (an excerpt of my code is shown below):

    b.x1 = (i + x[index + 0*stride]) / lw;
    b.y1 = (j + x[index + 1*stride]) / lh;
    b.x2 = (i + x[index + 2*stride]) / lw;
    b.y2 = (j + x[index + 3*stride]) / lh;
Also, in delta_yolo_box() of yolo_layer.c, I am no longer using this: float tw = log(truth.w*w / biases[2*n]); Instead, I compute the targets for the 8 values of the 4 coordinates (an excerpt of my code is shown below):

    float tx1 = (truth.x1*lw - i);
    float ty1 = (truth.y1*lh - j);
    float tx2 = (truth.x2*lw - i);
    float ty2 = (truth.y2*lh - j);
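Taken together, the decode step for the 4-point scheme described above might look like this. This is only a sketch based on the snippets in the question (the quad_box type and function name are my own), assuming each corner offset is predicted relative to the cell (i, j) with no anchor priors or exp():

```c
#include <assert.h>

/* Hypothetical 4-corner box: (x1,y1)..(x4,y4), normalized to [0,1]. */
typedef struct { float x1, y1, x2, y2, x3, y3, x4, y4; } quad_box;

/* Sketch: decode 8 raw network outputs into 4 corner points. Each x offset
 * is added to the cell column i and each y offset to the cell row j, then
 * normalized by the grid width lw / height lh. Unlike w/h in standard YOLO,
 * no biases (anchor priors) or exp() are involved. */
static quad_box decode_quad(const float *x, int index, int stride,
                            int i, int j, int lw, int lh) {
    quad_box b;
    b.x1 = (i + x[index + 0*stride]) / lw;
    b.y1 = (j + x[index + 1*stride]) / lh;
    b.x2 = (i + x[index + 2*stride]) / lw;
    b.y2 = (j + x[index + 3*stride]) / lh;
    b.x3 = (i + x[index + 4*stride]) / lw;
    b.y3 = (j + x[index + 5*stride]) / lh;
    b.x4 = (i + x[index + 6*stride]) / lw;
    b.y4 = (j + x[index + 7*stride]) / lh;
    return b;
}
```

The targets in delta_yolo_box() would then mirror this decode, e.g. tx1 = truth.x1*lw - i and ty1 = truth.y1*lh - j, so that the loss compares like with like.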
Thank you.
*Thus far, I have made changes mainly to data.c (handling the reading of the new label format), yolo_layer.c (for predictions) and box.c (for computation of IoU).