allanzelener / YAD2K

YAD2K: Yet Another Darknet 2 Keras

Why do you have to apply log to box_wh? #112

Open voqtuyen opened 6 years ago

voqtuyen commented 6 years ago

I see that in the yolo_head calculation, the width and height are calculated as:

    box_wh = K.exp(feats[..., 2:4])

    # Adjust predictions to each spatial grid point and anchor size.
    # Note: YOLO iterates over height index before width index.
    box_wh = box_wh * anchors_tensor / conv_dims

Because of the log, the width and height can become negative. But in preprocess_true_boxes you apply a different formula to true_boxes. Can you explain why?

voqtuyen commented 6 years ago

@allanzelener , @shadySource

RRdmlearning commented 6 years ago

Did you get an answer to your question? I have the same question!

tianyu-tristan commented 6 years ago

@voqtuyen @RRdmlearning I may have an answer, but I'm still confused about the loss implementation. For your question, let's assume the raw model output has shape (batch, 13, 13, B, 4+1+C), where the 4 coordinate values are (tx, ty, tw, th). The purpose of "yolo_head" is to convert (tx, ty, tw, th) into absolute (bx, by, bw, bh), as in the YOLOv2 paper. "preprocess_true_boxes", on the contrary, converts (bx, by, bw, bh) into (tx, ty, tw, th) so that a squared loss can be computed. If you look carefully, the two are exactly inverse calculations. I think the reason is that "yolo_loss" computes a sum of squares, and the two sides need to be comparable. The code there indicates that the squared loss is calculated in the domain of [sigmoid(tx), sigmoid(ty), tw, th]. They seem to be technically comparable after that inverse calculation, but I don't know why...
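To make the "exactly inverse" point concrete, here is a minimal NumPy sketch modelled on the width/height steps of yolo_head and preprocess_true_boxes. It is not the repository code: the helper names decode_wh and encode_wh, the anchor, and the box size are all made up for illustration.

```python
import numpy as np

def decode_wh(t_wh, anchor_wh, conv_dims):
    """yolo_head-style step: raw (tw, th) -> box size as a fraction of the image."""
    return np.exp(t_wh) * anchor_wh / conv_dims

def encode_wh(b_wh, anchor_wh, conv_dims):
    """Inverse step: fractional box size -> (tw, th), the log of the box/anchor ratio."""
    return np.log(b_wh * conv_dims / anchor_wh)

conv_dims = np.array([13.0, 13.0])   # 13x13 output grid
anchor_wh = np.array([3.0, 4.0])     # hypothetical anchor size, in grid cells
true_wh = np.array([0.10, 0.50])     # hypothetical box size, as a fraction of the image

t_wh = encode_wh(true_wh, anchor_wh, conv_dims)
recovered_wh = decode_wh(t_wh, anchor_wh, conv_dims)

print("targets (tw, th):", t_wh)
print("decoded (bw, bh):", recovered_wh)
```

The width target comes out negative because that box is narrower than the anchor, while the height target is positive. Either way, exp maps the target back to a positive size, so a negative log value is not a problem in itself.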

What I also don't understand is that the YOLOv2 paper doesn't seem to mention a new version of the loss function, and in YOLOv1 the sum-of-squares loss on (w, h) works in the square-root domain, but the implementation here does not...

RRdmlearning commented 6 years ago

@tianyu-tristan Thanks for your tips. I checked the code of darkflow and yolo2-pytorch, and I found that it is unnecessary to use (tx, ty, tw, th) to calculate the loss.

About the square-root domain: darkflow uses it, but YAD2K does not...

You can check it yourself. I am also not sure.

voqtuyen commented 6 years ago

@RRdmlearning, the YOLO paper uses the square root of width/height for more stable loss optimization, but it seems the author did not use it here.
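For reference, the YOLOv1 paper motivates the square root by saying that small deviations in large boxes should matter less than in small boxes. A small sketch of that effect, with made-up widths (fractions of the image) and hypothetical helper names:

```python
import numpy as np

def squared_error(pred, true):
    """Plain squared error on the width itself."""
    return float((pred - true) ** 2)

def sqrt_squared_error(pred, true):
    """Squared error computed on square roots, as described in YOLOv1."""
    return float((np.sqrt(pred) - np.sqrt(true)) ** 2)

# The same absolute error (0.05) on a small box and on a large box.
small_true, small_pred = 0.10, 0.15
large_true, large_pred = 0.80, 0.85

plain_small = squared_error(small_pred, small_true)
plain_large = squared_error(large_pred, large_true)
sqrt_small = sqrt_squared_error(small_pred, small_true)
sqrt_large = sqrt_squared_error(large_pred, large_true)

print("plain:", plain_small, plain_large)
print("sqrt: ", sqrt_small, sqrt_large)
```

The plain squared error treats both boxes the same, while the square-root version penalizes the small box more for the same absolute error.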

tianyu-tristan commented 6 years ago

@RRdmlearning @voqtuyen Thanks for sharing. I did change the loss function according to the YOLOv1 paper. It did not make much difference, but it is still worth trying since it is supposed to be more stable. Here's what I did:

(1) change the width/height targets to np.sqrt(box[2]/conv_width) and np.sqrt(box[3]/conv_height)

(2) change the predicted boxes to pred_boxes = K.concatenate((K.sigmoid(feats[..., 0:2]), K.sqrt(pred_wh)), axis=-1)
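A rough NumPy sketch of why these two changes stay comparable. It assumes box is already scaled to grid units and that pred_wh is the yolo_head-style size as a fraction of the image. The anchor and box values are made up, and this is not the repository code:

```python
import numpy as np

conv_width, conv_height = 13, 13
conv_dims = np.array([conv_width, conv_height], dtype=float)
anchor_wh = np.array([3.0, 4.0])        # hypothetical anchor, in grid cells

# Hypothetical ground-truth box in grid units: [x, y, w, h].
box = np.array([6.5, 6.5, 1.3, 6.5])

# Change (1): square-root targets on the fractional width and height.
target_wh = np.array([np.sqrt(box[2] / conv_width),
                      np.sqrt(box[3] / conv_height)])

# Raw outputs that would decode to exactly this box.
feats_wh = np.log(box[2:4] / anchor_wh)
pred_wh = np.exp(feats_wh) * anchor_wh / conv_dims

# Change (2): square root of the decoded prediction.
pred_sqrt_wh = np.sqrt(pred_wh)

loss = float(np.sum((target_wh - pred_sqrt_wh) ** 2))
print("target:", target_wh, "pred:", pred_sqrt_wh, "loss:", loss)
```

With a perfect prediction the loss is zero up to rounding, so target and prediction live in the same square-root domain.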