experiencor / keras-yolo2

Easy training on custom datasets. Various backends (MobileNet and SqueezeNet) supported. A YOLO demo that detects raccoons, running entirely in the browser, is accessible at https://git.io/vF7vI (not on Windows).
MIT License
1.73k stars 785 forks

Why is the loss divided by (nb_coord_box + 1e-6) / 2? #349

Closed vinliao closed 5 years ago

vinliao commented 6 years ago

nb_coord_box is the count of values in the tensor that satisfy the condition, in this case coord_mask > 0. To put it another way, it counts how many values in coord_mask are greater than zero. A small number is then added to this count, and the result is divided by 2 (the addition is to make it a float, I guess). What's the purpose of dividing the loss by this value?

bkanaki commented 6 years ago

The small number (1e-6) is added to avoid division by zero, in case nb_coord_box is zero.

vinliao commented 6 years ago

IIRC, there is no division at all in the loss function of the paper. Why is the loss divided by this value? Am I missing something?

bkanaki commented 6 years ago

The loss function in the paper doesn't note it explicitly, but in general it is common to 'normalize' losses by dividing by the number of data samples, so that the loss doesn't blow up when a larger number of input samples is used.
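To make the normalization concrete, here is a minimal NumPy sketch of the pattern from the issue title. The mask values and squared errors are made-up illustration data; the repo's actual loss works on full batch tensors with TensorFlow ops.

```python
import numpy as np

# coord_mask is 1 where a ground-truth box is assigned to a grid cell, 0 elsewhere
# (toy 2x3 grid for illustration).
coord_mask = np.array([[0., 1., 0.],
                       [1., 0., 1.]])

# Count the positive entries, analogous to
# tf.reduce_sum(tf.to_float(coord_mask > 0.0)) in the repo's loss.
nb_coord_box = np.sum((coord_mask > 0.0).astype(np.float32))

# Per-cell squared error on the box coordinates (made-up numbers),
# masked to the cells responsible for a box.
sq_err = np.array([[0.5, 0.2, 0.9],
                   [0.4, 0.1, 0.3]])

# Divide by the number of assigned boxes (1e-6 guards against division
# by zero when no box is assigned), then halve the squared error.
loss_xy = np.sum(sq_err * coord_mask) / (nb_coord_box + 1e-6) / 2.0
print(nb_coord_box, round(loss_xy, 4))  # 3.0 0.15
```

So the loss is an average over assigned boxes rather than a raw sum, which keeps its scale independent of how many objects the batch happens to contain.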

vinliao commented 6 years ago

Ah, that clears up my confusion, thanks. I still have another question regarding the loss function, if you don't mind answering. In the loss there is the "adjust prediction" part, where tf.sigmoid and tf.exp are applied to the prediction tensor. I can't seem to understand why this operation takes place, or what its purpose is. IIRC, there is nothing about this operation in the paper either. Am I missing something again?

bkanaki commented 6 years ago

Nice observation.

I am also not really sure why it is done for wh and xy. For conf, taking the sigmoid makes sense, as it gives you the probability.

Anyone else might have more insight?
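For what it's worth, the YOLOv2 paper ("Direct location prediction" in YOLO9000) does define this decoding: the raw outputs t_x, t_y, t_w, t_h are transformed so the box center stays inside its grid cell and the width/height stay positive. A hedged sketch of that transform (simplified shapes, not the repo's exact code):

```python
import numpy as np

def decode_box(t_xy, t_wh, cell_offset, anchor_wh):
    # sigmoid squashes t_xy into (0, 1): the center offset within the cell,
    # added to the cell's grid coordinates c_x, c_y
    b_xy = 1.0 / (1.0 + np.exp(-t_xy)) + cell_offset
    # exp keeps width/height strictly positive and scales the anchor (prior) box
    b_wh = anchor_wh * np.exp(t_wh)
    return b_xy, b_wh

b_xy, b_wh = decode_box(np.array([0.0, 0.0]),   # raw t_x, t_y
                        np.array([0.0, 0.0]),   # raw t_w, t_h
                        np.array([3.0, 4.0]),   # grid cell offset c_x, c_y
                        np.array([2.0, 2.0]))   # anchor prior p_w, p_h
print(b_xy, b_wh)  # [3.5 4.5] [2. 2.]
```

With raw outputs of zero, the decoded box sits at the center of its cell with exactly the anchor's size, which is why this parameterization is considered easier to learn than predicting coordinates directly.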

vinliao commented 6 years ago

Yes, I would like to have an answer to why sigmoid and exp are applied to pred xy and pred wh respectively. I really hope that @experiencor would give an explanation on this.

bkanaki commented 6 years ago

Well, I guess the loss function matches closely with the following implementation:

https://github.com/allanzelener/YAD2K/blob/a42c760ef868bc115e596b56863dc25624d2e756/yad2k/models/keras_yolo.py#L152

So I guess they may have some more suggestions. If you find out, @vin-liao, then please post your understanding here too.

Thanks

vinliao commented 6 years ago

After fiddling with it for a few days, this is how I think about the sigmoid and exp on the prediction, CMIIW. In the ground truth, the NN must predict the xy position and the wh size of the box, and in my implementation I normalize the xy and wh ground truth to the range 0...1 (not so sure how this repo organizes it).

Predicting a value in 0...1 directly is not easy, because it requires the weights to be very precise. By applying sigmoid, the range 0...1 is "stretched" out to roughly -6 to 6. So instead of predicting a value between 0 and 1, the NN can learn to predict a value between -6 and 6, which the sigmoid maps back into 0...1.
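The "stretching" described above can be checked numerically: the sigmoid maps the wide range [-6, 6] onto almost all of (0, 1), so the network has much more slack in its raw outputs. A quick sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Raw network outputs spanning -6..6 land on nearly the full (0, 1) range.
for t in (-6.0, 0.0, 6.0):
    print(t, round(sigmoid(t), 4))  # -6.0 -> 0.0025, 0.0 -> 0.5, 6.0 -> 0.9975
```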

I'm currently not so sure about the tf.exp part, since I don't use it (I'm using sigmoid for both xy and wh). The link you sent helps a lot, @bkanaki, thank you.

bkanaki commented 5 years ago

@vin-liao So you implemented a loss function that is different from the one provided here? How are your training stats with your implementation? Have you provided the source on your fork? If not, can you?

It would really help in understanding the differences.

I want to try different stuff by training from scratch, so I am curious whether your implementation can be leveraged.

vinliao commented 5 years ago

I don't think there's that much difference between this repo and mine; the concept is still the same.

How are your training stats with your implementation?

I haven't successfully trained it yet; right now my data is bad and the detection isn't good yet.

Have you provided the source on your fork? If no, can you?

I'm not sure what you mean by this. I created my project from scratch, so I didn't fork anyone's project. I learned a lot of things by reading this repo and this repo.

evilc3 commented 5 years ago

Hey, you guys seem to know a lot about the loss function. Can anyone explain to me why we calculate the IOU two times in the loss function?