BichenWuUCB / squeezeDet

A tensorflow implementation for SqueezeDet, a convolutional neural network for object detection.
BSD 2-Clause "Simplified" License

About box matching #53

Open ByeonghakYim opened 7 years ago

ByeonghakYim commented 7 years ago

Hello.

I wonder if you have any experience with the issue of box matching.

Most CNN-based object detectors use a box matching strategy to assign prior (anchor) boxes to ground truth boxes during training.

Many well-known detection models, such as RCNN and SSD, match a prior (anchor) box to a ground truth box when their IOU is over 0.5. But in this case, if no prior (anchor) box has an IOU over 0.5 with a given ground truth, that ground truth is always treated as a negative. This is a very big problem. To solve it they need a lot of prior (anchor) boxes, which reduces processing speed.

I found that your squeezeDet matches each ground truth with the anchor box that has the highest IOU, so every ground truth gets an anchor box and is never assigned as a negative.
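To make the two strategies concrete, here is a minimal sketch in plain Python. It is not code from squeezeDet, RCNN, or SSD; the function names, the (xmin, ymin, xmax, ymax) box format, and the toy boxes are assumptions chosen for illustration.

```python
def iou(a, b):
    """IOU of two boxes given as (xmin, ymin, xmax, ymax)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def match_by_threshold(gt_boxes, anchors, thresh=0.5):
    """For each ground truth, every anchor index with IOU >= thresh (may be empty)."""
    return [[k for k, a in enumerate(anchors) if iou(g, a) >= thresh]
            for g in gt_boxes]


def match_by_best_iou(gt_boxes, anchors):
    """For each ground truth, the index of the single anchor with the highest IOU."""
    return [max(range(len(anchors)), key=lambda k: iou(g, anchors[k]))
            for g in gt_boxes]


# Toy data (hypothetical): two anchors, two ground truth boxes.
anchors = [(0, 0, 10, 10), (0, 0, 40, 40)]
gt_boxes = [(0, 0, 10, 12), (0, 0, 24, 24)]

print(match_by_threshold(gt_boxes, anchors))  # [[0], []]
print(match_by_best_iou(gt_boxes, anchors))   # [0, 1]
```

The second ground truth has no anchor above 0.5, so thresholding leaves it unmatched, while best-IOU matching still assigns it an anchor (with an IOU of only 0.36).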

My question is whether you have tried the method above (IOU thresholding). If yes, what was the difference between that method and yours?

I have tried your method, but the loss does not converge below 3, whereas with the method above it gets near 1 by the end of training. It also misses a lot of objects (about 40%). I think the reason is that the localization offset is very large when the IOU is too small.

My English is not very good, so if you don't understand the question, please tell me.

BichenWuUCB commented 7 years ago

@ByeonghakYim Thanks for your question. I'm also curious to see how the IOU-thresholding based box matching works for you. Could you try to explain a bit more on what you mean by

I have tried your method, but the loss does not converge below 3, whereas with the method above it gets near 1 by the end of training. It also misses a lot of objects (about 40%).

Thanks.

bhyim516 commented 7 years ago

I followed the method described in your paper:

"[...] a ground truth bounding box. During training, we compare ground truth bounding boxes with all anchors and assign them to the anchors that have the largest overlap (IOU) with each of them. The reason is that we want to select the “closest” anchor to match the ground truth box such that the transformation needed is reduced to minimum. I_ijk evaluates to 1 if the k-th anchor at position-(i, j) has the largest overlap with a ground truth box, and to 0 if no ground truth is assigned to it. This way, we only include the loss generated by the “responsible” anchors. As there can be multiple objects per image, we normalize the loss by dividing it by the number of objects."

So each ground truth has one anchor. But I found that for some of them the top IOU is under 0.1, and this leads to a large converged loss (the minimum loss is near 3, which is bigger than with the other box matching method, where the loss is near 1). I also found that your anchors come from the KITTI bbox distribution; I think they are good for KITTI but not for the general case. For a more general case, I assigned the anchors as (30,30), (20,40), (17,50), (40,20), (17,50) (60,60), (40,80), (35,100), (80, 40), (100, 35) (120,120)... (200,200)... (300,300)... .
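One way to check whether a set of anchor shapes can even reach a reasonable top IOU is to compare widths and heights only. The sketch below is an illustration, not code from the repository: `shape_iou` and `best_shape_iou` are hypothetical helpers, and the test boxes are made up. Since two boxes overlap most when their centers coincide, the shape-only IOU is an upper bound on the IOU an anchor of that shape can reach at any position.

```python
def shape_iou(wh_a, wh_b):
    """IOU of two boxes that share the same center, given as (width, height)."""
    inter = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - inter
    return inter / union


def best_shape_iou(gt_wh, anchor_shapes):
    """Highest shape-only IOU between one ground truth and any anchor shape."""
    return max(shape_iou(gt_wh, a) for a in anchor_shapes)


# Only the anchor shapes written out explicitly in the comment above;
# the elided ones ("...") are deliberately not guessed.
anchor_shapes = [(30, 30), (20, 40), (17, 50), (40, 20), (60, 60),
                 (40, 80), (35, 100), (80, 40), (100, 35), (120, 120)]

# Hypothetical ground truth sizes, for illustration only.
for gt_wh in [(32, 28), (10, 10), (250, 20)]:
    print(gt_wh, round(best_shape_iou(gt_wh, anchor_shapes), 3))
```

A ground truth whose best shape-only IOU is already low (for example the 10x10 box here, at 0.125) cannot get a good match from these anchors regardless of the matching rule.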

Thanks.

andreapiso commented 7 years ago

Anchor boxes are dataset dependent. You should run k-means on your dataset boxes beforehand to know which ones to use.
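A minimal sketch of that idea, assuming plain k-means with Euclidean distance on (width, height) pairs. The comment does not say which distance or initialization to use, so both are assumptions here, and `kmeans_anchor_shapes` is a hypothetical helper rather than anything from the repository.

```python
def kmeans_anchor_shapes(box_whs, k, iters=50):
    """Cluster (width, height) pairs with plain k-means and return k centroids.

    Assumes len(box_whs) >= k.
    """
    # Deterministic init: pick k boxes evenly spaced through the area-sorted list.
    pts = sorted(box_whs, key=lambda wh: wh[0] * wh[1])
    step = len(pts) / k
    centers = [pts[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iters):
        # Assignment step: each box goes to its nearest centroid.
        groups = [[] for _ in range(k)]
        for w, h in pts:
            j = min(range(k),
                    key=lambda c: (w - centers[c][0]) ** 2 + (h - centers[c][1]) ** 2)
            groups[j].append((w, h))
        # Update step: centroid is the mean of its group (unchanged if empty).
        new_centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


# Hypothetical dataset: three tall boxes and three wide boxes.
boxes = [(20, 40), (22, 38), (18, 42), (100, 35), (98, 36), (102, 34)]
print(kmeans_anchor_shapes(boxes, k=2))
```

On this toy data the two centroids land on the tall cluster and the wide cluster, which is the kind of dataset-specific anchor set the comment is recommending.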

bhyim516 commented 7 years ago

@AndreaPisoni Yes, that will perform well for KITTI, but not in the general case. I think anchor boxes should be kept simple when there is a lot of data, and I'm considering a more general case. Thanks for your comment.

BichenWuUCB commented 7 years ago

@ByeonghakYim

so each ground truth has one anchor. But I found that for some of them the top IOU is under 0.1

I wonder why the top matched anchor has an IOU of only 0.1 with the ground truth. I can think of the following reasons:

In your case, does it fall into any of the above situations?

ByeonghakYim commented 7 years ago

@BichenWuUCB Thanks, there were some mistakes and I solved that problem. But I have another question. One anchor box can match multiple ground truths. In this case, how do you propagate the loss to that anchor box during backpropagation?

BichenWuUCB commented 7 years ago

An anchor is not going to be matched with multiple ground truth boxes. At this line and below, you can see how this is handled.

ByeonghakYim commented 7 years ago

@BichenWuUCB Thanks. This part should be what prevents the issue:

    if ov_idx not in aidx_set:
        aidx_set.add(ov_idx)
        aidx = ov_idx
        if mc.DEBUG_MODE:
            max_iou = max(overlaps[ov_idx], max_iou)
            min_iou = min(overlaps[ov_idx], min_iou)
            avg_ious += overlaps[ov_idx]
            num_objects += 1
        break

I have one more question; sorry for asking so many. I think that if there are many overlapping ground truths, some of them cannot get their optimal anchor match, which will lead to a very large localization offset. I also found that you use minimum distance to match boxes when every IOU is 0, and I think this will lead to the same problem. I was just wondering whether this is a problem in practice. You have much more experience in object detection than I do, and I would like to know your opinion. Thanks.
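The concern can be reproduced with a small sketch of greedy one-to-one assignment. It loosely follows the fragment quoted above (a set of already-used anchor indices) and the minimum-distance fallback mentioned in this comment, but it is not the repository code: the function names, the (cx, cy, w, h) box format, the exact distance, and the toy boxes are all assumptions.

```python
def iou_cxcywh(a, b):
    """IOU of two boxes given as (center_x, center_y, width, height)."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)


def assign_anchors(gt_boxes, anchors):
    """Give each ground truth its own anchor. Assumes len(gt_boxes) <= len(anchors)."""
    used = set()
    assignment = []
    for g in gt_boxes:
        overlaps = [iou_cxcywh(g, a) for a in anchors]
        chosen = None
        # Walk anchors from highest to lowest IOU; take the first unused one.
        for k in sorted(range(len(anchors)), key=lambda k: overlaps[k], reverse=True):
            if overlaps[k] <= 0:
                break
            if k not in used:
                chosen = k
                break
        if chosen is None:
            # Fallback (assumed form): nearest unused anchor by squared
            # distance over the four box parameters.
            free = [k for k in range(len(anchors)) if k not in used]
            chosen = min(free, key=lambda k: sum((p - q) ** 2
                                                 for p, q in zip(g, anchors[k])))
        used.add(chosen)
        assignment.append(chosen)
    return assignment


# Toy data (hypothetical): the first two ground truths compete for anchor 0.
anchors = [(10, 10, 10, 10), (30, 10, 10, 10), (50, 10, 10, 10)]
gt_boxes = [(11, 10, 10, 10), (12, 10, 10, 10), (80, 10, 10, 10)]
print(assign_anchors(gt_boxes, anchors))  # [0, 1, 2]
```

Here the second ground truth overlaps only anchor 0, which is already taken, so it falls back to anchor 1 with an IOU of 0. That is exactly the case where the regression target becomes a large offset.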

bayesian-mind commented 4 years ago

@BichenWuUCB Thanks, there were some mistakes and I solved that problem. But I have another question. One anchor box can match multiple ground truths. In this case, how do you propagate the loss to that anchor box during backpropagation?

@ByeonghakYim How did you end up solving your issue?

peek1999 commented 4 years ago

@BichenWuUCB Thanks, there were some mistakes and I solved that problem. But I have another question. One anchor box can match multiple ground truths. In this case, how do you propagate the loss to that anchor box during backpropagation?

@ByeonghakYim How did you end up solving your issue?

One anchor box is not supposed to match multiple ground truth boxes in an image. Since one anchor box corresponds to only one model prediction, it should be matched with only one ground truth box. You can either match it to any one of the possible matches or use the ground truth box that has the highest IoU.
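A minimal sketch of the second option (keep the ground truth with the highest IoU for each contested anchor). The function name and the candidate-triple format are assumptions made for illustration, not part of squeezeDet.

```python
def resolve_anchor_conflicts(candidates):
    """Keep, for every anchor, only the candidate ground truth with the highest IoU.

    candidates: iterable of (gt_index, anchor_index, iou) triples.
    Returns a dict mapping anchor_index -> gt_index.
    """
    best = {}
    for gt_idx, anchor_idx, score in candidates:
        if anchor_idx not in best or score > best[anchor_idx][1]:
            best[anchor_idx] = (gt_idx, score)
    return {anchor: gt for anchor, (gt, _) in best.items()}


# Hypothetical candidates: ground truths 0 and 1 both want anchor 5.
candidates = [(0, 5, 0.62), (1, 5, 0.71), (2, 9, 0.55)]
print(resolve_anchor_conflicts(candidates))  # {5: 1, 9: 2}
```

Note that with this rule ground truth 0 ends up without an anchor, so it would still need to be assigned to some other anchor or dropped from the loss.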