ChunML / ssd-tf2

A super clean implementation of SSD (Single Shot MultiBox Detector) made possible by Tensorflow 2.0
MIT License

Negative Boxes seem to be wrong #19

Closed: ALPHAYA-Japan closed this issue 4 years ago

ALPHAYA-Japan commented 4 years ago

I was trying to understand how you coded the hard_negative_mining() function. While training, I managed to draw the pos boxes as well as the neg boxes, but it seems the algorithm doesn't retrieve the neg boxes correctly: when you draw them, the pos and neg boxes are almost the same rectangle, differing by only 1~2 pixels.
Please correct me if I am mistaken.

ChunML commented 4 years ago

Can you show me the code you use to display the boxes, as well as an example image where the neg boxes were wrong?

ALPHAYA-Japan commented 4 years ago

I can't paste the whole code here as it is quite long, but it is easy to reproduce the issue. I modified the call() function of SSDLosses to return pos_idx and neg_idx as well. Since the loss function is used inside the train_step() procedure, I also modified its return value to pass pos_idx and neg_idx through.

During the training loop, the train_step() procedure then returns the neg and pos indices.
You can then use the default_boxes and gt_regs to draw the boxes and see.

You can try this image as an example: VOC2012/JPEGImages/2008_007056.jpg

ChunML commented 4 years ago

Hmm, I think there is some misunderstanding here. Let me clarify something first. To decide whether a box is positive or negative, we must first compute the classification loss. There are two possible ways that a box is considered a negative one:

  1. Ground-truth box label is non-background, anchor box label is background => false negative

  2. Ground-truth box label is background, anchor box label is non-background => false positive

And there is only one possible way that a box is a positive one:

  - Ground-truth box label is non-background, anchor box label is non-background => true positive

But don't forget the following case, since it will generate a small classification loss as well (i.e. it is not considered a negative box):

  - Ground-truth box label is background, anchor box label is background => true negative

Basically, a negative box is not necessarily any box that is not a ground-truth box; it depends on the confidence output by the model. In your case, it seems you're getting the negative boxes from the first pattern, i.e. boxes that failed to be labeled as an object.
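For reference, the selection logic described above can be sketched in NumPy (a minimal illustration, not the repo's exact code; gt_confs is assumed to hold per-anchor class labels with 0 = background, and neg_ratio=3 is the usual SSD choice):

```python
import numpy as np

def select_pos_neg(loss, gt_confs, neg_ratio=3):
    """Sketch of SSD hard negative mining (assumed conventions).

    loss:     (batch, num_anchors) per-anchor classification loss
    gt_confs: (batch, num_anchors) ground-truth labels, 0 = background
    """
    pos_idx = gt_confs > 0                       # anchors matched to an object
    num_neg = pos_idx.sum(axis=1) * neg_ratio    # keep neg:pos at neg_ratio:1

    # Exclude positives from mining, then keep the num_neg highest-loss anchors.
    masked = np.where(pos_idx, 0.0, loss)
    rank = np.argsort(np.argsort(-masked, axis=1), axis=1)
    neg_idx = rank < num_neg[:, None]
    return pos_idx, neg_idx
```

So a hard negative is simply one of the highest-loss anchors among those not matched to any ground-truth object, which is why drawing them next to the positives can look misleading.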

To confirm this, can you print out confs and gt_confs at pos_idx and neg_idx as well?

ALPHAYA-Japan commented 4 years ago

Thank you for your nice clarification. Probably the most confusing part of your code is the hard_negative_mining() algorithm. Getting pos_idx from gt_confs is clear. However, getting neg_idx is confusing in the following snippet:

    rank    = tf.argsort(loss, axis=1, direction='DESCENDING') # Line 1: This is clear
    rank    = tf.argsort(rank, axis=1)                         # Line 2: Unclear
    neg_idx = rank < tf.expand_dims(num_neg, 1)                # Line 3: Get the top num_neg

You applied argsort to the loss in descending order; it returns the indices that sort the loss along axis=1. Since we can already get the top num_neg indices from Line 1, why did you apply argsort again to rank in Line 2?

ChunML commented 4 years ago

It's easier to understand when looking at an example. Imagine we have the following loss array (unrealistic numbers, but easier on the eyes): [100, 50, 47, 200, 33, 16, 45, 350, 10]

Suppose we want 3 negative boxes. Eventually, the neg_idx we want is [True, False, False, True, False, False, False, True, False] (we have True at the indices of 350, 200, 100)

How do you get that? If you run the first line: rank = tf.argsort(loss, axis=1, direction='DESCENDING')

rank will become [7, 3, 0, 1, 2, 6, 4, 5, 8]. It's not what we want, right?

Next, if we run the next line: rank = tf.argsort(rank, axis=1)

rank will now be [2, 3, 4, 1, 6, 7, 5, 0, 8]. It's tricky to comprehend, but basically, by argsorting the result of the first argsort, we get an array that tells us how big each element is. For example, rank[7] = 0, which means that the 7th element of the loss array (loss[7]) is the biggest. Similarly, loss[3] is the second biggest because rank[3] = 1, and so on.

We're getting close. By comparing the rank array to the number of negative boxes, we can achieve the neg_idx array that we wanted.

rank < 3 = [True, False, False, True, False, False, False, True, False]
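The walkthrough above can be checked directly; NumPy's argsort behaves the same way as tf.argsort here (a quick sketch, not the repo's code):

```python
import numpy as np

loss = np.array([100, 50, 47, 200, 33, 16, 45, 350, 10])

# Line 1: indices that would sort the loss in descending order.
rank = np.argsort(-loss)
print(rank.tolist())     # [7, 3, 0, 1, 2, 6, 4, 5, 8]

# Line 2: argsort again -> rank[i] is how many elements are larger than loss[i].
rank = np.argsort(rank)
print(rank.tolist())     # [2, 3, 4, 1, 6, 7, 5, 0, 8]

# Line 3: the 3 largest losses (350, 200, 100) sit at indices 7, 3, 0.
neg_idx = rank < 3
print(neg_idx.tolist())  # [True, False, False, True, False, False, False, True, False]
```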

Hope that helps clear up your confusion.

ALPHAYA-Japan commented 4 years ago

Thanks a lot @ChunML , that clarified it a lot.