Can you show me the code to display the boxes, as well as an example image where the neg boxes were wrong?
I can't paste the whole code here as it is quite long, but it is easy to tell you how to reproduce the issue. I modified the call() function of SSDLosses to return pos_idx and neg_idx as well. Since the loss function is used inside the train_step() procedure, I also modified its return value to pass pos_idx and neg_idx through.
During the training loop, the train_step() procedure will then return the pos and neg indices. You can then use the default_boxes and gt_regs to draw the boxes and see.
You can try this image as an example: VOC2012/JPEGImages/2008_007056.jpg
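For illustration, a minimal sketch of the drawing step might look like the following. Everything here is an assumption on my side: the draw_boxes helper is hypothetical, and I'm assuming default_boxes are normalized (cx, cy, w, h) tensors and pos_idx / neg_idx are boolean masks over the boxes of a single image.

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import tensorflow as tf

def draw_boxes(image, default_boxes, pos_idx, neg_idx):
    """Overlay positive (green) and negative (red) default boxes on an image."""
    img_h, img_w = image.shape[:2]
    _, ax = plt.subplots(1)
    ax.imshow(image)
    for idx, color in [(pos_idx, 'g'), (neg_idx, 'r')]:
        # Keep only the boxes selected by the boolean mask.
        for cx, cy, w, h in tf.boolean_mask(default_boxes, idx).numpy():
            ax.add_patch(patches.Rectangle(
                ((cx - w / 2) * img_w, (cy - h / 2) * img_h),
                w * img_w, h * img_h,
                linewidth=1, edgecolor=color, facecolor='none'))
    plt.show()
```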
Hmm, I think there is some misunderstanding here. Let me clarify something first. To decide whether a box is positive or negative, we must first compute the classification loss. There are two possible ways that a box is considered a negative one:
- Ground-truth box label is non-background, anchor box label is background => false negative
- Ground-truth box label is background, anchor box label is non-background => false positive
And there is only one possible way that a box is a positive one:
- Ground-truth box label is non-background, anchor box label is non-background => true positive
But don't forget the following case, since it will generate a small classification loss as well (i.e. it is not considered a negative box):
- Ground-truth box label is background, anchor box label is background => true negative
Basically, a negative box is not necessarily just any box that doesn't match a ground-truth box; it depends on the confidence output by the model. In your case, it seems like you're getting the negative boxes from the first pattern, i.e. boxes that failed to be labeled as an object.
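To make that concrete, here is a tiny illustration of the idea (a sketch, not the repo's actual loss code): a box whose ground truth is background produces a small loss when the model confidently predicts background, and a large loss otherwise; hard negative mining only picks the latter.

```python
import tensorflow as tf

# Two default boxes whose ground-truth label is background (class 0).
gt_confs = tf.constant([0, 0])
logits = tf.constant([[4.0, 0.0, 0.0],    # box A: confidently background
                      [0.0, 0.0, 4.0]])   # box B: confidently class 2
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=gt_confs, logits=logits)
print(loss.numpy())  # approx. [0.04, 4.04]: only box B is a hard negative
```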
To confirm this, can you print out the confs and gt_confs at pos_idx and neg_idx as well?
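Something along these lines should do; the shapes below are assumptions on my part (confs as (batch, num_boxes, num_classes) class scores, gt_confs as (batch, num_boxes) labels, and the index tensors as boolean masks), with dummy tensors standing in for the real ones:

```python
import tensorflow as tf

# Dummy stand-ins for the real tensors (shapes are assumptions):
confs = tf.random.normal([1, 9, 3])                    # (batch, num_boxes, num_classes)
gt_confs = tf.constant([[0, 2, 0, 0, 1, 0, 0, 0, 0]])  # (batch, num_boxes)
pos_idx = gt_confs > 0                                 # positives: non-background gt
neg_idx = tf.constant([[True, False, False, True, False,
                        False, False, True, False]])   # from hard_negative_mining()

print(tf.boolean_mask(confs, pos_idx))     # predicted scores of positive boxes
print(tf.boolean_mask(gt_confs, pos_idx))  # their ground-truth labels
print(tf.boolean_mask(confs, neg_idx))     # predicted scores of negative boxes
print(tf.boolean_mask(gt_confs, neg_idx))  # their ground-truth labels (all 0)
```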
Thank you for your nice clarification. Probably the most confusing part of your code is the hard_negative_mining() algorithm. Getting pos_idx from gt_confs is clear. However, getting the neg_idx is confusing in the following snippet:
```python
rank = tf.argsort(loss, axis=1, direction='DESCENDING')  # Line 1: This is clear
rank = tf.argsort(rank, axis=1)                          # Line 2: Unclear
neg_idx = rank < tf.expand_dims(num_neg, 1)              # Line 3: Get the top num_neg
```
You applied argsort on the loss in descending order; it returns the indices that sort the loss along axis=1. We can already get the top num_neg indices from Line 1, so why did you apply argsort on rank again in Line 2?
It's easier to understand when looking at an example. Imagine we have the following loss array (unrealistic numbers, but easier on the eyes):
[100, 50, 47, 200, 33, 16, 45, 350, 10]
Suppose we want 3 negative boxes. Eventually, the neg_idx we want is [True, False, False, True, False, False, False, True, False] (we have True at the indices of 350, 200, and 100).
How do you get that? If you run the first line:
rank = tf.argsort(loss, axis=1, direction='DESCENDING')
rank will become [7, 3, 0, 1, 2, 6, 4, 5, 8]. That's not what we want, right?
Next, if we run the next line:
rank = tf.argsort(rank, axis=1)
rank will now be [2, 3, 4, 1, 6, 7, 5, 0, 8]. It's tricky to comprehend, but basically, by applying argsort to the result of the first argsort, we get an array that tells us how big each element is: rank[i] is the position of loss[i] in descending order. For example, rank[7] = 0, which means that the element at index 7 of the loss array (loss[7] = 350) is the biggest. Similarly, loss[3] = 200 is the second biggest because rank[3] = 1, and so on.
We're getting close. By comparing the rank array with the number of negative boxes, we get exactly the neg_idx array we wanted:
rank < 3 = [True, False, False, True, False, False, False, True, False]
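For completeness, the whole walkthrough as runnable TensorFlow code (with a batch dimension added, since the real loss function works on batches):

```python
import tensorflow as tf

loss = tf.constant([[100., 50., 47., 200., 33., 16., 45., 350., 10.]])
num_neg = tf.constant([3])

rank = tf.argsort(loss, axis=1, direction='DESCENDING')  # [[7 3 0 1 2 6 4 5 8]]
rank = tf.argsort(rank, axis=1)                          # [[2 3 4 1 6 7 5 0 8]]
neg_idx = rank < tf.expand_dims(num_neg, 1)

print(neg_idx.numpy())
# [[ True False False  True False False False  True False]]
```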
Hope that helps clear your confusion.
Thanks a lot @ChunML, that clarified it a lot.
I was trying to understand how you coded the hard_negative_mining() function. While training, I actually managed to draw the pos boxes as well as the neg boxes, but it seems that the algorithm doesn't retrieve the neg boxes correctly: when you draw them, you see that they are almost the same rectangle, differing by only 1~2 pixels.
Please correct me if I am mistaken.