PeizeSun / SparseR-CNN

[CVPR2021, PAMI2023] End-to-End Object Detection with Learnable Proposal
MIT License
1.31k stars, 187 forks

Question about the evaluation #86

Closed: FL77N closed this issue 3 years ago

FL77N commented 3 years ago

Hi! I have a question about the evaluation. I know that inference outputs the top 100 or 300 boxes, but some of them must be no-object boxes. I'm not clear on how only the boxes that actually contain objects are kept. The box postprocessing I found is:

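```python
# From detectron2's detector_postprocess(); `results`, `scale_x` and `scale_y`
# are set earlier in that function.
if results.has("pred_boxes"):
    output_boxes = results.pred_boxes
elif results.has("proposal_boxes"):
    output_boxes = results.proposal_boxes
else:
    output_boxes = None
assert output_boxes is not None, "Predictions must contain boxes!"

# Rescale boxes from the network input size to the requested output size,
# then clamp them to the image bounds.
output_boxes.scale(scale_x, scale_y)
output_boxes.clip(results.image_size)

# Drop boxes that became empty (zero width or height) after clipping.
# No score threshold is applied at this step.
results = results[output_boxes.nonempty()]
```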
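To see why this step does not reduce 100/300 boxes to "just the objects", here is a minimal plain-Python sketch of the same three operations (scale, clip, drop empty boxes). It assumes boxes are `(x0, y0, x1, y1)` tuples rather than detectron2 `Boxes` objects, and the helper names are hypothetical. Only boxes that collapse to zero area are removed; a low-scoring box inside the image survives.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), hypothetical plain-tuple format


def scale(boxes: List[Box], sx: float, sy: float) -> List[Box]:
    # Rescale from the network input resolution to the output resolution.
    return [(x0 * sx, y0 * sy, x1 * sx, y1 * sy) for x0, y0, x1, y1 in boxes]


def clip(boxes: List[Box], height: float, width: float) -> List[Box]:
    # Clamp every coordinate into [0, width] (x) or [0, height] (y).
    def c(v: float, hi: float) -> float:
        return min(max(v, 0.0), hi)
    return [(c(x0, width), c(y0, height), c(x1, width), c(y1, height))
            for x0, y0, x1, y1 in boxes]


def nonempty(boxes: List[Box]) -> List[bool]:
    # A box is kept only if it still has strictly positive width and height.
    return [(x1 - x0) > 0 and (y1 - y0) > 0 for x0, y0, x1, y1 in boxes]


def postprocess(boxes: List[Box], scores: List[float], sx: float, sy: float,
                height: float, width: float) -> Tuple[List[Box], List[float]]:
    boxes = clip(scale(boxes, sx, sy), height, width)
    keep = nonempty(boxes)
    # Scores are carried along unchanged; nothing is thresholded here.
    return ([b for b, k in zip(boxes, keep) if k],
            [s for s, k in zip(scores, keep) if k])


# The box with score 0.01 is kept; only the box pushed fully outside the image is dropped.
print(postprocess([(10, 10, 50, 50), (0, 0, 5, 5), (200, 200, 300, 300)],
                  [0.9, 0.01, 0.02], 2.0, 2.0, 100.0, 100.0))
# -> ([(20.0, 20.0, 100.0, 100.0), (0.0, 0.0, 10.0, 10.0)], [0.9, 0.01])
```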
PeizeSun commented 3 years ago

Hi~ I'm not sure what you mean by "only keep the object boxes". Could you describe it in more detail?

FL77N commented 3 years ago

Sorry, my fault. The question is this: at inference we get 100/300 boxes, but the GT may contain only 20 boxes. Is there a postprocessing step that goes from the inference results (100/300 boxes) to the final results (20 boxes)? The code I showed is the only postprocessing I could find. Is there any other postprocessing after inference? Thanks!

PeizeSun commented 3 years ago

There is no other postprocessing; the 100/300 boxes are fed directly into the evaluation code. This comes down to how AP is computed. For example, if the GT has only 20 boxes and the top-20 scoring boxes among the 100/300 are matched to those 20 GT boxes, the remaining, lower-scoring boxes have no effect on AP. Please see cocoeval.
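A minimal sketch of COCO-style AP for one category and one IoU threshold makes this concrete. It assumes the detections are already sorted by descending score and already labelled as matched (TP) or unmatched (FP); the function name is hypothetical. It follows the cocoeval recipe of 101 recall thresholds with precision made non-increasing from the right. False positives ranked below every true positive leave AP unchanged, while a single false positive ranked above them lowers it.

```python
from typing import List


def coco_style_ap(is_tp: List[bool], num_gt: int) -> float:
    """is_tp: detections sorted by descending score; True if matched to a GT box.
    Assumes num_gt > 0. Sketch of 101-point interpolated AP, as in cocoeval."""
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for hit in is_tp:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Interpolation: precision at each rank becomes the max precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    total = 0.0
    for k in range(101):  # recall thresholds 0.00, 0.01, ..., 1.00
        r = k / 100
        # First rank whose recall reaches r; unreachable thresholds contribute 0.
        idx = next((i for i, rec in enumerate(recalls) if rec >= r), None)
        total += precisions[idx] if idx is not None else 0.0
    return total / 101


# 20 matched boxes followed by 80 low-scoring unmatched boxes: AP is still perfect.
print(coco_style_ap([True] * 20 + [False] * 80, 20))  # -> 1.0
# One unmatched box scored above all matched ones does hurt AP.
print(round(coco_style_ap([False] + [True] * 20, 20), 4))  # -> 0.9524
```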

FL77N commented 3 years ago

Thanks for your detailed reply! Em~ what about at test time, when we don't know how many boxes would be matched?

PeizeSun commented 3 years ago

I'm not sure what you mean by "how many boxes would be matched": the maximum possible number, or the exact number?

FL77N commented 3 years ago

Em~ I mean that when we use SparseR-CNN in a real application, we don't know that we should keep the top-20 boxes (say there are 20 objects), yet the model always outputs 100/300 boxes. What should we do in that situation?

PeizeSun commented 3 years ago

Sorry, I still don't understand the problem. Could you describe it in more detail, or use another detector as an example to help me understand?

FL77N commented 3 years ago

At inference the model predicts 100/300 boxes, but the image contains only 20 instances (e.g. 20 people), so 80 of the boxes are redundant. During evaluation, the eval algorithm can match the top-20 of the 100/300 boxes because it has the GT. At test time, however, there is no GT, so how does the model know to keep the top 20, or in general how many boxes it should keep? Sorry, I'm not sure whether I've described this clearly.

iFighting commented 3 years ago

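Because Hungarian matching is one-to-one (during training each ground-truth instance is assigned to exactly one prediction, and all other predictions are trained as background), and because the attention mechanism lets the proposals interact, the remaining boxes that are not matched to any of the 20 ground-truth instances get very small scores, so they are not treated as valid predicted boxes.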
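The one-to-one property can be illustrated with a toy matcher. This sketch brute-forces the minimum-cost assignment over permutations (the Hungarian algorithm finds the same optimum efficiently); the cost matrix, class labels and function names are hypothetical. Sparse R-CNN's actual matching cost combines classification, L1 and GIoU terms. Every prediction left unmatched gets a "no object" target, which is what pushes its score down during training.

```python
from itertools import permutations
from typing import List, Optional, Sequence


def optimal_one_to_one_match(cost: Sequence[Sequence[float]]) -> List[int]:
    """cost[g][p]: cost of assigning ground truth g to prediction p.
    Returns, for each GT g, the index of its unique matched prediction.
    Brute force: fine for a toy example, exponential in general."""
    num_gt, num_pred = len(cost), len(cost[0])
    best: List[int] = []
    best_cost = float("inf")
    for preds in permutations(range(num_pred), num_gt):
        total = sum(cost[g][p] for g, p in enumerate(preds))
        if total < best_cost:
            best, best_cost = list(preds), total
    return best


def classification_targets(num_pred: int, matched: List[int],
                           gt_labels: List[int]) -> List[Optional[int]]:
    # None stands for "no object": every unmatched prediction is a background target.
    targets: List[Optional[int]] = [None] * num_pred
    for g, p in enumerate(matched):
        targets[p] = gt_labels[g]
    return targets


# 2 ground-truth objects, 5 predictions (hypothetical costs and class ids).
cost = [[0.9, 0.1, 0.8, 0.7, 0.6],
        [0.2, 0.3, 0.9, 0.8, 0.9]]
matched = optimal_one_to_one_match(cost)
print(matched)  # -> [1, 0]
print(classification_targets(5, matched, [7, 3]))  # -> [3, 7, None, None, None]
```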

FL77N commented 3 years ago

Thank you. So I think a score threshold would be applied at test time.
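A score threshold of that kind might look like the following minimal sketch. The detection tuple layout, the function name and the threshold values are hypothetical; in practice the threshold would be tuned for the application's precision/recall trade-off. The optional `max_dets` cap mirrors the fixed top-k output used for evaluation.

```python
from typing import List, Optional, Tuple

# Hypothetical detection format: ((x0, y0, x1, y1), score, class_id)
Detection = Tuple[Tuple[float, float, float, float], float, int]


def keep_confident(dets: List[Detection], score_threshold: float,
                   max_dets: Optional[int] = None) -> List[Detection]:
    # Keep detections scoring at least the threshold, highest score first,
    # optionally capped at max_dets.
    kept = sorted((d for d in dets if d[1] >= score_threshold),
                  key=lambda d: d[1], reverse=True)
    return kept if max_dets is None else kept[:max_dets]


dets = [((0, 0, 10, 10), 0.92, 0), ((5, 5, 9, 9), 0.05, 0),
        ((1, 1, 4, 4), 0.81, 2), ((2, 2, 3, 3), 0.01, 1)]
print([d[1] for d in keep_confident(dets, 0.5)])  # -> [0.92, 0.81]
```

With 100 outputs where 20 objects score high and 80 leftovers score low, a threshold between the two groups keeps the 20, without the model needing to know the count in advance.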