some questions about the paper

deeplearning-wisc / stud

source code for CVPR'22 paper "Unknown-Aware Object Detection: Learning What You Don’t Know from Videos in the Wild"

Apache License 2.0

Thanks for sharing your great work!

I have read your paper, but I am confused about the following points in your methodology:

1) When constructing a counterpart, we need both an ID and an OOD feature representation. How do you select the ID features? Are they the ones whose energy score falls below a certain threshold?
2) Why do we need to construct a one-to-one counterpart instead of one-to-many counterparts?
3) When the detector network has just been initialized, the classification branch is still unreliable. Is the energy function a good indicator of OOD at that stage?
4) Why does high dissimilarity with an ID feature indicate OOD? How can you be sure these are not objects from some other classes? Objects from the remaining classes might occupy a larger region of the feature space.

Hope to hear your answer!

Hi, @luoyuchenmlcv

Thanks for your interest in our work!

1) In the code, the ID objects are normally the proposals output by the RPN that have a higher confidence of being foreground.

2) The mapping is one-to-one because we aggregate all the potential unknown objects in a different frame into a single counterpart, which improves the diversity and also the smoothness of the object features. The latter is crucial for optimization.

3) Yeah, at the beginning of training the energy function might not be optimal, but it does not do much harm. The network will quickly learn discriminative object representations and thus produce meaningful logits and energy scores.

4) We actually introduce energy filtering in the distillation part. The dissimilar objects with mild energy scores are used as the unknowns, so that hopefully "objects from some other classes" are filtered out. You can visualize them to check.
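
To make points 2) to 4) concrete, here is a minimal sketch in plain Python. It is not the code from this repository, only an illustration under stated assumptions: the energy score is taken to be the common free-energy form `-T * logsumexp(logits / T)` (the repo may use a different sign or temperature), dissimilarity is taken to be L2 distance, "mild energy" is modelled as a hypothetical band `(lo, hi)`, and the aggregation is a dissimilarity-weighted average. The names `energy`, `l2_distance`, `distill_unknown` and `energy_band` are made up for this example.

```python
import math


def energy(logits, temperature=1.0):
    """Free energy -T * logsumexp(logits / T), computed stably.

    Assumed convention: lower energy means more ID-like.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def l2_distance(a, b):
    """Euclidean distance, used here as the (assumed) dissimilarity measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distill_unknown(id_feature, candidates, energy_band):
    """Build one unknown counterpart for a single ID feature.

    candidates: list of (feature, logits) pairs from a different frame.
    energy_band: hypothetical (lo, hi) range that counts as "mild" energy.
    Returns the dissimilarity-weighted average of the kept features,
    or None if no candidate survives the energy filter.
    """
    lo, hi = energy_band
    # Energy filtering: drop candidates whose energy is outside the band.
    kept = [
        (feat, l2_distance(id_feature, feat))
        for feat, logits in candidates
        if lo <= energy(logits) <= hi
    ]
    total = sum(d for _, d in kept)
    if not kept or total == 0.0:
        return None
    # More dissimilar candidates get a larger weight in the aggregate.
    dim = len(id_feature)
    return [sum(d * feat[i] for feat, d in kept) / total for i in range(dim)]


if __name__ == "__main__":
    id_feature = [0.0, 0.0]
    candidates = [
        ([3.0, 4.0], [0.0, 0.0]),   # mild energy, far from the ID feature
        ([0.0, 1.0], [0.0, 0.0]),   # mild energy, close to the ID feature
        ([6.0, 8.0], [10.0, 0.0]),  # very low energy, filtered out
    ]
    print(distill_unknown(id_feature, candidates, energy_band=(-2.0, 0.0)))
```

In this toy setup the third candidate has a confident (very low energy) prediction, so it is treated as likely belonging to a known class and dropped, while the remaining two are merged into a single counterpart that leans towards the more dissimilar one.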

Let me know if you have more questions.

some questions about the paper #7