Open bkkm78 opened 6 years ago
How are you defining "tight enough"? The red boxes are certainly tighter than the blue, but that's true on both the left and right set of clips.
Since we are talking about inconsistencies in the annotations, it is a matter of directly comparing blue boxes on the left with blue boxes on the right. We are expecting the blue box on the left to be about as tight as the blue box on the right, or vice versa.
In other words, the blue annotation boxes on the left are regarded as "not tight enough", while the blue annotation boxes on the right are tight enough. Or, if you decide to make all annotations consistent with the ones on the left, we would regard the blue annotation boxes on the right as "too tight".
The red boxes illustrate how this inconsistency in the annotations (blue boxes) confuses the evaluation of our detectors. For example, in the first figure our car detector (red boxes) behaves consistently: it detects and localizes the car correctly (to the human eye, at least). But because the annotations (blue boxes) are inconsistent, the detection on the left may be mistakenly counted as a false positive instead of a correct detection.
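To make the effect concrete, here is a rough sketch of how a loose annotation can turn a good detection into a false positive under IoU-based matching. The box coordinates are made up for illustration, and the 0.5 IoU threshold is an assumption (a common choice, not necessarily the one this benchmark uses).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed matching threshold; the actual evaluation protocol may differ.
IOU_THRESHOLD = 0.5


def is_match(detection, annotation, threshold=IOU_THRESHOLD):
    """A detection counts as correct only if its IoU with the annotation reaches the threshold."""
    return iou(detection, annotation) >= threshold


# Hypothetical boxes: the same detection scored against a tight and a loose annotation.
detection = (100, 100, 200, 160)
tight_annotation = (98, 98, 202, 162)
loose_annotation = (70, 80, 230, 190)

print(f"tight: IoU={iou(detection, tight_annotation):.2f}, match={is_match(detection, tight_annotation)}")
print(f"loose: IoU={iou(detection, loose_annotation):.2f}, match={is_match(detection, loose_annotation)}")
```

With these numbers the same detection scores an IoU of about 0.90 against the tight annotation and about 0.34 against the loose one, so it would be counted as a correct detection in one sequence and as a false positive in the other.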
In some video sequences, annotated bounding boxes are not tight enough compared to other sequences. Below are some examples. (Blue boxes are from the annotation. Red boxes are from the detector we use.)
This happens for the Person class as well.