dbolya / tide

A General Toolbox for Identifying Object Detection Errors
https://dbolya.github.io/tide
MIT License
702 stars 115 forks

BoxError and ClassError can match with used detections ? #7

Closed 0phoff closed 3 years ago

0phoff commented 3 years ago

Hey,

Thanks for this project; it seems to be a really useful tool for gaining understandable insight into the performance of your model.

However, when looking at your code, I noticed that you use gt_cls_iou and gt_noncls_iou when matching IoU for BoxErrors and ClassErrors respectively. My understanding is that these IoUs are the base IoUs, without removing the IoUs of already-matched annotations; the filtered versions would be gt_unused_cls and gt_unused_noncls respectively.
Wouldn't this mean that you could potentially assign a FP detection as a BoxError even though the annotation it poorly localises is already matched by another (TP) detection? Shouldn't that detection then become a BackgroundError, since there is already a TP detection for that annotation, but this one is not localised well enough to become a DuplicateError? The same goes for ClassErrors, though there the detection cannot be a DuplicateError because of the wrong class, and thus could only become a BackgroundError.
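To make the distinction concrete, here is a minimal sketch of the two matching strategies being contrasted. The function and variable names (`best_match`, `gt_used`) are illustrative only, not TIDE's actual API:

```python
# Hypothetical sketch: match one detection against ground truths, either
# considering all GTs (base IoU lists) or only GTs not yet claimed by a
# true positive (the "unused" variants).

def best_match(det_ious, gt_used, only_unused):
    """Return (gt_index, iou) of the best-matching ground truth.

    det_ious    : IoU between this detection and every GT box.
    gt_used     : True where a GT is already matched by a TP detection.
    only_unused : if True, skip GTs that are already matched.
    """
    best_idx, best_iou = -1, 0.0
    for idx, iou in enumerate(det_ious):
        if only_unused and gt_used[idx]:
            continue
        if iou > best_iou:
            best_idx, best_iou = idx, iou
    return best_idx, best_iou

# With the full IoU list, the detection still matches GT 0 (already used)
# at IoU 0.4 and would be binned as a BoxError; restricted to unused GTs,
# its best remaining overlap is only 0.05, i.e. background territory.
ious = [0.4, 0.05]
used = [True, False]
print(best_match(ious, used, only_unused=False))  # (0, 0.4)
print(best_match(ious, used, only_unused=True))   # (1, 0.05)
```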

Let me know your thoughts about this.

dbolya commented 3 years ago

Hi, thanks for looking into this.

I do believe this is an issue of semantics, though. We intended background error to mean that there was no valid ground truth to match with at all. That indicates to a model designer that their detector is "hallucinating" objects that aren't there (or perhaps that there are objects that weren't annotated).

Thus, if we included classification and localization false positives in that category, the meaning would change (in fact, background error would change generally to "false positive"), which is not what we wanted out of that category (special errors have a general FP type for that reason).

It's true that fixing these "duplicate" classification and localization false positives would result in a duplicate detection, which is not itself a classification or localization error, but I'd argue that marking them as duplicate detections would be going too far. As they stand, they're trying to match to a ground truth but are the wrong class or have too low an IoU (though not low enough to be marked as background). So we mark them down as either classification or localization error, and we "fix" those errors by simply ignoring the detection (i.e., the same "fix" as for duplicate detections).

If you fixed the classification error and it now became a duplicate, I don't think that should be marked as duplicate error, since you just fixed a classification error to get to this point. On the other hand, background error consists of only those detections <= 0.1 IoU with any ground truth. Adding in more false positives that have higher IoUs simply because another detection already matched to that GT would skew the meaning of background error.

TL;DR: Considering the detection independently, it was a classification / localization error. If you wanted to address the issues with your model, that's the bin you'd want it to be classified as. Once you've addressed those issues, you might now have to address duplicate error, or NMS might just take care of it anyway. Depends on the model.
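The taxonomy described above can be sketched as a small decision function. The 0.1 background and 0.5 foreground thresholds match TIDE's stated defaults, but the function itself and its inputs are illustrative, not TIDE's actual code:

```python
# Hedged sketch of the error binning as the maintainer describes it:
# each false positive is considered independently against its best IoU
# with any ground truth of the predicted class (used or not).

BG_THRESH = 0.1  # at or below this IoU there is nothing to match
FG_THRESH = 0.5  # at or above this IoU localization is good enough

def classify_error(iou, correct_class, gt_already_matched):
    """Bin one detection into an error category (illustrative names)."""
    if iou <= BG_THRESH:
        return "BackgroundError"   # hallucinated: no valid GT nearby
    if not correct_class:
        return "ClassError"        # right place, wrong class
    if iou < FG_THRESH:
        return "BoxError"          # right class, poorly localized
    if gt_already_matched:
        return "DuplicateError"    # well-localized, but GT claimed by a TP
    return "TruePositive"

# A detection overlapping an already-matched GT at IoU 0.3 stays a
# BoxError rather than being demoted to BackgroundError:
print(classify_error(0.3, correct_class=True, gt_already_matched=True))
# BoxError
```

Note how only the final DuplicateError check looks at whether the GT was already matched; the classification and localization bins deliberately ignore it, which is exactly the design choice under discussion.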

0phoff commented 3 years ago

Thanks for your thorough explanation. I see now that this is a matter of semantics, and I understand the reasoning behind your decisions!
You are totally right that my approach would turn BackgroundError into a general FP category, which is not really that useful.

With this explanation, I also finally managed to recreate your metric in my own analysis framework, so thanks a lot!