TIDE output interpretation

jinmingteo commented 3 years ago

hi @dbolya,

I was testing out TIDE with 2 of my models (with slightly different augmentations between them). The results are:

Model 1

mask AP @ 50: 50.43

                         Main Errors
=============================================================
  Type      Cls      Loc     Both     Dupe      Bkg     Miss  
-------------------------------------------------------------
   dAP     5.05     5.61     0.21     0.00     3.73    14.52  
=============================================================

        Special Error
=============================
  Type   FalsePos   FalseNeg  
-----------------------------
   dAP       8.64      28.71  
=============================

Model 2

mask AP @ 50: 45.71

                         Main Errors
=============================================================
  Type      Cls      Loc     Both     Dupe      Bkg     Miss  
-------------------------------------------------------------
   dAP     5.09     3.76     0.05     0.00     3.54    14.56  
=============================================================

        Special Error
=============================
  Type   FalsePos   FalseNeg  
-----------------------------
   dAP       8.75      25.02  
=============================

I am a little confused that several of the dAP values for Model 2 (45.71 AP) are lower than those for Model 1 (50.43 AP), most noticeably Loc (3.76 vs. 5.61) and FalseNeg (25.02 vs. 28.71), while Cls and Miss are nearly identical. Is there a good intuition for, or interpretation of, these results? I would think Model 1 is better (given its higher AP), but TIDE seems to suggest otherwise.

dbolya commented 3 years ago

Yeah, this output seems odd to me. TIDE isn't very informative for very small changes, because what affects AP is fairly complicated, but your change seems to have caused a large change in AP.

I guess the intuition that you can pull from this is that the change didn't actually affect any one category of error specifically, and just generally made the network better. If your change didn't target one particular subset of the error categories, then the overall AP is more meaningful.
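To make that comparison concrete, here is a minimal sketch that puts the two tables side by side. It only uses the numbers posted above; the names `model_1`, `model_2`, and `dap_deltas` are made up for illustration and are not part of the TIDE API.

```python
# dAP values copied from the two tables posted in this thread.
model_1 = {"Cls": 5.05, "Loc": 5.61, "Both": 0.21, "Dupe": 0.00,
           "Bkg": 3.73, "Miss": 14.52, "FalsePos": 8.64, "FalseNeg": 28.71}
model_2 = {"Cls": 5.09, "Loc": 3.76, "Both": 0.05, "Dupe": 0.00,
           "Bkg": 3.54, "Miss": 14.56, "FalsePos": 8.75, "FalseNeg": 25.02}
ap50 = {"Model 1": 50.43, "Model 2": 45.71}


def dap_deltas(a, b):
    """Per-category dAP difference (b minus a), rounded to 2 decimals."""
    return {name: round(b[name] - a[name], 2) for name in a}


deltas = dap_deltas(model_1, model_2)
for name, delta in deltas.items():
    print(f"{name:>8}: {delta:+.2f}")

# Overall AP gap between the two models (Model 2 minus Model 1).
ap_gap = round(ap50["Model 2"] - ap50["Model 1"], 2)
print(f"  AP gap: {ap_gap:+.2f}")

# Main error category whose dAP moved the most in absolute terms.
main_errors = ["Cls", "Loc", "Both", "Dupe", "Bkg", "Miss"]
largest = max(main_errors, key=lambda name: abs(deltas[name]))
print(f"largest main-error shift: {largest}")
```

Among the main errors, only Loc moves by more than about 0.2, while the overall AP differs by 4.72. As far as I understand, each dAP is the AP that would be gained by fixing that error type alone, so the values are not additive and a lower dAP for one model does not by itself mean that model is better overall.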

jinmingteo commented 3 years ago

Thanks @dbolya! I will use overall AP as a first cut, and then look at the TIDE main errors.

dbolya / tide

TIDE output interpretation #27