JDAI-CV / centerX

This repo is implemented based on detectron2 and centernet
Apache License 2.0

multi teacher KD cannot achieve better performance than separate models #2

Closed Guocode closed 3 years ago

Guocode commented 3 years ago

In the multi-teacher KD experiment, resdcn18_KD_woGT_scratch always performs slightly worse than resdcn18 on the crowd dataset, even when pretrained on ImageNet, yet it outperforms the baseline on another dataset. Why does this happen?

CPFLAME commented 3 years ago

Good question! There may be many reasons for this:

  1. The imbalance in instances across datasets: 1) CrowdHuman has 352,978 human instances, while the other datasets have about 1/3 or 1/8 of that. 2) The CrowdHuman model is well trained, while the other models are not, for lack of annotations. 3) Multi-teacher KD lets the model train on more data, so the datasets that lack training data see an increase in mAP. 4) Theoretically, a multi-class model is worse than a one-class model, so on the well-trained CrowdHuman task the multi-class KD model is worse than the baseline.
  2. My hyperparameters may not be optimal.
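
To put the imbalance in point 1 into numbers, here is a rough back-of-the-envelope sketch. Only the CrowdHuman count and the 1/3 and 1/8 ratios come from the comment above; the names `dataset_b` and `dataset_c` are placeholders, since the comment does not say which dataset has which ratio, and the derived counts are approximations rather than real annotation counts.

```python
# CrowdHuman instance count as quoted in the comment above.
crowdhuman_instances = 352_978

# Approximate counts for the other two datasets, using the quoted
# 1/3 and 1/8 ratios. "dataset_b" and "dataset_c" are placeholder
# names (assumption): the thread does not map ratios to datasets.
approx_counts = {
    "crowdhuman": crowdhuman_instances,
    "dataset_b": crowdhuman_instances // 3,
    "dataset_c": crowdhuman_instances // 8,
}

# Share of all training instances contributed by each dataset.
total = sum(approx_counts.values())
shares = {name: count / total for name, count in approx_counts.items()}

for name, count in approx_counts.items():
    print(f"{name}: ~{count} instances, {shares[name]:.1%} of all instances")
```

Under these assumptions CrowdHuman contributes roughly two thirds of all instances, which is consistent with the smaller datasets having the most to gain from joint training.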

Guocode commented 3 years ago

I didn't understand 1.3). Multi-teacher KD feeds the model more data than either isolated dataset provides, so theoretically it should perform better than either baseline model. My tentative explanation is that the crowd_human dataset covers a wider domain than wider face or coco car, so I guess a wider-domain task benefits a narrower one but is hurt itself when they are trained together. So we still need to merge multiple datasets with different labels carefully until we find a method that reliably improves both.

CPFLAME commented 3 years ago

What you said might be one of the reasons. Domain has a great effect.

“theoretically it should perform better than either baseline model”: this is right when both the KD model and the baseline are one-class detectors, but I think a multi-class KD model might be worse than a single-class baseline on some specific datasets.
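
For readers unfamiliar with the setup being compared, the following is a minimal sketch of the idea of one multi-class student distilled from several single-class teachers with no ground-truth term. It is not centerX's actual code: the function names, the flattened-list heatmaps, the toy values, and the choice of a plain MSE loss are all assumptions made for illustration. The class names follow the datasets mentioned in this thread.

```python
def mse(a, b):
    """Mean squared error between two equal-length lists of floats."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_teacher_kd_loss(student_heatmaps, teacher_heatmaps):
    """Sum of per-class distillation losses (hypothetical helper).

    student_heatmaps: class name -> flattened heatmap predicted by the
        single multi-class student.
    teacher_heatmaps: class name -> flattened heatmap predicted by that
        class's own single-class teacher.
    No ground-truth term is used; each teacher supervises only its own
    class channel of the student.
    """
    return sum(
        mse(student_heatmaps[cls], teacher_map)
        for cls, teacher_map in teacher_heatmaps.items()
    )


# Toy values: one student with three class channels, three teachers.
student = {
    "person": [0.9, 0.1, 0.0],
    "face": [0.2, 0.7, 0.1],
    "car": [0.0, 0.0, 0.8],
}
teachers = {
    "person": [1.0, 0.0, 0.0],
    "face": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}

loss = multi_teacher_kd_loss(student, teachers)
print(f"total KD loss: {loss:.4f}")
```

In this sketch the student's capacity is shared across all three class channels, whereas each single-class baseline spends all of its capacity on one class. That is one way to picture why the multi-class KD model could trail a single-class baseline on a particular dataset.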