Closed Guocode closed 3 years ago
Good question! There may be many reasons for this:
I didn't understand 1.3). Multi-teacher training feeds the model more data than either isolated dataset, so theoretically it should perform better than either baseline model. My attempt at an explanation is that the crowd_human dataset covers a wider domain than wider face or coco car, so I guess a wider-domain task benefits a narrower one but is itself hurt when the two are put together. So we still need to merge multiple datasets with different labels carefully until we find a method that reliably improves both.
What you said might be one of the reasons. Domain has a great effect.
“theoretically it should perform better than either baseline model”: this is right when both the KD model and the baseline are single-class detectors, but I think a multi-class KD model might be worse than a single-class baseline on some specific datasets.
In the multi-teacher KD experiment, resdcn18_KD_woGT_scratch always performs a little worse than resdcn18 on the crowd dataset, even when pretrained on ImageNet, yet it outperforms resdcn18 on another dataset. Why does this happen?
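To make the setup under discussion concrete, here is a minimal sketch of a multi-teacher distillation loss without ground truth, where one multi-class student is supervised only by the outputs of several single-class teachers. Everything in it is an assumption for illustration: the function names, the flat-list heatmaps, the class names, and the choice of mean squared error are hypothetical and not taken from the repository.

```python
# Hypothetical sketch: names, shapes, and the MSE loss are assumptions,
# not the repository's actual implementation.

def mse(a, b):
    """Mean squared error between two equally sized flat lists."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_teacher_kd_loss(student_heatmaps, teacher_heatmaps):
    """Average per-class distillation loss, with no ground-truth term.

    student_heatmaps: dict mapping class name -> flat list of student scores
    teacher_heatmaps: dict mapping class name -> flat list of scores from the
        single-class teacher responsible for that class
    """
    per_class = {
        cls: mse(student_heatmaps[cls], teacher_heatmaps[cls])
        for cls in teacher_heatmaps
    }
    total = sum(per_class.values()) / len(per_class)
    return total, per_class


# Toy values only; they are not experimental results.
student = {"person": [0.9, 0.1, 0.2], "face": [0.1, 0.8, 0.0]}
teachers = {"person": [1.0, 0.0, 0.0], "face": [0.0, 1.0, 0.0]}
total, per_class = multi_teacher_kd_loss(student, teachers)
```

Returning the per-class terms alongside the total makes it possible to see whether one class (for example the wider-domain one) is fitted worse than the others, which is the kind of imbalance this thread is speculating about.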