HikariTJU / LD

Localization Distillation for Object Detection (CVPR 2022, TPAMI 2023)
Apache License 2.0

About Adaptive Layer #76

Open minhhotboy9x opened 9 months ago

minhhotboy9x commented 9 months ago

I have some questions about adaptive layers when training KD.

  1. When you combined your KD method with other intermediate feature-map KD methods, you had to use adaptive layers to upscale the student feature maps. Were these adaptive layers trained together with the student, or did you freeze them? I have read many papers and none of them say anything about this.
  2. These adaptive layers may sometimes distort the student's output feature map, and they don't contribute to the student's inference. So why do adaptive layers make KD training work effectively? I would expect them to decrease the mAP. Could you please explain? Thank you very much.

HikariTJU commented 9 months ago

An adaptive layer is needed only when the student and teacher feature maps don't match in shape. Many KD papers use FPN features as the learning target, and the teacher's and student's FPN layers mostly have feature maps of the same shape, so no adaptive layer is needed (ours included). That's why we don't mention it.
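
A minimal sketch of that rule, assuming PyTorch and feature maps shaped (N, C, H, W). `build_adapter` is a hypothetical helper for illustration and is not taken from the LD codebase. It returns an identity when the student and teacher channel counts match, and a 1x1 convolution otherwise.

```python
import torch
import torch.nn as nn


def build_adapter(student_channels: int, teacher_channels: int) -> nn.Module:
    """Hypothetical helper: project student features only when channels differ."""
    if student_channels == teacher_channels:
        # Shapes already agree, so the student feature is compared as-is.
        return nn.Identity()
    # A 1x1 conv changes the channel count without touching spatial size.
    return nn.Conv2d(student_channels, teacher_channels, kernel_size=1)


# Matching FPN widths (assumed 256 channels on both sides): nothing to adapt.
same = build_adapter(256, 256)

# Mismatched widths (assumed 128 vs. 256): a 1x1 conv maps 128 -> 256 channels.
proj = build_adapter(128, 256)

feat = torch.randn(2, 128, 32, 32)
out = proj(feat)  # shape: (2, 256, 32, 32)
```

When both detectors use the same FPN width, the helper degenerates to a no-op, which matches the case described above where no adaptive layer appears at all.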

minhhotboy9x commented 9 months ago

Oh, I see. In my work I have to use adaptive layers because the student and teacher have different numbers of channels, and I think that makes the student's mAP drop slightly.
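
For the channel-mismatch case, the thread does not say whether the adapter should be trained or frozen. One common arrangement (FitNets-style hint learning, for example) trains the adapter jointly with the student and discards it at inference. The sketch below assumes PyTorch; the single-convolution `student` and `teacher` are toy stand-ins, and the channel counts and learning rate are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins (assumptions): an 8-channel student and a frozen 16-channel teacher.
student = nn.Conv2d(3, 8, kernel_size=3, padding=1)
teacher = nn.Conv2d(3, 16, kernel_size=3, padding=1).requires_grad_(False)
adapter = nn.Conv2d(8, 16, kernel_size=1)  # used only during distillation

# The adapter's parameters are optimized together with the student's.
optimizer = torch.optim.SGD(
    list(student.parameters()) + list(adapter.parameters()), lr=0.01
)

images = torch.randn(4, 3, 32, 32)
adapter_before = adapter.weight.detach().clone()

# Teacher features are targets only, so no gradient is tracked for them.
with torch.no_grad():
    teacher_feat = teacher(images)

student_feat = student(images)
# Feature-imitation loss in the teacher's channel space.
kd_loss = F.mse_loss(adapter(student_feat), teacher_feat)

optimizer.zero_grad()
kd_loss.backward()
optimizer.step()

# True if the optimizer step changed the adapter weights.
adapter_updated = not torch.equal(adapter_before, adapter.weight.detach())

# At inference only the student runs; the adapter is dropped.
with torch.no_grad():
    inference_feat = student(images)  # shape: (4, 8, 32, 32)
```

Because the adapter sits only on the loss path, it adds no cost at inference. Whether it helps or hurts the student's mAP is an empirical question that this sketch does not settle.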