ArchipLab-LinfengZhang / Object-Detection-Knowledge-Distillation-ICLR2021

The official implementation of ICLR2021 paper "Improve Object Detection with Feature-based Knowledge Distillation: Towards Accurate and Efficient Detectors".
MIT License

Question for the weighting parameter #8

Open nlkim0817 opened 3 years ago

nlkim0817 commented 3 years ago

First of all, thank you for sharing your valuable code. In the code below, I want to know the meaning of the multiplier 6 that follows each weighting parameter (7e-5 and 4e-3). I could not find any details about it in the paper.

                # Attention-masked feature imitation loss, weighted by 7e-5 * 6
                kd_feat_loss += dist2(t_feats[_i], self.adaptation_layers[_i](x[_i]), attention_mask=sum_attention_mask,
                                      channel_attention_mask=c_sum_attention_mask) * 7e-5 * 6
                # Channel-wise (globally pooled) imitation loss, weighted by 4e-3 * 6
                kd_channel_loss += torch.dist(torch.mean(t_feats[_i], [2, 3]),
                                              self.channel_wise_adaptation[_i](torch.mean(x[_i], [2, 3]))) * 4e-3 * 6