dvlab-research / ReviewKD

Distilling Knowledge via Knowledge Review, CVPR 2021

Will KL-Divergence loss further improve the performance? #16

Open LiuDongyang6 opened 2 years ago

LiuDongyang6 commented 2 years ago

Thank you for the nice work! I wonder whether you have tried using the ReviewKD loss and the KL-divergence loss together. Would the combination further improve performance? If so, would you mind sharing the results or the hyperparameters?

akuxcw commented 1 year ago

Sorry for the late reply. I didn't try the KL loss. In my opinion, KL loss is well suited to 1-D logits rather than large 2-D feature maps.
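For readers who want to experiment with the combination themselves, below is a minimal PyTorch sketch of one way to add a standard logit-level KL-divergence KD term (Hinton-style, with temperature) on top of a feature-level ReviewKD loss. This is not from the paper or repository: the weights `kd_weight` and `kl_weight`, the temperature value, and the `total_loss` helper are hypothetical hyperparameters and names introduced only for illustration.

```python
import torch
import torch.nn.functional as F

def kl_logit_loss(student_logits, teacher_logits, temperature=4.0):
    """Logit-level KD loss: KL divergence between temperature-softened
    teacher and student class distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # batchmean reduction plus T^2 scaling keeps gradient magnitudes
    # roughly comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

def total_loss(ce_loss, reviewkd_loss, student_logits, teacher_logits,
               kd_weight=1.0, kl_weight=1.0, temperature=4.0):
    """Hypothetical combination: cross-entropy + feature-level ReviewKD loss
    + logit-level KL loss. kd_weight / kl_weight / temperature are tuning
    hyperparameters, not values reported by the authors."""
    kl = kl_logit_loss(student_logits, teacher_logits, temperature)
    return ce_loss + kd_weight * reviewkd_loss + kl_weight * kl
```

Whether this helps likely depends on how the two terms are balanced; since ReviewKD already operates on intermediate features, the logit-level KL term is complementary in principle, but its weight would need to be tuned per dataset.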