GzyAftermath / CAT-KD

CVPR 2023, Class Attention Transfer Based Knowledge Distillation

Beta Values #2

Open m-parchami opened 1 year ago

m-parchami commented 1 year ago

Hi,

Thanks for sharing the code and great work!

I have a question regarding the beta values (the coefficient for the CAT loss). The README states that they are not optimized and that one should tune them. However, in the configs I'm finding very different and seemingly precise betas, e.g. in the CAT_KD dir:

- wrn_40_2_wrn_40_1.yaml: 1.5
- wrn_40_2_wrn_16_2.yaml: 12
- resnet32x4_shuv2.yaml: 600
- ...
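For context, the beta coefficient presumably sits in those YAML configs roughly like this (the key names below are illustrative guesses, not necessarily the repo's exact schema):

```yaml
# Hypothetical sketch of a config such as wrn_40_2_wrn_40_1.yaml;
# the actual key names in the CAT-KD repo may differ.
DISTILLER:
  TYPE: CAT_KD
CAT_KD:
  LOSS:
    BETA: 1.5   # coefficient on the CAT loss (12, 600, ... in other configs)
```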

Could you please clarify whether these values were optimized on some data, or whether they are derived as a function of the settings?

In the former case, it would be great if you could also mention what data was used :)

Thanks a lot! Great work!

GzyAftermath commented 1 year ago

Hi, we optimized the value of beta carefully in the CAT-KD experiments, since we needed to compare performance against previous works. The README statement means that beta is not tuned in the CAT experiments, which are only used for the ablation study.