HobbitLong / RepDistiller

[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods

Why use log_softmax instead of softmax? #52

Open nguyenvulong opened 2 years ago

nguyenvulong commented 2 years ago

I think it should be softmax instead; otherwise p_s and p_t are not on the same scale and are not comparable.

Could you please explain why log_softmax is used here? https://github.com/HobbitLong/RepDistiller/blob/dcc043277f2820efafd679ffb82b8e8195b7e222/distiller_zoo/KD.py#L13-L17
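For context, the referenced lines implement the standard Hinton-style KD loss, in which the student logits go through `log_softmax` and the teacher logits through `softmax`. A minimal sketch of that pattern (not a verbatim copy of the repo's file; names and details are illustrative):

```python
import torch.nn as nn
import torch.nn.functional as F

class DistillKL(nn.Module):
    """KL-divergence distillation loss with temperature T (sketch)."""
    def __init__(self, T):
        super().__init__()
        self.T = T

    def forward(self, y_s, y_t):
        p_s = F.log_softmax(y_s / self.T, dim=1)   # student: log-probabilities
        p_t = F.softmax(y_t / self.T, dim=1)       # teacher: probabilities
        # F.kl_div expects (log-probs, probs); scale by T^2 as in Hinton et al.
        return F.kl_div(p_s, p_t, reduction="batchmean") * (self.T ** 2)
```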

nguyenvulong commented 2 years ago

For those who had a similar question to mine: https://github.com/yoshitomo-matsubara/torchdistill/issues/233
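The linked issue comes down to how PyTorch defines its KL-divergence loss: `torch.nn.functional.kl_div` expects its first argument (the student) as log-probabilities and its second argument (the teacher) as plain probabilities, so pairing `log_softmax` with `softmax` is the intended usage rather than a mismatch. A minimal check against a manual KL computation (tensor shapes and the temperature value are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
T = 4.0                      # illustrative temperature
y_s = torch.randn(8, 10)     # fake student logits
y_t = torch.randn(8, 10)     # fake teacher logits

# kl_div takes log-probabilities as input and probabilities as target
p_s = F.log_softmax(y_s / T, dim=1)
p_t = F.softmax(y_t / T, dim=1)
loss_kd = F.kl_div(p_s, p_t, reduction="batchmean") * (T ** 2)

# Manual KL(p_t || p_s): sum over classes of p_t * (log p_t - log p_s), averaged over the batch
manual = (p_t * (torch.log(p_t) - p_s)).sum(dim=1).mean() * (T ** 2)

print(torch.allclose(loss_kd, manual))  # True
```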