WisonZ / GKD

Code for GKD

Should I get the k by sorted student's logits or teacher's logits? #3

Open Geek-lixiang opened 1 year ago

Geek-lixiang commented 1 year ago

Should k be determined from the sorted student logits or from the sorted teacher logits? There are two losses in your code.

WisonZ commented 1 year ago

We provide code for both cases in gkd_adaptive.py. In our experiments, we found that getting k from the student's logits works better.
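
For readers who want to see what the two options look like side by side, here is a minimal sketch. It is not the code from gkd_adaptive.py. The rule used to pick k (the smallest k whose sorted probabilities reach a cumulative mass threshold), the `mass` parameter, and the function names `adaptive_k`, `topk_indices`, and `topk_kl` are all assumptions made for illustration. The only idea taken from this thread is that k can be derived from either the student's or the teacher's sorted logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_k(logits, mass=0.9):
    """Hypothetical rule: smallest k whose top-k probabilities sum to >= mass."""
    probs = sorted(softmax(logits), reverse=True)
    cumulative = 0.0
    for k, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= mass:
            return k
    return len(probs)


def topk_indices(logits, k):
    """Indices of the k largest logits, largest first."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


def topk_kl(student_logits, teacher_logits, source="student", mass=0.9):
    """KL(teacher || student) restricted to the top-k classes.

    `source` selects whose sorted logits determine k and the class subset.
    This is an illustrative loss, not the repository's implementation.
    """
    if source == "student":
        ref = student_logits
    elif source == "teacher":
        ref = teacher_logits
    else:
        raise ValueError("source must be 'student' or 'teacher'")

    k = adaptive_k(ref, mass)
    idx = topk_indices(ref, k)

    # Renormalise both distributions over the selected classes only.
    s = softmax([student_logits[i] for i in idx])
    t = softmax([teacher_logits[i] for i in idx])
    kl = sum(tp * math.log(tp / sp) for tp, sp in zip(t, s))
    return k, idx, kl


if __name__ == "__main__":
    student = [4.0, 1.0, 0.5, 0.0]   # confident student
    teacher = [2.0, 1.9, 1.8, -3.0]  # teacher spreads mass over three classes
    for source in ("student", "teacher"):
        k, idx, kl = topk_kl(student, teacher, source=source)
        print(source, "k =", k, "classes =", idx, "kl =", round(kl, 4))
```

In this toy example the two sources select different values of k, because the student is much more peaked than the teacher. That difference is what the question is about: the class subset that enters the loss depends on which model's ranking is used.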