Open Geek-lixiang opened 1 year ago
Should I get k by sorting the student's logits or the teacher's logits? There are two losses in your code.
We provide code for both cases in gkd_adaptive.py. In our experiments, we found that getting k from the student's logits works better.
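To make the difference concrete, here is a minimal sketch of how the two choices can give different values of k. It is not the code in gkd_adaptive.py: the thread does not say how k is computed, so the rule used here (smallest k whose top-k probabilities cover a fixed probability mass) is an assumption, and the names `adaptive_k`, `top_k_indices`, and `mass` are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a plain list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_k(logits, mass=0.9):
    # ASSUMED rule (not taken from the repo): sort probabilities in
    # descending order and return the smallest k whose cumulative
    # probability reaches `mass`.
    probs = sorted(softmax(logits), reverse=True)
    cumulative = 0.0
    for k, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= mass:
            return k
    return len(probs)

def top_k_indices(logits, k):
    # Indices of the k largest logits, largest first.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]

# Toy logits for one token position (made-up values for illustration).
student_logits = [2.0, 1.0, 0.5, -1.0]
teacher_logits = [4.0, 0.0, -1.0, -2.0]

k_student = adaptive_k(student_logits)  # k taken from the student
k_teacher = adaptive_k(teacher_logits)  # k taken from the teacher

print("k from student:", k_student, top_k_indices(student_logits, k_student))
print("k from teacher:", k_teacher, top_k_indices(teacher_logits, k_teacher))
```

In this toy case the teacher is more peaked than the student, so sorting the teacher's logits gives a smaller k than sorting the student's. Which distribution k comes from therefore changes which tokens enter the loss.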