HobbitLong / RepDistiller

[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
BSD 2-Clause "Simplified" License

How do you choose the optimal hyper-parameters? #20

Open JinYang88 opened 4 years ago

JinYang88 commented 4 years ago

There are several groups of hyperparameters:

  1. Teacher model hyperparameters
  2. Student model hyperparameters
  3. KD hyperparameters (e.g., the balance weights for the different losses)
  4. Training hyperparameters (e.g., learning rate)

Enumerating every combination is impractical, because the number of combinations grows combinatorially. How do you find the best (or at least a near-optimal) set of hyperparameters?

Thanks!
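
One common way around the combinatorial blow-up is to sample configurations at random instead of enumerating a full grid. The sketch below is only an illustration of that idea, not the procedure used in this repository: the search space, the parameter names (`learning_rate`, `kd_weight`, `temperature`), their ranges, and the `toy_evaluate` function are all assumptions. In practice `evaluate` would train a student with the given configuration and return its validation accuracy.

```python
import math
import random

# Hypothetical search space: names and ranges are illustrative assumptions,
# not values taken from this repository.
SEARCH_SPACE = {
    "learning_rate": ("log_uniform", 1e-3, 1e-1),
    "kd_weight": ("uniform", 0.0, 1.0),
    "temperature": ("choice", [1.0, 2.0, 4.0, 8.0]),
}


def sample_config(rng):
    """Draw one configuration from SEARCH_SPACE."""
    config = {}
    for name, spec in SEARCH_SPACE.items():
        kind = spec[0]
        if kind == "log_uniform":
            # Sample uniformly in log space so every order of magnitude is equally likely.
            low, high = spec[1], spec[2]
            config[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
        elif kind == "uniform":
            config[name] = rng.uniform(spec[1], spec[2])
        else:  # "choice"
            config[name] = rng.choice(spec[1])
    return config


def random_search(evaluate, n_trials=20, seed=0):
    """Evaluate n_trials random configurations and keep the highest-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Stand-in for 'train the student and return validation accuracy' (assumption)."""
    lr_penalty = abs(math.log10(config["learning_rate"]) + 1.5)
    weight_penalty = abs(config["kd_weight"] - 0.8)
    return -(lr_penalty + weight_penalty)


if __name__ == "__main__":
    best_config, best_score = random_search(toy_evaluate, n_trials=50, seed=0)
    print(best_config, best_score)
```

The trial budget is fixed up front, so the cost does not grow with the number of hyperparameters. Keeping the teacher fixed and searching only the student, KD, and training settings shrinks the space further.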

surprisedong commented 2 years ago

Same question. I just do not know how to set the weights for the different losses.
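
For the loss weights specifically, one simple option is to fix all weights but one and sweep the remaining weight over a few candidate values. The sketch below assumes the total loss is a weighted sum of a classification term, a divergence (KD) term, and an extra distillation term. The names `gamma`, `alpha`, `beta`, and the `evaluate` callback are illustrative assumptions, and the helper functions are hypothetical rather than part of this repository.

```python
def combined_loss(loss_cls, loss_div, loss_kd, gamma=1.0, alpha=1.0, beta=1.0):
    """Weighted sum of three loss terms (weight names are illustrative)."""
    return gamma * loss_cls + alpha * loss_div + beta * loss_kd


def sweep_one_weight(evaluate, base_weights, name, candidates):
    """Vary a single weight while holding the others fixed.

    evaluate(weights) is assumed to train a student with those weights and
    return a validation score where higher is better.
    """
    results = []
    for value in candidates:
        # Copy the base weights and override only the weight being swept.
        weights = dict(base_weights, **{name: value})
        results.append((value, evaluate(weights)))
    best_value, best_score = max(results, key=lambda pair: pair[1])
    return best_value, best_score, results


if __name__ == "__main__":
    # Toy stand-in for a real training run (assumption): the score peaks at beta = 0.8.
    def toy_evaluate(weights):
        return -(weights["beta"] - 0.8) ** 2

    base = {"gamma": 1.0, "alpha": 0.0, "beta": 1.0}
    print(sweep_one_weight(toy_evaluate, base, "beta", [0.1, 0.5, 0.8, 2.0]))
```

Sweeping one weight at a time ignores interactions between the weights, so it is only a cheap first pass. It does need far fewer training runs than a full grid.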

ShristiDasBiswas commented 3 months ago

Hi, were you able to figure out a good set of hyperparameters?