How do you choose the optimal hyper-parameters?

HobbitLong / RepDistiller

[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods

BSD 2-Clause "Simplified" License


Open JinYang88 opened 4 years ago

JinYang88 commented 4 years ago

There are several hyperparameters:

It is hard to enumerate every combination, because the number of combinations explodes. How do you find the best (or at least a near-optimal) hyperparameter setting?
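To make the explosion concrete, here is a minimal sketch that counts the combinations in a small grid and contrasts it with random search under a fixed trial budget. This is not the method used by the repository authors. The weight names (`w_cls`, `w_kd`, `w_extra`), the candidate values, and `toy_evaluate` are all hypothetical placeholders; in practice `evaluate` would train a student and return its validation accuracy.

```python
import math
import random

# Hypothetical search space: three loss weights, five candidate values each.
GRID = {
    "w_cls": [0.1, 0.5, 1.0, 2.0, 5.0],
    "w_kd": [0.1, 0.5, 1.0, 2.0, 5.0],
    "w_extra": [0.1, 0.5, 1.0, 2.0, 5.0],
}


def grid_size(grid):
    """Number of runs a full grid search would need."""
    return math.prod(len(values) for values in grid.values())


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def random_search(evaluate, grid, n_trials, seed=0):
    """Evaluate n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {
            name: sample_log_uniform(rng, min(values), max(values))
            for name, values in grid.items()
        }
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_evaluate(cfg):
    """Stand-in for a real training run (assumption): peaks when every weight is 1.0."""
    return -sum(math.log(w) ** 2 for w in cfg.values())


if __name__ == "__main__":
    print("full grid runs:", grid_size(GRID))
    best_cfg, best_score = random_search(toy_evaluate, GRID, n_trials=20)
    print("best of 20 random trials:", best_cfg, best_score)
```

With five values per weight the grid already needs 5^3 = 125 training runs, and each additional weight multiplies that by five, whereas the random-search budget stays at whatever `n_trials` is set to.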

Thanks!

surprisedong commented 2 years ago

Same question. I just do not know how to set the weights for the different losses.

ShristiDasBiswas commented 3 months ago

Hi, were you able to figure out a good set of hyperparameters?