Benjamin-Lee / deep-rules

Ten Quick Tips for Deep Learning in Biology
https://benjamin-lee.github.io/deep-rules/

Be patient and pay attention to seemingly trivial hyperparameter settings #42

Closed rasbt closed 5 years ago

rasbt commented 6 years ago

Have you checked the list of proposed rules to see if the rule has already been proposed?

In general, getting DL models to work on even simple, structured datasets requires much more extensive hyperparameter tuning (Koutsoukas et al. 2017) [Could also add a rule saying that a 2-layer multi-layer perceptron is not deep learning ;)] compared to "traditional" machine learning. Hence, it is important to be patient and to try many different hyperparameter settings and combinations.

For instance, the previously "best" learning rate might become useless if we add another layer or change the activation function. Hence, extensive tuning and a near-exhaustive search are recommended.
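As a sketch of what such a search looks like (note: `train_and_evaluate` here is a toy stand-in I made up for a full training-plus-validation run; its scoring rule is only meant to mimic the point above that the best learning rate shifts when the architecture changes):

```python
import itertools
import random

# Hypothetical stand-in for a full training run: returns a validation
# score for one hyperparameter configuration. In practice this would
# train a network and evaluate it on held-out data.
def train_and_evaluate(learning_rate, num_layers, activation, seed):
    random.seed(seed)
    # Toy interaction: the "best" learning rate depends on depth,
    # mimicking the coupling between hyperparameters described above.
    base = 0.9 - abs(learning_rate - 0.01 * num_layers) * 10
    bonus = 0.02 if activation == "relu" else 0.0
    return base + bonus + random.uniform(-0.01, 0.01)

grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "num_layers": [2, 4],
    "activation": ["relu", "tanh"],
}

results = []
for lr, layers, act in itertools.product(*grid.values()):
    score = train_and_evaluate(lr, layers, act, seed=0)
    results.append(((lr, layers, act), score))

best_config, best_score = max(results, key=lambda r: r[1])
print(best_config)
```

Note how every added hyperparameter multiplies the number of configurations, which is why the search gets expensive so quickly.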

Furthermore, it should be stressed that even the same architecture and hyperparameter configuration should be tested multiple times with different random seeds, since random weight initialization may make the difference between convergence and non-convergence. For model selection and evaluation, it is recommended to compare models on their average performance over repeated runs, for instance a "top 3 out of 5 runs for a given hyperparameter setting" comparison.
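Concretely, the "top 3 out of 5" protocol could look like this (assumption: `train_with_seed` is a made-up stand-in whose seed-dependent score imitates the effect of random weight initialization, including occasional non-convergence):

```python
import random
import statistics

# Hypothetical training run; the score varies with the random seed,
# mimicking how random weight initialization can decide between
# convergence and non-convergence.
def train_with_seed(seed):
    rng = random.Random(seed)
    converged = rng.random() > 0.2  # some seeds simply fail to converge
    return rng.uniform(0.85, 0.92) if converged else rng.uniform(0.1, 0.3)

def top3_of_5_score(seed_offset=0, n_runs=5, k=3):
    # Train the same configuration n_runs times with different seeds,
    # then average the k best scores so one unlucky initialization
    # does not sink an otherwise good hyperparameter setting.
    scores = [train_with_seed(seed_offset + s) for s in range(n_runs)]
    return statistics.mean(sorted(scores, reverse=True)[:k])

print(round(top3_of_5_score(), 3))
```

Averaging the top k rather than all runs is a design choice: it discounts clear non-convergence while still penalizing configurations that only rarely work.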

Any citations for the rule? (peer-reviewed literature preferred but not required)

agitter commented 6 years ago

A corollary of this rule is that DL is compute-intensive. Be prepared to train many models when starting a project.

rasbt commented 6 years ago

That's a nice way to put it! Basically, we want to highlight that when using DL (as opposed to, e.g., a deterministic KNN algorithm, decision tree, SVM, etc.), hyperparameter tuning is far more expensive, because the search space grows combinatorially with each architectural choice and every configuration requires a full training run.

evancofer commented 5 years ago

We might want to add something about effective but simple hyperparameters/architecture modifications to experiment with (e.g. dropout, batch normalization). It may also be worth mentioning that adaptive learning methods (e.g. Adam) can help save time when determining a viable model architecture (i.e. before even optimizing hyperparameters).
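To make the dropout suggestion concrete, here is a minimal sketch of inverted dropout in plain Python (assumption: framework layers such as `torch.nn.Dropout` or `tf.keras.layers.Dropout` implement the same idea, just vectorized):

```python
import random

def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each activation with probability p during
    training, and scale survivors by 1/(1-p) so the expected activation
    is unchanged, letting inference skip the layer entirely."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(42)
out = dropout([1.0, 1.0, 1.0, 1.0], p=0.5, rng=rng)
print(out)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

Because dropout is a one-line modification per layer, it is exactly the kind of cheap architectural knob worth trying early in the search.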

rasbt commented 5 years ago

Sure :) Maybe also adding skip connections to the list
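For readers unfamiliar with the term, a skip (residual) connection just adds a block's input to its output, so gradients can flow around the transformation. A minimal sketch (assumption: `layer` stands in for any trainable transformation of matching shape):

```python
def residual_block(x, layer):
    # y = f(x) + x: the identity path lets gradients bypass f entirely,
    # which is why skip connections ease the training of deeper networks.
    return [xi + yi for xi, yi in zip(x, layer(x))]

halve = lambda v: [0.5 * xi for xi in v]  # toy stand-in for a layer
print(residual_block([2.0, 4.0], halve))  # -> [3.0, 6.0]
```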

fmaguire commented 5 years ago

Everything mentioned seems covered between (Tip 5 WIP) #134 and Tip 3 (#124)