A corollary of this rule is that DL is compute-intensive. Be prepared to train many models when starting a project.
That's a nice way to put it! Basically, we want to highlight that when using DL (as opposed to, e.g., a deterministic KNN algorithm, decision tree, SVM, etc.), the hyperparameter tuning is substantially more expensive, because the "best" value of one hyperparameter typically depends on the architecture and the other hyperparameter choices (see the learning rate example below).
We might want to add something about effective but simple hyperparameter and architecture modifications to experiment with (e.g., dropout, batch normalization). It may also be worth mentioning that adaptive learning rate methods (e.g., Adam) can help save time when determining a viable model architecture (i.e., before even optimizing hyperparameters).
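Just to make that suggestion concrete, here is a minimal sketch of those modifications in PyTorch (the framework choice is an assumption; the thread doesn't prescribe one, and the layer widths are arbitrary):

```python
# Small MLP illustrating the "simple but effective" modifications mentioned
# above: dropout, batch normalization, and the Adam optimizer.
import torch
import torch.nn as nn

class SmallMLP(nn.Module):
    def __init__(self, n_features, n_classes, dropout_p=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128),
            nn.BatchNorm1d(128),    # batch normalization
            nn.ReLU(),
            nn.Dropout(dropout_p),  # dropout
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        return self.net(x)

model = SmallMLP(n_features=100, n_classes=10)
# Adam adapts per-parameter learning rates, which often makes the first
# "does this architecture train at all?" experiments less sensitive to the
# initial learning rate choice.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```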
Sure :) Maybe we should also add skip connections to the list.
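And a tiny sketch of what a skip connection could look like, continuing the hypothetical PyTorch example above (block width and activation are arbitrary choices):

```python
# Simple residual (skip-connection) block for fully connected layers.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)
        self.act = nn.ReLU()

    def forward(self, x):
        # The identity path (x) is added back to the transformed path,
        # which helps gradients flow through deeper stacks of layers.
        return self.act(x + self.fc2(self.act(self.fc1(x))))
```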
Everything mentioned seems covered between Tip 5 (WIP, #134) and Tip 3 (#124).
Have you checked the list of proposed rules to see if the rule has already been proposed?
In general, getting DL models to work on even simple, structured datasets requires much more extensive hyperparameter tuning (Koutsoukas et al. 2017) [Could also add a rule saying that a 2-layer multi-layer perceptron is not deep learning ;)] compared to "traditional" machine learning. Hence, it is important to be patient and try many different hyperparameter settings and combinations.
For instance, the previously "best" learning rate might become useless if we add another layer or change the activation function. Hence, extensive tuning and a near-exhaustive search are recommended.
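To make the "don't reuse the old best learning rate" point concrete, here is a hedged sketch of searching over learning rate, depth, and activation jointly (`train_and_evaluate` is a hypothetical helper standing in for "train one model and return a validation score"):

```python
# Near-exhaustive grid search: the learning rate is re-tuned for every
# depth/activation combination instead of being fixed up front.
import itertools

grid = {
    "lr": [1e-2, 1e-3, 1e-4],
    "n_hidden_layers": [1, 2, 3],
    "activation": ["relu", "tanh"],
}

results = []
for lr, n_layers, act in itertools.product(*grid.values()):
    # train_and_evaluate is a hypothetical helper: train one model with this
    # configuration and return its validation score.
    score = train_and_evaluate(lr=lr, n_hidden_layers=n_layers, activation=act)
    results.append(((lr, n_layers, act), score))

best_config, best_score = max(results, key=lambda r: r[1])
```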
Furthermore, it should be stressed that even the same architecture and hyperparameter configuration should be tested multiple times with different random seeds, since random weight initialization can make the difference between convergence and non-convergence. For model selection and evaluation, it is recommended to compare models based on their average performance over at least the top 3 out of 5 runs for a given hyperparameter setting.
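A hedged sketch of that "top 3 out of 5 seeds" comparison, reusing the hypothetical `train_and_evaluate` helper from above (here assumed to also accept a `seed` argument):

```python
# Re-run one hyperparameter configuration with several seeds and average the
# best 3 of 5 validation scores for model selection.
def top3_of_5_score(config, seeds=(0, 1, 2, 3, 4)):
    scores = [train_and_evaluate(seed=s, **config) for s in seeds]
    top3 = sorted(scores, reverse=True)[:3]
    return sum(top3) / len(top3)

config = {"lr": 1e-3, "n_hidden_layers": 2, "activation": "relu"}
avg_score = top3_of_5_score(config)
```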
Any citations for the rule? (peer-reviewed literature preferred but not required)