Open chevrm opened 5 years ago
It is probably worth mentioning dropout (discussed here) and weight decay, both of which are extremely effective at limiting overfitting.
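As a minimal, framework-agnostic illustration of the two regularizers, inverted dropout and L2 weight decay can each be sketched in a few lines of NumPy. The function names and hyperparameter values here are illustrative only, not from any of the papers discussed:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, train=True):
    """Inverted dropout: zero each unit with probability p during training,
    scale survivors by 1/(1-p) so the expected activation is unchanged."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def sgd_step(w, grad, lr=0.1, wd=1e-2):
    """Plain SGD update with L2 weight decay: each step also shrinks
    the weights toward zero by lr * wd * w."""
    return w - lr * (grad + wd * w)

h = np.ones((4, 8))               # a batch of hidden activations
h_train = dropout(h, p=0.5)       # entries are 0.0 or 2.0
h_eval = dropout(h, train=False)  # identity at inference time
```

Both shrink the model's effective capacity, which is why they help most when training sets are small relative to parameter count.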
Just throwing in some references ... a general one on how easily models pick up spurious correlations (or rather, systematic noise) is
(There were some other ones similar to this that I can't recall off the top of my head.)
Regarding the correlation between training set size and performance in a bio application:
although they found that this was equally true for traditional ML. So maybe we need to add a general DL paper highlighting the need for caution when deciding between traditional ML and DL when training sets are small.
A bio-related one, in which they found that naive Bayes performed better on noisy datasets:
There is a lot to like in that Mayr et al. 2018 paper. They were very thorough in exploring different models, hyperparameters, and chemical featurizations. However, they only considered ROC as an evaluation metric, and most of the ChEMBL targets are highly class-imbalanced.
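To make the class-imbalance concern concrete: under a heavy imbalance, ROC AUC can look respectable while precision at any usable threshold is poor, because ROC is insensitive to the base rate. A small NumPy sketch with synthetic scores (the numbers and threshold are illustrative, not from the paper):

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC AUC as the Mann-Whitney statistic: the probability that a
    randomly chosen positive outscores a randomly chosen negative."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def precision_at(y_true, scores, t):
    """Fraction of predictions scoring at least t that are true positives."""
    pred = scores >= t
    return y_true[pred].mean() if pred.any() else 0.0

# Synthetic, heavily imbalanced screen: 990 inactives, 10 actives.
y = np.array([0] * 990 + [1] * 10)
s = np.concatenate([np.linspace(0.0, 1.0, 990),  # inactive scores
                    np.linspace(0.5, 1.0, 10)])  # actives score higher on average

auc = roc_auc(y, s)             # decent-looking ranking (~0.75)
prec = precision_at(y, s, 0.5)  # yet ~98% of flagged compounds are inactive
```

Precision-recall curves (or average precision) would surface this failure mode directly, which is why reporting them alongside ROC matters for these targets.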
Have you checked the list of proposed rules to see if the rule has already been proposed?
Feel free to elaborate, rant, and/or ramble.
Any citations for the rule? (peer-reviewed literature preferred but not required)