Benjamin-Lee / deep-rules

Ten Quick Tips for Deep Learning in Biology
https://benjamin-lee.github.io/deep-rules/

Understand, protect against, and benchmark possible overfitting #28

chevrm opened 5 years ago

chevrm commented 5 years ago

Have you checked the list of proposed rules to see if the rule has already been proposed?

Feel free to elaborate, rant, and/or ramble.

Any citations for the rule? (peer-reviewed literature preferred but not required)

evancofer commented 5 years ago

It is probably worth mentioning dropout (discussed here) and weight decay, both of which are extremely effective at limiting overfitting.
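To make the two techniques concrete, here is a minimal pure-Python sketch of inverted dropout and L2 weight decay; the function names are illustrative, not from any library mentioned in the thread, and a real model would use a framework's built-in versions instead.

```python
import random

def inverted_dropout(activations, p_drop, rng):
    # Inverted dropout: zero each unit with probability p_drop and
    # scale survivors by 1/(1 - p_drop), so the expected activation
    # is unchanged and no rescaling is needed at test time.
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def weight_decay_step(weights, grads, lr, decay):
    # L2 weight decay: shrink each weight toward zero on top of the
    # ordinary gradient step (equivalent to adding decay * w to the
    # gradient of the loss).
    return [w - lr * (g + decay * w) for w, g in zip(weights, grads)]
```

Both act as regularizers: dropout injects noise so the network cannot rely on any single unit, while weight decay penalizes large weights that often signal a memorized training set.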

rasbt commented 5 years ago

Just throwing in some references ... a general one regarding how easily models pick up spurious correlations (or rather, systematic noise):

(There were some other ones similar to this that I can't recall off the top of my head.)

Regarding the correlation between training set size and performance in a bio application:

although they found that this was equally true for traditional ML. So maybe we should add a general DL paper to highlight the need for care when choosing between traditional ML and DL when training sets are small.

A bio-related one:

where they found that naive Bayes performed better on noisy datasets.

agitter commented 5 years ago

There is a lot to like in that Mayr et al. 2018 paper. They were very thorough in exploring different models, hyperparameters, and chemical featurizations. However, they only consider ROC AUC as an evaluation metric, and most of the ChEMBL targets are highly class-imbalanced, a setting where ROC curves can paint an overly optimistic picture.
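The imbalance concern above can be illustrated with a small pure-Python sketch (synthetic scores, not data from the paper): a ranking can achieve a high ROC AUC while precision among the top-ranked predictions remains poor, which is what matters when only a handful of compounds can be tested.

```python
def roc_auc(labels, scores):
    # ROC AUC via the Mann-Whitney U statistic: the probability that
    # a randomly chosen positive is scored above a randomly chosen
    # negative (ties count as half).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_at_k(labels, scores, k):
    # Fraction of true positives among the k highest-scored examples,
    # which exposes class imbalance that ROC AUC averages away.
    top = sorted(zip(scores, labels), reverse=True)[:k]
    return sum(y for _, y in top) / k

# Synthetic imbalanced set: 10 positives vs 990 negatives, where a
# small cluster of negatives outscores every positive.
labels = [1] * 10 + [0] * 50 + [0] * 940
scores = [0.95] * 10 + [0.96] * 50 + [0.10] * 940
```

Here `roc_auc` is about 0.95, yet only 10 of the top 60 predictions are true positives, so precision-recall style metrics would complement ROC for these targets.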