Benjamin-Lee / deep-rules

Ten Quick Tips for Deep Learning in Biology
https://benjamin-lee.github.io/deep-rules/

Don't use deep learning if you don't have to. #3

Open Benjamin-Lee opened 6 years ago

Benjamin-Lee commented 6 years ago

Oftentimes, deep learning is not the best approach. I think one rule (possibly even the first in the article) should warn against using deep learning (DL) when it isn't needed.

cgreene commented 6 years ago

Perhaps with some helpful tips? If your features are grounded in deep domain knowledge, already perform well, and are few in number, deep learning may not help much. Likewise, if you have very few examples, deep learning may not help much.

agitter commented 6 years ago

Along with helpful tips, this rule could include how to assess whether deep learning is providing any benefits on your task. Or perhaps benchmarking is important enough to be a separate rule.
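One way to check whether a more complex model is actually providing a benefit is to score it and a baseline on the *same* cross-validation folds and look at the per-fold differences. The sketch below assumes NumPy and a simple `fit_predict(X_train, y_train, X_test)` interface; `majority_class` and `nearest_centroid` are hypothetical stand-ins for "baseline" and "candidate model", and the data are synthetic.

```python
import numpy as np


def kfold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 once and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)


def paired_kfold_accuracy(fit_predict_a, fit_predict_b, X, y, k=5, seed=0):
    """Score two models on the *same* train/test folds and return per-fold accuracies."""
    folds = kfold_indices(len(y), k, seed)
    acc_a, acc_b = [], []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]
        acc_a.append(np.mean(fit_predict_a(X_tr, y_tr, X_te) == y_te))
        acc_b.append(np.mean(fit_predict_b(X_tr, y_tr, X_te) == y_te))
    return np.array(acc_a), np.array(acc_b)


# Hypothetical stand-ins: swap in your real baseline and deep model here.
def majority_class(X_tr, y_tr, X_te):
    """Predict the most frequent training label for every test example."""
    values, counts = np.unique(y_tr, return_counts=True)
    return np.full(len(X_te), values[np.argmax(counts)])


def nearest_centroid(X_tr, y_tr, X_te):
    """Assign each test example to the class whose training mean is closest."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]


# Toy data: two Gaussian blobs (synthetic, for illustration only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

a, b = paired_kfold_accuracy(majority_class, nearest_centroid, X, y)
gain = b - a
print(f"mean gain {gain.mean():.3f} (sd {gain.std(ddof=1):.3f}) over {len(gain)} folds")
```

Looking at the per-fold differences, rather than two separate averages, makes it easier to see whether the complex model's advantage is consistent or within fold-to-fold noise. This is a rough check, not a formal significance test.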

ttriche commented 6 years ago

An obvious question is whether the relationship between the data and the outcomes of interest is nonlinear, and whether previous efforts to apply deep learning in the domain have been successful (e.g., CNNs for image recognition) or not (most time-series work, at least to the best of my knowledge).

sgfin commented 5 years ago

I think this is an important point that isn't reflected in the current set of rules.

As a result, I think we should take "Rule #7: Use traditional methods to establish performance baselines" and move it up to be the second rule. This emphasizes that we should start with baselines: they help us get a good feel for the problem and figure out whether DL is even necessary.

Further, since logistic regression (LR) is almost always one of the appropriate baselines, this also encourages good coding practice: we can implement LR in the deep learning framework as a simple one-layer neural network, optimize it as well as we can, and then add layers only if we need the extra lift, which also tells us whether they help at all.
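A minimal sketch of the "LR as a one-layer network, then add layers" workflow. It assumes plain NumPy as a stand-in for a real deep learning framework; `init_params`, `forward`, `train`, and `accuracy` are illustrative names, not part of any library. With `hidden=()` the model is exactly logistic regression; passing hidden sizes adds ReLU layers trained by the same loop.

```python
import numpy as np


def init_params(n_in, hidden, seed=0):
    """Weights for a fully connected net; hidden=[] reduces it to logistic regression."""
    rng = np.random.default_rng(seed)
    sizes = [n_in, *hidden, 1]
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, X):
    """Return every layer's activations: ReLU for hidden layers, sigmoid for the output."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        if i == len(params) - 1:
            acts.append(1.0 / (1.0 + np.exp(-np.clip(z, -30, 30))))
        else:
            acts.append(np.maximum(z, 0.0))
    return acts


def train(X, y, hidden=(), lr=0.5, epochs=3000, seed=0):
    """Full-batch gradient descent on the mean binary cross-entropy."""
    params = init_params(X.shape[1], list(hidden), seed)
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        acts = forward(params, X)
        delta = (acts[-1] - y) / len(X)  # dLoss/dz at the sigmoid output
        grads = []
        for i in range(len(params) - 1, -1, -1):
            W, _ = params[i]
            grads.append((acts[i].T @ delta, delta.sum(axis=0)))
            if i > 0:
                delta = (delta @ W.T) * (acts[i] > 0)  # backpropagate through ReLU
        grads.reverse()
        params = [(W - lr * gW, b - lr * gb)
                  for (W, b), (gW, gb) in zip(params, grads)]
    return params


def accuracy(params, X, y):
    """Fraction of examples where thresholding the output at 0.5 matches the label."""
    return float(np.mean((forward(params, X)[-1].ravel() > 0.5) == y))


# Synthetic XOR-style data: the label depends on a nonlinear interaction.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (400, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(float)

lr_model = train(X, y)                 # no hidden layers: logistic regression
mlp_model = train(X, y, hidden=(16,))  # one hidden ReLU layer added
print(accuracy(lr_model, X, y), accuracy(mlp_model, X, y))
```

On this deliberately nonlinear toy data the linear model should stay near chance while the extra layer fits the interaction. On real data, if adding layers gives no lift over the LR configuration, that is the signal that DL may not be needed.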

rasbt commented 5 years ago

Totally agree with that. I also support using logistic regression (and perhaps a random forest as a reference) as a sufficient baseline if we don't want to turn every study into a benchmarking project.

We also have to keep in mind that many applications of deep learning in biology are not about deep learning itself but use it, e.g., as one tool in a pipeline. For example, if your paper is about the discovery of an experimentally validated protein inhibitor and you used deep learning during the prioritization that led to that discovery, comparing it to other algorithms becomes moot: you care more about having discovered the inhibitor than about the "what if" scenario of using another model. However, logistic regression is almost always a valuable companion for debugging code and for implementing DL in a pipeline.
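One concrete way logistic regression can serve as a debugging companion (our interpretation; the comment does not spell out a method) is a gradient check. LR's loss and gradient have simple closed forms, so you can verify hand-written or framework gradient code against central finite differences on LR before trusting it on larger models. The sketch assumes NumPy; `logistic_loss`, `logistic_grad`, and `numerical_grad` are illustrative names, and the data are random.

```python
import numpy as np


def logistic_loss(w, X, y):
    """Mean binary cross-entropy of logistic regression (bias folded into X)."""
    z = X @ w
    # log(1 + exp(z)) - y*z is a numerically stable form of the cross-entropy
    return float(np.mean(np.logaddexp(0.0, z) - y * z))


def logistic_grad(w, X, y):
    """Analytic gradient: X^T (sigmoid(Xw) - y) / n."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y)


def numerical_grad(f, w, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        g[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return g


rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(50, 3)), np.ones((50, 1))])  # last column = bias
y = (rng.uniform(size=50) < 0.5).astype(float)
w = rng.normal(size=4)

analytic = logistic_grad(w, X, y)
numeric = numerical_grad(lambda v: logistic_loss(v, X, y), w)
rel_err = np.linalg.norm(analytic - numeric) / np.linalg.norm(analytic + numeric)
print(f"relative error: {rel_err:.2e}")
```

A relative error far above roughly 1e-6 in double precision usually points to a bug in the gradient code rather than in the model.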