Benjamin-Lee / deep-rules

Ten Quick Tips for Deep Learning in Biology
https://benjamin-lee.github.io/deep-rules/

Be suspicious: high prediction accuracy may backfire #55

Open: HaohanWang opened this issue 5 years ago

HaohanWang commented 5 years ago

Have you checked the list of proposed rules to see if the rule has already been proposed?

Did you add yourself as a contributor by making a pull request if this is your first contribution?

While we are amazed by the impressive predictive performance of deep learning models, we should stay suspicious when a deep learning model reports significantly higher predictive performance than interpretable models built on explicitly constructed features. This matters because people have noticed that deep learning models do not necessarily predict through what we would expect to be "semantic" information, but through superficial statistics that should not be regarded as useful (Ref 1). To put it more optimistically, deep learning is a little too powerful: what it learns within the scope of the training data (and evaluates within the scope of the testing data) may reflect only superficial patterns of the data set itself rather than anything actually related to the task (many more references on this phenomenon are available for vision and NLP tasks, but I should not digress too much). Thus, when you train a model, test it, and get high predictive performance, everything looks fine, but the model may backfire (even more than traditional methods) when it is actually applied in industry.
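To make this failure mode concrete, here is a minimal toy sketch (not taken from the issue or the cited paper). It assumes a hypothetical binary task with two inputs: `signal`, a weak but genuine feature, and `artifact`, a spurious marker (think of a batch effect or scanner tag) that matches the label 98% of the time in the collected data but carries no information once the model is deployed. A small logistic regression that may use every input stands in for the flexible model; a signal-only logistic regression plays the interpretable baseline with an explicitly constructed feature. All names, probabilities, and sample sizes are illustrative assumptions.

```python
import math
import random


def make_data(n, artifact_agreement, rng):
    """Hypothetical toy data (not from the issue).

    signal:   weakly informative, Gaussian centred at +0.5 (y=1) or -0.5 (y=0).
    artifact: +/-1 marker equal to the label's sign with probability
              `artifact_agreement` (a stand-in for a spurious dataset pattern).
    """
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        sign = 1.0 if y == 1 else -1.0
        signal = 0.5 * sign + rng.gauss(0.0, 1.0)
        artifact = sign if rng.random() < artifact_agreement else -sign
        data.append(((signal, artifact), y))
    return data


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(data, use_artifact, epochs=200, lr=0.5):
    """Full-batch gradient ascent on the logistic log-likelihood.

    If use_artifact is False, the artifact input is zeroed, so the model
    can only rely on the signal feature (the "interpretable" baseline).
    """
    w = [0.0, 0.0]
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x0, x1), y in data:
            x1 = x1 if use_artifact else 0.0
            err = y - sigmoid(w[0] * x0 + w[1] * x1 + b)
            grad_w[0] += err * x0
            grad_w[1] += err * x1
            grad_b += err
        w = [w[0] + lr * grad_w[0] / n, w[1] + lr * grad_w[1] / n]
        b += lr * grad_b / n
    return w, b, use_artifact


def accuracy(model, data):
    w, b, use_artifact = model
    correct = 0
    for (x0, x1), y in data:
        z = w[0] * x0 + w[1] * (x1 if use_artifact else 0.0) + b
        correct += int((z > 0) == (y == 1))
    return correct / len(data)


rng = random.Random(0)
train = make_data(1000, artifact_agreement=0.98, rng=rng)
test = make_data(1000, artifact_agreement=0.98, rng=rng)   # same source as training
deploy = make_data(1000, artifact_agreement=0.5, rng=rng)  # artifact no longer tracks y

flexible = train_logistic(train, use_artifact=True)
baseline = train_logistic(train, use_artifact=False)

results = {
    "flexible_test": accuracy(flexible, test),
    "flexible_deploy": accuracy(flexible, deploy),
    "baseline_test": accuracy(baseline, test),
    "baseline_deploy": accuracy(baseline, deploy),
}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

With this setup, the model that uses the artifact should beat the baseline by a wide margin on the held-out test split, which is drawn from the same source as the training data, yet fall to roughly chance level on the deployment split, while the baseline's accuracy barely moves. The held-out test score alone would not reveal the problem.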

Relations to other proposed rules:

Any citations for the rule? (peer-reviewed literature preferred but not required)

blengerich commented 5 years ago

A reference for this issue in the medical context is: https://arxiv.org/abs/1807.00431