Have you checked the list of proposed rules to see if the rule has already been proposed?
[x] Yes
Did you add yourself as a contributor by making a pull request if this is your first contribution?
[x] Yes, I added myself or am already a contributor
While we are amazed by the impressive predictive performance of deep learning models, we should stay suspicious when deep learning reports significantly higher predictive performance than other interpretable models with explicitly constructed features. This is particularly important because people have noticed that deep learning often does not predict through what we expect to be "semantic" information, but through superficial statistics that should not be regarded as useful (Ref 1). To put it another way, deep learning is a little too powerful: what it learns within the scope of the training data (and is tested on within the scope of the testing data) may only reflect superficial patterns of the data set itself, not anything actually related to the task (many more references on this phenomenon are available in vision/NLP tasks, but I should not digress too much). Thus, when you train a model, test it, and get high predictive performance, it looks fine, but it may backfire (even more than traditional methods) when it is actually applied in industry.
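To make the failure mode concrete, here is a minimal synthetic sketch (not from Ref 1; the setup, feature names, and probabilities are my own toy assumptions). A "shortcut" feature agrees with the label 95% of the time in both the training and test splits, while a genuine feature is only 75% informative. A plain linear classifier latches onto the shortcut, so i.i.d. test accuracy looks excellent, yet accuracy collapses on "deployment" data where the shortcut no longer holds:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, shortcut_agreement):
    """Binary labels; one genuine feature (75% informative) and one
    shortcut feature that matches the label with the given probability."""
    y = rng.integers(0, 2, n)
    genuine = np.where(rng.random(n) < 0.75, y, 1 - y)
    shortcut = np.where(rng.random(n) < shortcut_agreement, y, 1 - y)
    return np.column_stack([genuine, shortcut]).astype(float), y

# Train/test share the shortcut; "deployment" data reverses it.
X_tr, y_tr = make_data(5000, 0.95)
X_te, y_te = make_data(2000, 0.95)
X_dep, y_dep = make_data(2000, 0.05)

# Least-squares linear classifier with a bias term, thresholded at 0.5.
A = np.column_stack([X_tr, np.ones(len(X_tr))])
w, *_ = np.linalg.lstsq(A, y_tr, rcond=None)

def acc(X, y):
    pred = (np.column_stack([X, np.ones(len(X))]) @ w) > 0.5
    return (pred == y).mean()

print(f"i.i.d. test accuracy: {acc(X_te, y_te):.2f}")   # looks great
print(f"deployment accuracy:  {acc(X_dep, y_dep):.2f}")  # collapses
```

The point is that the train/test protocol alone cannot distinguish the shortcut from the genuine signal; only evaluation under a distribution where the superficial pattern breaks (or careful inspection of what the model uses) reveals the problem, which is why this matters even more for deep models flexible enough to pick up arbitrary shortcuts.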
Relations to other proposed rules:
Related to #43 but extends it, as the data can be skewed in a very subtle way (through superficial patterns; e.g. Ref 1), so one may not be able to check easily. Also related to @rasbt's comment within that thread, as bias through superficial patterns seems specific to deep learning because of DL's power. (#43 seems to extend #12.)
Related to #26 but serves as a more fundamental ML-oriented discussion and offers much more evidence (in vision/NLP; refs omitted at this moment) for why #26 is important.
Kind of related to #27 and #35, as currently the best way to avoid this issue is probably a more careful design of experiments.
Any citations for the rule? (peer-reviewed literature preferred but not required)