Benjamin-Lee / deep-rules

Ten Quick Tips for Deep Learning in Biology
https://benjamin-lee.github.io/deep-rules/

Correlation is not causality #9

Closed SiminaB closed 5 years ago

SiminaB commented 5 years ago

DL has caveats just like any other model. I feel like this still needs to be said, and the more times, the better.

Benjamin-Lee commented 5 years ago

This is definitely a good rule, not just in DL but for science in general. Is there anything DL-specific we can add?

Obligatory xkcd:

agitter commented 5 years ago

If the rule is more broadly about causality, a related thought is that a model being interpretable is distinct from a model being causal. That is still not deep learning-specific but is a frequent point of confusion when discussing deep learning.

SiminaB commented 5 years ago

I meant it more in terms of making it clear that DL does not solve the causality issue. I didn't think it did but I realized that it's still a point of confusion based on at least a couple of questions recently posed at a conference I attended. I do also agree with @agitter's point about interpretation vs causality - that refers more to "is a DL model a black box and to whom is it a black box?" (was also a question at a recent conference.) Not sure if those should be within the same question or different questions.
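To make the point concrete, here is a minimal sketch of how a hidden confounder produces a strong correlation with no causal effect. Everything in it is a hypothetical illustration (the variable names, the Gaussian noise, and the sample size are assumptions, not anything from the manuscript): `z` drives both `x` and `y`, while `x` never enters the equation for `y`. Any model fit to the observational data, deep or otherwise, would find `x` predictive of `y`, but setting `x` by intervention leaves `y` unchanged.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, seed=0):
    """Return (observational r, interventional r) for a confounded toy system."""
    rng = random.Random(seed)
    # Hidden confounder (assumed standard normal for illustration).
    z = [rng.gauss(0, 1) for _ in range(n)]
    # Observational x is driven by z plus noise.
    x_obs = [zi + rng.gauss(0, 1) for zi in z]
    # y is driven by z plus noise; x does not appear here at all.
    y = [zi + rng.gauss(0, 1) for zi in z]
    # Intervention: x is assigned at random, independently of z.
    # Because y does not depend on x, y is the same as before.
    x_do = [rng.gauss(0, 1) for _ in range(n)]
    return pearson(x_obs, y), pearson(x_do, y)


observational_r, interventional_r = simulate()
print(f"observational correlation:  {observational_r:.2f}")
print(f"interventional correlation: {interventional_r:.2f}")
```

In this setup the observational correlation should sit near 0.5 and the interventional one near zero. A more flexible model only captures the first number more faithfully; it does nothing to tell the two apart.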

khyu commented 5 years ago

Agreed. Also, it may be worth noting that DL (or ML in general) can facilitate "causal inference" analysis, which aims to infer causal relations under a set of assumptions.

agitter commented 5 years ago

@SiminaB maybe interpretation vs. causality should be a different rule and discussion issue. Your original point is important enough that it could stand on its own.

@khyu exactly, and it may be worthwhile to point out that sometimes when one requests an "interpretable" model they are really seeking a causal model. A rule could explicitly reference causal models in ML.

SiminaB commented 5 years ago

To me, these issues are also closely related to #12 and #13, in the sense of "what can you do with the data you have? what can a DL approach [or any ML approach] actually give you?" This matters especially since DL will more than likely be applied in secondary analyses of data collected for other purposes (or passively), and this will happen increasingly often given the proliferation of data.

Benjamin-Lee commented 5 years ago

Maybe then we can merge these into one rule along the lines of "Understand the questions you can (and can't) answer with your data and model"

SiminaB commented 5 years ago

That could be a good idea! Just have to try to make the rules somewhat comparable in scope (or maybe we don't have to?)

chevrm commented 5 years ago

Agreed. A nice "can vs. can't ask" section will be very useful, especially for beginners.

rasbt commented 5 years ago

Fixed in #117