Benjamin-Lee / deep-rules

Ten Quick Tips for Deep Learning in Biology
https://benjamin-lee.github.io/deep-rules/

If a human can't perform the task with ease, be surprised if a machine can. #26

Open paulbrodersen opened 5 years ago

paulbrodersen commented 5 years ago

Have you checked the list of proposed rules to see if the rule has already been proposed?

Feel free to elaborate, rant, and/or ramble.

Alternative heading: Garbage in -- garbage out.

Some people mistake machine learning -- and deep learning is no exception -- for an elaborate statistical method, and believe that neural networks have magical, super-human powers to detect patterns nobody else can. As a consequence, they believe that they can take their data in any form, throw the trending machine learning method of the day at it (which will automagically find a suitable representation of the data and then learn the task), and profit.

Life rarely works out that way. Humans are actually incredibly good at detecting patterns. More often than not, if a human can't see "it", a machine won't either (cf. the rule on the importance of baseline performances). If the machine nevertheless performs better than chance, be suspicious: it is probably overfitting the training data (cf. the rule on freezing a test set). For success, it is generally paramount to first find a representation of the data from which a human could perform the task, and then -- and only then -- try to teach a machine to do it.
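To make that caution concrete, here is a minimal sketch of the kind of sanity check I have in mind -- scikit-learn, a generic classifier, and placeholder data, all purely illustrative: train the same model on real labels and on shuffled labels, and compare both on a frozen test set.

```python
# Minimal sketch of a shuffled-label sanity check (placeholder data X, y).
# If the real model barely beats the shuffled-label baseline on a frozen
# test set, suspect overfitting rather than genuine signal.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

X = np.random.rand(500, 20)            # placeholder features
y = np.random.randint(0, 2, 500)       # placeholder labels

# Freeze the test set once, before any model tuning.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

real = MLPClassifier(max_iter=500, random_state=0).fit(X_train, y_train)
real_acc = accuracy_score(y_test, real.predict(X_test))

# Baseline: identical model trained on permuted labels (destroys any real signal).
shuffled = MLPClassifier(max_iter=500, random_state=0).fit(X_train, np.random.permutation(y_train))
baseline_acc = accuracy_score(y_test, shuffled.predict(X_test))

print(f"real labels: {real_acc:.2f}   shuffled labels: {baseline_acc:.2f}")
```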

Any citations for the rule? (peer-reviewed literature preferred but not required)

hugoaerts commented 5 years ago

In general I agree with this: if humans can't do it, the information is likely not in the data. However, there are also examples of machines doing a better job than humans. For example, look at the lung screening Kaggle challenge: benign and malignant lung nodules look very similar to humans, but machines can discriminate them. So I like the rule, but we should note that machines simply can do some things better; we just need appropriate validation/testing of the networks to reduce false-positive results.

paulbrodersen commented 5 years ago

Yeah, that exact example motivated the qualifier:

More often than not, [...]

However, the fact that we had the exact same example in mind perhaps tells us that there aren't that many: a deafening silence, if you will, broken only by a few whispers (so far). Hence my advice for caution:

If a human can't perform the task with ease, be surprised if a machine can.

agitter commented 5 years ago

More often than not, if a human can't see "it", a machine won't either

What type of data are you thinking of here?

I agree that we should counter the belief that DL will automagically work, but there are many types of high-dimensional biological data where humans cannot easily find patterns and ML/DL can. Successful applications of autoencoders come to mind.
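For concreteness, here is a minimal sketch of what such an autoencoder looks like -- Keras, with made-up dimensions and random placeholder data, not any particular published model: it compresses an expression matrix into a low-dimensional latent space in which downstream patterns can be looked for.

```python
# Minimal autoencoder sketch for high-dimensional expression data
# (Keras; dimensions and data are placeholders for illustration only).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_genes = 5000                                        # hypothetical number of genes
x = np.random.rand(200, n_genes).astype("float32")    # placeholder samples-by-genes matrix

inputs = keras.Input(shape=(n_genes,))
h = layers.Dense(256, activation="relu")(inputs)
latent = layers.Dense(32, activation="relu", name="latent")(h)  # compressed representation
h = layers.Dense(256, activation="relu")(latent)
outputs = layers.Dense(n_genes, activation="sigmoid")(h)

autoencoder = keras.Model(inputs, outputs)
encoder = keras.Model(inputs, latent)

autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x, x, epochs=5, batch_size=32, verbose=0)   # reconstruct the input from the code

codes = encoder.predict(x)   # downstream analysis happens in this 32-dimensional space
```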

paulbrodersen commented 5 years ago

I have no citations, just anecdotes (which I would attribute to classic survivorship bias). For example, I was once approached by a startup to develop a neural net for them that would predict tertiary protein structure from sequence data alone. IIRC, they wanted to tackle membrane proteins in particular. I tried for several hours to convince them that it probably could not be done, period, but to no avail.

I think the larger point I want to make is that some people make the mistake of thinking of machine learning, and of deep learning in particular, first and foremost as a statistical method with the magical power (pun intended) to find structure in seemingly random data. The ability to automate processes that are difficult to hard-code comes second. To my mind, machine learning only really shines on tasks of the second type, and whenever you see a machine learning algorithm finding structure where none seems to be (judged by the human eye), you should really, really worry about overfitting before declaring success. Again, as I have said before, the rule is deliberately phrased so as not to rule out the ability of machine learning algorithms to see structure where humans can't, but to urge caution when they do.

Successful applications of autoencoders come to mind.

I haven't come across those. Do you have some examples (genuinely curious, not questioning their existence)?

I vaguely remember some transcriptomics (?) people trying something along the lines of what you are hinting at, and being not terribly convinced at the time, but that may have been for any number of reasons, and I can't find the reference any more.

cgreene commented 5 years ago

DeepSEA, Basset, etc all seem to find patterns that humans wouldn't notice by looking at a lot of sequences.

agitter commented 5 years ago

I find the transcriptomics applications to be more convincing, even if Casey has shown their dependence on parameters.

In biochemistry / drug discovery, this autoencoder is another example of finding some structure in a domain that is unapproachable via manual inspection. It may not be a perfect latent representation, but there is learned structure there.

DeepSEA, Basset, etc all seem to find patterns that humans wouldn't notice by looking at a lot of sequences.

Yes, regulatory genomics is another example where NNs find meaningful patterns that are not detectable by humans. That doesn't mean that we can blindly trust neural networks or shouldn't evaluate them with skepticism. However, the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge comes to mind as one specific example where careful analysis revealed a lot about the context dependence of TF sequence preferences and other real biology.

jmschrei commented 5 years ago

I like this rule, with its qualifiers. The "be surprised" aspect suggests that you should rigorously validate a result before claiming success, not that you shouldn't ever believe it. I've found that the most successful applications of deep learning are not in solving tasks that humans find very difficult, but in automating tasks that humans find easy but computers have previously had a hard time with (computer vision, game playing, NLP...). @cgreene I think that, given a reasonable amount of time, humans would notice regulatory patterns. If I recall correctly, it's common practice for regulatory genomics papers to validate their model by showing that they recover known motifs. It's true that many of these known motifs are also discovered by prior tools (e.g., MEME), but some of them have been described through human effort too.

evancofer commented 5 years ago

@jmschrei I would disagree, since the use of these models is not limited to the identification of regulatory patterns. For instance, Basenji/ExPecto modeled gene expression from regulatory signals, a task for which there is little reason to expect good human performance. Most of those models also seem to outperform conservation-based methods for identifying the effects of regulatory variants (both SNPs and indels), which humans are generally bad at. It may be possible to identify regulatory effects of common variants through association-based studies, but these DL methods are the best approach to analyzing rare or de novo variants.

rasbt commented 5 years ago

I agree with you @evancofer in a traditional machine learning setting, but imho @jmschrei is right if we narrow it down to DL (here, I mean CNNs, RNNs, etc., not simple multi-layer perceptrons, which are more traditional artificial neural networks and "traditional machine learning"). If we say that these techniques enable us to identify patterns in high-dimensional data that are hard to identify "manually" because of the sheer amount of data, that applies to ML (or even "classic" statistics) in general. On the other hand, the larger the hypothesis space, the higher the chance of finding a pattern that is actually due to systematic noise and not reflective of the real phenomenon. DL is exceptionally prone to that.

E.g., I remember a paper from Bengio's group in that regard ("Measuring the tendency of CNNs to Learn Surface Statistical Regularities", https://arxiv.org/abs/1711.11561): just by adding some systematic noise that would be "ignored" by humans, we can substantially change the model's behavior. This implies that there is a high risk of fitting to something that is not real. Vice versa, if a DL model can predict something that a human has a hard time with, it's usually useful to investigate whether this is real or just an artifact.
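As a toy numerical illustration of that risk (synthetic data and a scikit-learn MLP, not the Fourier-filtered image setup from the paper): a single systematic artifact that co-varies with the label is enough to score well above chance, and stripping it at test time sends the model back to guessing.

```python
# Toy illustration: the "real" content is pure noise, but one feature
# carries a systematic, label-correlated artifact. The model exploits it.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 100))   # no genuine class signal anywhere
X[:, 0] += 2 * y                # systematic artifact correlated with the label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(max_iter=1000, random_state=0).fit(X_tr, y_tr)

X_te_clean = X_te.copy()
X_te_clean[:, 0] = rng.normal(size=len(X_te_clean))   # strip the artifact at test time

print("with artifact:   ", clf.score(X_te, y_te))       # well above chance
print("artifact removed:", clf.score(X_te_clean, y_te)) # falls back toward 0.5
```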

jmschrei commented 5 years ago

My comment wasn't that you should be surprised if deep networks outperform humans on tasks that humans do well at, but that you should be surprised when deep networks perform well on tasks humans can't do at all. If humans were totally unable to model gene expression from sequence alone and yet a deep model came out that could do it, I'd be very skeptical. However, many of these models validate their predictions by showing that introducing known mutations into TF binding sites produces the expected change in predicted gene expression. Humans are capable of identifying these associations, just certainly not as quickly as one could train a deep model.
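That validation step can be sketched as simple in-silico mutagenesis -- here `model` is a hypothetical trained sequence-to-expression network and the motif coordinates are made up:

```python
# Sketch of an in-silico mutagenesis check: mutate a known TF binding site
# and compare the model's prediction before and after the edit.
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA sequence as a (length, 4) float array."""
    return np.array([[float(base == b) for b in BASES] for base in seq], dtype=np.float32)

def binding_site_effect(model, seq, start, end, replacement):
    """Predicted change in output when seq[start:end] is replaced by `replacement`."""
    mutant = seq[:start] + replacement + seq[end:]
    wild_pred = model.predict(one_hot(seq)[None, ...])
    mut_pred = model.predict(one_hot(mutant)[None, ...])
    return (mut_pred - wild_pred).item()

# Usage (hypothetical trained model and motif coordinates):
# delta = binding_site_effect(model, promoter_seq, 120, 128, "ACGTACGT")
# Disrupting a known functional site should shift the prediction in the expected direction.
```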

evancofer commented 5 years ago

@jmschrei I agree with that sentiment about rigorous statistical evaluation. I would also go a step further and say that claims of novel biological mechanisms and phenomena should require experimental demonstration, but this is more of a general ML/data science rule than a DL-specific one.