Multiple comparisons problems

I'm still working my way through the paper published by @gwaygenomics, @allaway and @cgreene, but it made me think of an issue that I believe we should try to deal with in our final product. In the paper they had a specific hypothesis that they tested; however, we are going to provide people with the ability to test out hypotheses on thousands of different mutations.

There are some problems with this ability, such as non-response bias. There are bound to be many uninteresting results (AUROC = 0.5) for different genes that people will tend to glance over. I can very easily imagine a scenario where someone iterates through many different genes until they reach one where a model does a good job at predicting a mutation.

We could approach this issue in a few different ways:

hold out some data for validation -- only to be used for publication
apply some sort of correction (e.g. Bonferroni)
place strong emphasis on effect sizes
list a clear disclaimer

I wanted to open this issue up so we can discuss the importance of the problem and possible solutions.

cognoma / machine-learning

Multiple comparisons problems #83