choderalab / assaytools

Modeling and Bayesian analysis of fluorescence and absorbance assays.
http://assaytools.readthedocs.org
GNU Lesser General Public License v2.1

Test out Gaussian Process for dealing with outliers #52

Open sonyahanson opened 8 years ago

sonyahanson commented 8 years ago

Thanks to Patrick and Bas for chatting about this over coffee after my lab meeting. Sounds like Lee had a very promising answer: Gaussian Processes!

Here are some potentially useful links I found by googling 'gaussian process outliers python': https://bugra.github.io/work/notes/2014-05-11/robust-regression-and-outlier-detection-via-gaussian-processes/ https://ocefpaf.github.io/python4oceanographers/blog/2015/03/16/outlier_detection/

jchodera commented 8 years ago

I don't think this is what we want. GPs are great for data where there is a natural spatial relationship between the collected data points but that relationship must be learned. We are dealing with a very different case: we already know what the relationship is, through the dissociation constant equations and mass conservation laws. Using a GP of the sort in those examples would not only "forget" that information, it also would not let us propagate any uncertainty about which points are outliers into the posterior.
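To make "we know what the relationship is" concrete, here is a minimal sketch of the kind of known relationship meant, assuming the simplest case of 1:1 protein:ligand binding. The function name, the concentrations, and the Kd value are illustrative assumptions, not taken from assaytools.

```python
import numpy as np

def complex_concentration(p_total, l_total, kd):
    """Equilibrium [PL] for 1:1 binding P + L <-> PL (assumed model).

    Follows from Kd = [P][L]/[PL] combined with mass conservation:
    p_total = [P] + [PL] and l_total = [L] + [PL].
    All quantities must share the same concentration units.
    """
    b = p_total + l_total + kd
    # Smaller root of [PL]^2 - b*[PL] + p_total*l_total = 0 (the physical one,
    # since [PL] cannot exceed either total concentration).
    return 0.5 * (b - np.sqrt(b * b - 4.0 * p_total * l_total))

# Hypothetical titration: fixed protein, ligand spanning several decades (molar).
p_total = 0.5e-6
l_total = np.logspace(-9, -4, 12)
kd = 1e-7
pl = complex_concentration(p_total, l_total, kd)
```

Because the curve shape is fixed by Kd and the total concentrations, a point that falls far off this curve can be treated as a candidate outlier rather than as something a flexible regressor should bend towards.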

jchodera commented 8 years ago

Instead, I think we should use an approach like this, where there is a prior on the fraction of outliers and the outlier distribution has a mean and variance that is inferred (and marginalized out) during MCMC sampling: http://www.astroml.org/book_figures/chapter8/fig_outlier_rejection.html
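A rough sketch of what such a mixture likelihood could look like, assuming Gaussian measurement noise and a Gaussian outlier distribution. All names and numbers are hypothetical, and the priors and the MCMC sampler itself are left out. The per-point "is this an outlier?" flags are summed out analytically, so only the outlier fraction, mean, and width would need to be sampled.

```python
import numpy as np

def log_normal_pdf(x, mean, sigma):
    """Elementwise log density of Normal(mean, sigma)."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((x - mean) / sigma) ** 2

def _log_components(y, y_model, sigma, f_out, mu_out, sigma_out):
    # Inlier: scattered around the model prediction, with weight 1 - f_out.
    log_in = np.log1p(-f_out) + log_normal_pdf(y, y_model, sigma)
    # Outlier: drawn from a broad Normal(mu_out, sigma_out), with weight f_out.
    log_out = np.log(f_out) + log_normal_pdf(y, mu_out, sigma_out)
    return log_in, log_out

def log_likelihood(y, y_model, sigma, f_out, mu_out, sigma_out):
    """Mixture log-likelihood with the per-point outlier flags summed out."""
    log_in, log_out = _log_components(y, y_model, sigma, f_out, mu_out, sigma_out)
    return np.sum(np.logaddexp(log_in, log_out))

def outlier_probability(y, y_model, sigma, f_out, mu_out, sigma_out):
    """Probability that each point is an outlier, conditional on the parameters."""
    log_in, log_out = _log_components(y, y_model, sigma, f_out, mu_out, sigma_out)
    return np.exp(log_out - np.logaddexp(log_in, log_out))

# Synthetic illustration: data that match the model exactly except one point.
y_model = np.linspace(1.0, 10.0, 10)
y = y_model.copy()
y[4] += 25.0
p_out = outlier_probability(y, y_model, sigma=0.5, f_out=0.1,
                            mu_out=10.0, sigma_out=20.0)
```

In a real analysis, `y_model` would come from the physical model's prediction at the current parameter values, and `f_out`, `mu_out`, and `sigma_out` would get priors and be sampled alongside the other parameters, so the uncertainty about which points are outliers carries through to the posterior.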

jchodera commented 8 years ago

But first, before we even talk about models, we absolutely need to collect some examples of the outliers and look at them to see what they tell us about the nature of the data.

sonyahanson commented 8 years ago

Just making a note here that this is something we should keep at the front of our minds.

jchodera commented 8 years ago

Agreed! Would be great to compile a list of data with outliers to find a strategy that works!

sonyahanson commented 8 years ago

Here is an example of two almost identical datasets with and without an outlier:

With outlier: https://github.com/choderalab/fluorescence-assay-manuscript/blob/fig_sketches/analysis/bayes/DMSO-backfill/delG_Bosutinib-AB-2016-07-31%2020:10.png https://github.com/choderalab/fluorescence-assay-manuscript/blob/fig_sketches/analysis/bayes/DMSO-backfill/Bosutinib-AB-2016-07-31%2020:10.json

Without outlier: https://github.com/choderalab/fluorescence-assay-manuscript/blob/fig_sketches/analysis/bayes/DMSO-backfill/delG_Bosutinib-IJ-2016-07-31%2020:13.png https://github.com/choderalab/fluorescence-assay-manuscript/blob/fig_sketches/analysis/bayes/DMSO-backfill/Bosutinib-IJ-2016-07-31%2020:13.json

jchodera commented 8 years ago

Awesome! This is exactly what we need to make this work! Thanks!

sonyahanson commented 7 years ago

@jchodera has an idea about Bayesian outlier detection that he is interested in implementing.