NESCent / r-popgen-hackathon

Population Genetics Hackathon, to be held at NESCent on March 16-20, 2015

Project/Package Idea: Multivariate outliers in genome scans #9

DrK-Lo opened this issue 9 years ago

DrK-Lo commented 9 years ago

I have some code that can be used to identify outliers in multivariate space. I could see this as a way to combine results from multiple test statistics in genome scans (e.g. FST, genetic-environment association, genotype-phenotype association) to find multivariate outliers, rather than relying on outliers from individual statistics (though I also think there are some caveats to this approach). I think this could feasibly be developed into a package during the hackathon, and I would love to try it on some simulated or real datasets. If anyone thinks this is a good idea, let me know.
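The thread does not say which method the code above uses. As an illustration of the general idea only, here is a minimal sketch of one common way to score loci jointly across several statistics: the squared Mahalanobis distance of each locus from the multivariate mean, using the sample mean and covariance. It assumes each row is a locus and each column is one test statistic; all function names are hypothetical and do not come from any existing package. Under multivariate normality the squared distance is approximately chi-square distributed with k degrees of freedom (k = number of statistics), which is one way to pick a cutoff. Note that the plain sample mean and covariance are not robust: a cluster of outliers can inflate the covariance and mask itself, which is why robust estimators (e.g. minimum covariance determinant) are often used instead.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def column_means(rows: Sequence[Sequence[float]]) -> List[float]:
    """Mean of each statistic (column) across loci (rows)."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]


def sample_covariance(rows: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance matrix of the statistics (n - 1 denominator)."""
    n, k = len(rows), len(rows[0])
    mu = column_means(rows)
    cov = [[0.0] * k for _ in range(k)]
    for row in rows:
        centered = [row[j] - mu[j] for j in range(k)]
        for a in range(k):
            for b in range(k):
                cov[a][b] += centered[a] * centered[b]
    return [[value / (n - 1) for value in line] for line in cov]


def invert(matrix: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    # Augment [A | I] and reduce the left half to the identity.
    aug = [list(map(float, matrix[i])) + [1.0 if i == j else 0.0 for j in range(k)]
           for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [value / scale for value in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [line[k:] for line in aug]


def mahalanobis_sq(rows: Sequence[Sequence[float]]) -> List[float]:
    """Squared Mahalanobis distance of each locus from the multivariate mean."""
    k = len(rows[0])
    mu = column_means(rows)
    s_inv = invert(sample_covariance(rows))
    distances = []
    for row in rows:
        d = [row[j] - mu[j] for j in range(k)]
        distances.append(sum(d[a] * s_inv[a][b] * d[b]
                             for a in range(k) for b in range(k)))
    return distances


def chi2_quantile_2df(p: float) -> float:
    """Chi-square quantile for exactly 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - p)


def multivariate_outliers(rows: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Indices of loci whose squared distance exceeds `threshold`."""
    return [i for i, d2 in enumerate(mahalanobis_sq(rows)) if d2 > threshold]


if __name__ == "__main__":
    # Toy values (not real data): two standardized statistics per locus,
    # strongly correlated except for the last two loci.
    loci = [(-3.0, -3.0), (-2.0, -2.0), (-1.0, -1.0), (1.0, 1.0),
            (2.0, 2.0), (3.0, 3.0), (1.0, -1.0), (-1.0, 1.0)]
    print(multivariate_outliers(loci, threshold=3.0))  # [6, 7]
```

In this toy example, locus 6 at (1.0, -1.0) sits well inside the marginal range of each statistic, so neither single-statistic scan would flag it, yet its squared distance (3.5) exceeds that of the most extreme on-trend loci (2.25) because it breaks the correlation between the two statistics. In practice the cutoff might be a chi-square quantile (`chi2_quantile_2df` only covers k = 2) or an empirical quantile of the distances.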

darencard commented 9 years ago

I would certainly be interested in something like this and haven't really seen anything that attempts it. It would be nice to have some way of summarizing the various genetic statistics (Fst, Tajima's D, etc.) we use in sliding-window analyses (usually you see a separate plot for each one, and the number of plots grows quickly when Fst is computed between multiple populations).

zkamvar commented 9 years ago

Is there a link to the description of the method?

thibautjombart commented 9 years ago

Cool stuff. I have had requests for exactly that kind of thing on the adegenet forum. You're most welcome to join the adegenet development team and put this there. Also happy to help if you go for a separate package - ensuring data class compatibility etc.

Cheers Thibaut


DrK-Lo commented 9 years ago

I just wrote a blog post outlining two approaches to identify multivariate outliers: https://sites.google.com/site/katielotterhos/opennotebooks/k-lo/multivariateoutliersingenomescans

Thibaut - I am an intermediate R user and really have no idea about data classes. I'm all about compatibility though!

thibautjombart commented 9 years ago

Katie - I like the material you put on the blog! Very happy to discuss stuff with you next week. If you have any toy dataset with some outliers, that would probably be super useful.

DrK-Lo commented 9 years ago

Thanks! I do have some published simulations (Lotterhos and Whitlock 2014, 2015) that I tested these approaches on at one point. I would like to try them on other simulations or real data if people are interested.

smanel commented 9 years ago

I can provide real data to test it (already on the wiki), or we could work on it in collaboration. Also happy to discuss more.