Open DrK-Lo opened 9 years ago
I would certainly be interested in something like this and haven't really seen anything that attempts it. It would be nice to have some way of summarizing the various population-genetic statistics (Fst, Tajima's D, etc.) we use in sliding-window analyses. Usually you'll see a separate plot for each statistic, and the number of plots grows fast once Fst is computed between multiple populations.
Is there a link to the description of the method?
Cool stuff. I have had requests for exactly that kind of thing on the adegenet forum. You're most welcome to join the adegenet development team and put this there. Also happy to help if you go for a separate package, e.g. with ensuring data class compatibility.
Cheers Thibaut
On Mon, Mar 9, 2015 at 1:35 AM, Zhian N. Kamvar wrote:
Is there a link to the description of the method?
I just wrote a blog post outlining two approaches to identify multivariate outliers.
Thibaut - I am an intermediate R user and really have no idea about data classes. I'm all about compatibility though!
Katie - I like the material you put on the blog! Very happy to discuss stuff with you next week. If you have any toy dataset with some outliers, that will probably be super useful.
On Mon, Mar 9, 2015 at 1:06 PM, Katie Lotterhos wrote:
I just wrote a blog post outlining two approaches to identify multivariate outliers https://sites.google.com/site/katielotterhos/opennotebooks/k-lo/multivariateoutliersingenomescans .
I can provide real data to test it (already on the wiki) or in collaboration. Also happy to discuss more.
I have some code that can be used to identify outliers in multivariate space. I could see this as a way to combine results from multiple test statistics in genome scans (e.g. FST, genetic-environment association, genotype-phenotype association) to get multivariate outliers, rather than relying on outliers from individual statistics (note that I also think there are some caveats to this approach). I can see this feasibly being developed into a package during the hackathon, and would love to try it on some simulated or real datasets. If anyone thinks this is a good idea, let me know.
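
For readers who want something concrete, here is a minimal sketch of one common way to find multivariate outliers: the Mahalanobis distance of each locus from the centroid of all loci. It is an assumption that this resembles the code mentioned above; it is not taken from the blog post or from any package in this thread. The locus names and statistic values are invented toy data, and the function names are hypothetical. The sketch uses only the Python standard library and solves the small linear system by Gaussian elimination rather than inverting the covariance matrix.

```python
import math

def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows, mean):
    """Sample covariance matrix (denominator n - 1)."""
    n, p = len(rows), len(mean)
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j]
    return [[c / (n - 1) for c in r] for r in cov]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is (near) singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def mahalanobis_distances(rows):
    """Distance of every row from the centroid, scaled by the covariance."""
    mean = mean_vector(rows)
    cov = covariance_matrix(rows, mean)
    out = []
    for row in rows:
        d = [x - m for x, m in zip(row, mean)]
        z = solve(cov, d)  # z = cov^-1 d
        out.append(math.sqrt(sum(di * zi for di, zi in zip(d, z))))
    return out

# Invented toy data: one row per locus, columns = (Fst, association statistic).
# The last locus is not the most extreme in either column on its own,
# but it sits off the trend that the other loci follow.
loci = ["locus_%d" % i for i in range(1, 10)]
stats = [
    (0.01, 0.5), (0.02, 0.7), (0.03, 1.1), (0.04, 1.3), (0.05, 1.6),
    (0.06, 1.9), (0.07, 2.2), (0.08, 2.4), (0.02, 2.3),
]

distances = mahalanobis_distances(stats)
top_index = max(range(len(distances)), key=distances.__getitem__)

for name, dist in sorted(zip(loci, distances), key=lambda t: -t[1]):
    print(name, round(dist, 2))
```

In this toy example the top-ranked locus is the off-trend one, which a scan of either statistic alone would not rank first. One caveat of this simple version: the outliers themselves inflate the mean and covariance estimates, so a robust covariance estimator may be preferable on real genome-scan data.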