Open marcustutert opened 2 years ago
Hi, I did this for peddy (a predecessor to somalier). You can see that the benefit of increasing the number of sites quickly plateaus: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5339084/figure/fig1/
The benefit of having more sites is that if you have cohorts with targeted regions or spotty coverage, you would still potentially have enough sites.
Note that the site selection in somalier is better than in peddy, so the results might plateau even sooner.
Thanks Brent. I think I'll stick with my 1000 SNPs that intersect between the cohorts then. The other option I had was to do something more complicated: pairwise intersections between my cohorts to maximize the number of SNPs. I've done this with KING and it worked great, but for whatever reason KING, as opposed to somalier, likes to have lots and lots of SNPs to estimate relatedness. It seems I won't have to do this with somalier. Should save me some work!
Cheers.
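For anyone comparing the two approaches mentioned above, here is a minimal sketch of the difference between a single intersection across all cohorts and pairwise intersections. It is not part of somalier or KING. The cohort names, the `(chrom, pos)` site keys, and the helper functions `shared_sites` and `pairwise_shared_sites` are all hypothetical and only illustrate the set logic.

```python
from itertools import combinations


def shared_sites(cohort_sites):
    """Return the sites present in every cohort (one global intersection)."""
    sets = list(cohort_sites.values())
    return set.intersection(*sets) if sets else set()


def pairwise_shared_sites(cohort_sites):
    """Return the sites shared by each pair of cohorts, keyed by the pair."""
    return {
        (a, b): cohort_sites[a] & cohort_sites[b]
        for a, b in combinations(sorted(cohort_sites), 2)
    }


# Toy data (hypothetical): each cohort is a set of (chrom, pos) sites.
cohorts = {
    "cohort_a": {("chr1", 1000), ("chr1", 2000), ("chr2", 500)},
    "cohort_b": {("chr1", 1000), ("chr2", 500), ("chr3", 42)},
    "cohort_c": {("chr1", 1000), ("chr1", 2000), ("chr3", 42)},
}

common = shared_sites(cohorts)
pairwise = pairwise_shared_sites(cohorts)

print(f"sites shared by all cohorts: {len(common)}")
for pair, sites in pairwise.items():
    print(f"{pair[0]} vs {pair[1]}: {len(sites)} shared sites")
```

In this toy example every pair of cohorts shares more sites than the global intersection does, which is the reason pairwise intersections can help a tool that needs many SNPs. The cost is that each pair of cohorts then has to be compared on its own site list.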
Hi,
Just a general question, but have you ever looked at the performance of somalier in detecting relatedness as a function of the number of SNPs in the samples? I noticed that in the documentation you provide a general description of the algorithm and suggest that relatedness metrics are well calibrated with only a few tens of SNPs. Would there be any benefit at all (taking into account the increased runtime, I assume) in running somalier with as many shared SNPs as possible between two cohorts?
Thanks.