lindenb closed this issue 4 years ago
Hi, thanks for posting the example. If you click the link at the bottom that reads "General QC...", you'll see that there are 3 outliers in the right-hand sample plot.
If you hover over those 3 samples in the right-hand plot, you'll see that they are responsible for the cluster of points with relatedness between 0.2 and 0.4 in the left-hand plot. You can also hover over that cluster of points in the left-hand plot and see that those 3 samples are always part of the pair.
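For anyone who wants to do the same check without hovering in the HTML report, here is a minimal sketch in Python. It assumes you already have the pairwise relatedness values as `(sample_a, sample_b, relatedness)` tuples; the function name, the sample names, and the values are all made up for illustration and are not taken from the report in this thread.

```python
from collections import Counter


def samples_in_flagged_pairs(pairs, low=0.2, high=0.4):
    """Count how often each sample appears in pairs whose relatedness
    falls inside [low, high].

    `pairs` is an iterable of (sample_a, sample_b, relatedness) tuples.
    Returns (number_of_flagged_pairs, Counter of sample -> appearances).
    """
    counts = Counter()
    n_flagged = 0
    for sample_a, sample_b, relatedness in pairs:
        if low <= relatedness <= high:
            n_flagged += 1
            counts[sample_a] += 1
            counts[sample_b] += 1
    return n_flagged, counts


# Hypothetical data: names and values are invented for this example.
pairs = [
    ("B01", "C01", 0.31),
    ("B01", "C02", 0.27),
    ("B02", "C01", 0.35),
    ("C01", "C02", 0.02),
    ("B03", "C02", 0.22),
    ("C02", "C03", -0.01),
]

n_flagged, counts = samples_in_flagged_pairs(pairs)
for sample, n in counts.most_common():
    print(f"{sample}\t{n} of {n_flagged} flagged pairs")
```

A sample that shows up in a large share of the flagged pairs is a candidate for the kind of outlier described above. The 0.2 and 0.4 bounds are just the range of the cluster mentioned here, not a general threshold.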
In most of the WGS samples that I see, fewer than 1% of sites have an allele balance (AB) outside of 0.1-0.9, whereas many of your samples are above 5% (y-axis in the right-hand plot). So even if you discard or disregard the 3 problematic samples, you'll still have relatedness values above 0.1.
Looks like your samples starting with "B" are generally lower quality than samples starting with "C".
Hope this helps somewhat. I'd like to figure out a way to adjust relatedness for problematic samples like this, but that is not trivial to do without affecting other things.
That's helpful, many thanks Brent.
Hi Pierre, would you be willing/able to share the .somalier files for these samples? They would be a great set of files for testing how to improve the estimates with contaminated samples. I understand if you can't.
@brentp sorry I can't. :-)
OK, no worries.
As discussed on Twitter: https://twitter.com/yokofakun/status/1286582806567620608
Here is the HTML file:
https://nextcloud-bird.univ-nantes.fr/index.php/s/Hr9HmKiH5nMY6Rs
Thank you for your help.