brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"
MIT License

High relatedness for many unrelated individuals #58

Closed lindenb closed 4 years ago

lindenb commented 4 years ago

As discussed on twitter : https://twitter.com/yokofakun/status/1286582806567620608

I tested somalier 0.2.11 with ~230 WGS BAMs (families, unrelated individuals, etc.). Expected relationships are OK (parent-child, duplicates...), but I found many unrelated samples with a relatedness > 0.1. Is this expected behavior?

Here is the HTML file:

https://nextcloud-bird.univ-nantes.fr/index.php/s/Hr9HmKiH5nMY6Rs

thank you for your help.

brentp commented 4 years ago

Hi, thanks for posting an example. If you click on the link at the bottom that reads "General QC...", you'll see that there are 3 outliers in the right-hand sample plot.

If you hover over those 3 samples in the right-hand plot, you'll see that they are responsible for the cluster of points with relatedness between 0.2 and 0.4 in the left-hand plot. You can also hover over that cluster of points in the left-hand plot and see that those 3 samples are always part of the pair.
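The same check can be done without hovering, by counting how often each sample appears in a pair whose relatedness falls in that band. The following is only a minimal sketch: the sample names and relatedness values are made up for illustration, the function name `samples_in_band` is hypothetical, and it assumes you have already loaded the pairwise results as `(sample_a, sample_b, relatedness)` tuples.

```python
from collections import Counter


def samples_in_band(pairs, low=0.2, high=0.4):
    """Count how often each sample is part of a pair with low <= relatedness <= high."""
    counts = Counter()
    for sample_a, sample_b, relatedness in pairs:
        if low <= relatedness <= high:
            counts[sample_a] += 1
            counts[sample_b] += 1
    return counts


# Hypothetical pairwise results (not taken from the data in this issue).
pairs = [
    ("B001", "C010", 0.31),
    ("B001", "C011", 0.27),
    ("B002", "C010", 0.35),
    ("B002", "C012", 0.22),
    ("B001", "B002", 0.38),
    ("C010", "C011", 0.02),  # unrelated, below the band
    ("C012", "C013", 0.50),  # first-degree relatives, above the band
]

counts = samples_in_band(pairs)

# Samples at the top of this list are the ones driving the cluster.
for sample, n in counts.most_common():
    print(sample, n)
```

A sample that shows up in far more in-band pairs than the others is a likely candidate for one of the outliers in the right-hand plot.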

Most of the WGS samples that I see have a proportion of sites with allele balance (AB) outside of 0.1-0.9 of less than 1%, whereas you have many samples >5% (y-axis in the right-hand plot). So even if you discard/disregard the 3 problematic samples, you'll still have relatedness values above 0.1.

Looks like your samples starting with "B" are generally lower quality than samples starting with "C".

Hope this helps somewhat. I'd like to figure out a way to adjust relatedness for problematic samples like this, but that is not trivial without affecting other stuff.

lindenb commented 4 years ago

That's helpful, many thanks Brent.

brentp commented 4 years ago

Hi Pierre, would you be willing/able to share the .somalier files for these samples? They would be a great set of files for testing how to improve the estimates with contaminated samples. I understand if you can't.

lindenb commented 4 years ago

@brentp sorry I can't. :-)

brentp commented 4 years ago

ok. no worries.