brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"
MIT License

Missing relatedness pair #54

Closed: eyu8 closed 4 years ago

eyu8 commented 4 years ago

Hello, I generated somalier.pairs.tsv, somalier.samples.tsv and somalier.groups.tsv. However, some sample pairs are not present in somalier.pairs.tsv. I don't know whether that is because somalier wasn't able to calculate the relatedness for them or for some other reason.

somalier.log.txt somalier.pairs.txt somalier.samples.txt somalier.groups.txt somalier.html.txt

For example: S13891 vs S20316; S12654 vs S14843; S12654 vs S14845; S14844 vs S14845. Thanks.

brentp commented 4 years ago

This is not unexpected. Your log file indicates the reason:

[somalier] html and text output will have unrelated sample-pairs subset to 7.15% of points

If you have a large number of samples, the number of possible pairwise combinations is huge, so somalier will only report:

  1. sample pairs expected to be related by your pedigree or groups files (if given)
  2. sample pairs with a relatedness > 0.2
  3. a random subset of unrelated sample pairs.
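
To make the scale concrete and to check which specific pairs were left out, something along these lines may help. It is only a sketch: it assumes the pairs file is tab-separated, that its first two columns hold the two sample IDs, and that header lines start with `#`. The helper names `total_pairs` and `missing_pairs`, the toy sample IDs and the toy values are made up for illustration and are not part of somalier.

```python
import csv
import io
from math import comb


def total_pairs(n_samples):
    """Number of unordered sample pairs among n_samples samples."""
    return comb(n_samples, 2)


def missing_pairs(pairs_file, wanted):
    """Return the wanted pairs that do not appear in a pairs table.

    Assumption: tab-separated rows whose first two columns are sample IDs,
    with header/comment lines starting with '#'.
    """
    seen = set()
    for row in csv.reader(pairs_file, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        # frozenset makes the lookup independent of column order (A,B vs B,A)
        seen.add(frozenset(row[:2]))
    return [pair for pair in wanted if frozenset(pair) not in seen]


# Toy table standing in for somalier.pairs.tsv (IDs and values are made up).
example = io.StringIO(
    "#sample_a\tsample_b\trelatedness\n"
    "S1\tS2\t0.50\n"
    "S3\tS1\t0.01\n"
)
wanted = [("S1", "S2"), ("S1", "S3"), ("S2", "S3")]
absent = missing_pairs(example, wanted)

print(total_pairs(1000))  # pair count grows quadratically with sample count
print(absent)
```

With a real file, `open("somalier.pairs.tsv", newline="")` would take the place of the `io.StringIO` object. A pair that shows up as absent was not necessarily impossible to compute; per the explanation above, it may simply have fallen outside the random subset of unrelated pairs.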

eyu8 commented 4 years ago

Thank you