Closed fgvieira closed 4 years ago
I see what you mean. I am looking into this.
@fgvieira you are absolutely right. Would you check this binary and verify that it fixes this case for you?
@brentp I just checked the new version and it fixes the issue I was seeing. For this pair of samples, relatedness went from `0.101` to `0.979`.
:+1:
Hi,
I am running some different samples (RNA vs cfDNA) through `somalier` and one of them came out with a very low relatedness, even though 77 (out of 78) sites are IBS2.

I think that, even though there are a lot of heterozygous sites, they do not actually overlap (which is why n=78). In the RNA data this is probably because of non-expressed genes, and in the cfDNA probably because of the low coverage.
According to somalier's paper, relatedness is calculated as:
However, in this case, even though one sample has 228 hets and the other 556, they only overlap on 78 SNPs. So, shouldn't the formula be a bit more like:
where `hets_in_common_pos(i)` stands for the number of hets in sample `i` among the shared positions (`n`). This change should have no effect when comparing the same type of sequencing data (since the overlap should be quite high) and should improve comparisons between different types.
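To make the suggestion concrete, here is a rough Python sketch of the two denominators. It is not somalier's code. The numerator `shared_hets - 2 * ibs0` is my recollection of the formula in the paper and should be treated as an assumption, and the function name, the `shared_sites_only` flag, and the toy genotypes are all made up for illustration.

```python
from typing import Optional, Sequence

# Hypothetical genotype encoding: 0 = hom-ref, 1 = het, 2 = hom-alt, None = no call
Genotype = Optional[int]


def relatedness(a: Sequence[Genotype], b: Sequence[Genotype],
                shared_sites_only: bool = False) -> float:
    """(shared_hets - 2 * ibs0) / min(hets_a, hets_b)  [assumed formula].

    With shared_sites_only=True, hets are counted only at positions
    called in both samples, i.e. hets_in_common_pos(i).
    """
    hets_a = hets_b = shared_hets = ibs0 = 0
    for ga, gb in zip(a, b):
        both_called = ga is not None and gb is not None
        if both_called or not shared_sites_only:
            hets_a += int(ga == 1)
            hets_b += int(gb == 1)
        if both_called:
            shared_hets += int(ga == 1 and gb == 1)
            ibs0 += int({ga, gb} == {0, 2})  # opposite homozygotes
    denom = min(hets_a, hets_b)
    if denom == 0:
        return float("nan")
    return (shared_hets - 2 * ibs0) / denom


# Toy data: many hets per sample, but only 3 positions called in both.
a = [1, 1, 1, 1, 1, 0, None, None, None, None, None, None]
b = [1, 1, None, None, None, 0, 1, 1, 1, 1, 1, 1]

print(relatedness(a, b))                          # denominator uses all hets
print(relatedness(a, b, shared_sites_only=True))  # denominator uses shared positions
```

In this toy case the two samples agree at every position they share, but the first call is pulled down by hets that the other sample never had a chance to match. When both samples are called at nearly all sites, the two denominators should be almost identical.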