brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"
MIT License

Definition of relatedness as implemented in somalier #74

Open asp8200 opened 3 years ago

asp8200 commented 3 years ago

This might be more of a request for clarification than an actual issue or problem with the code.

I noticed that the definition of relatedness seems to have changed with the release of version 0.2.12 of somalier.

The change was, as far as I can tell, prompted by the following issue reported by Filipe Garrett Vieira (fgvieira): https://github.com/brentp/somalier/issues/55

In the Somalier-paper [ https://doi.org/10.1186/s13073-020-00761-2 ], relatedness was defined as

(shared-hets(i,j) - 2 * ibs0(i, j)) / min(hets(i), hets(j))

In #55, Filipe suggested that relatedness be defined as:

(shared-hets(i,j) - 2 * ibs0(i, j)) / min(hets_in_common_pos(i), hets_in_common_pos(j))

where hets_in_common_pos(i) stands for the number of hets in sample "i" among the positions shared ("n")

Judging from your reply to Filipe, I was expecting that you had implemented his suggestion, but when I looked at your commit https://github.com/brentp/somalier/commit/3e0b8401ac71e79de9afc13a2b1fd5df8d1dabdb

I got the impression that you didn’t implement Filipe’s suggestion but instead implemented relatedness as

r_{brent} = 2(shared-hets(i,j) - 2 * ibs0(i, j)) / hets_{ij}

where hets_{ij} is hets_in_common_pos(i) + hets_in_common_pos(j). As pointed out to me by Filipe, your definition could also be stated as

r_{brent} = (shared-hets(i,j) - 2 * ibs0(i, j)) / mean(hets_in_common_pos(i), hets_in_common_pos(j))

r_{brent} is, of course, less than or equal to the relatedness suggested by Filipe whenever the numerator is non-negative, since the mean of two counts is never smaller than their minimum.
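To make the comparison concrete, here is a minimal Python sketch of the three definitions discussed above. The function names and all counts are hypothetical, chosen only for illustration. This is not somalier's code, which I have not reproduced here.

```python
def relatedness_min(shared_hets, ibs0, hets_i, hets_j):
    """(shared-hets - 2 * ibs0) / min(hets(i), hets(j)).

    With genome-wide het counts this is the form given in the paper;
    with het counts restricted to the shared positions it is the form
    suggested in #55.
    """
    return (shared_hets - 2 * ibs0) / min(hets_i, hets_j)


def relatedness_mean(shared_hets, ibs0, hets_common_i, hets_common_j):
    """r_{brent}: same numerator, divided by the mean of the two het counts."""
    return 2 * (shared_hets - 2 * ibs0) / (hets_common_i + hets_common_j)


# Hypothetical counts for one pair of samples (made up for illustration).
shared_hets, ibs0 = 400, 10
hets_i, hets_j = 1000, 900                # hets over all sites of each sample
hets_common_i, hets_common_j = 800, 760   # hets among the shared positions

r_paper = relatedness_min(shared_hets, ibs0, hets_i, hets_j)
r_filipe = relatedness_min(shared_hets, ibs0, hets_common_i, hets_common_j)
r_brent = relatedness_mean(shared_hets, ibs0, hets_common_i, hets_common_j)
print(r_paper, r_filipe, r_brent)

# With a negative numerator the ordering of the last two reverses.
r_filipe_neg = relatedness_min(10, 20, hets_common_i, hets_common_j)
r_brent_neg = relatedness_mean(10, 20, hets_common_i, hets_common_j)
print(r_filipe_neg, r_brent_neg)
```

With a positive numerator the mean-based value cannot exceed the min-based one computed on the same het counts. With a negative numerator (many ibs0 sites, few shared hets) the mean-based value is the one closer to zero.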

It seems to me that your implementation corresponds to the following definition from Manichaikul [PMID: 20926424] equation (9) - except perhaps for a factor 2? The definition from Manichaikul et al.:

[image: equation (9) from Manichaikul et al.]

Can you confirm that there is this factor 2 difference between your definition and the one given by Manichaikul et al.?

In any case, if the definition of relatedness in the Somalier-paper doesn't correspond to the one used in the current version of somalier, perhaps this should be stated in the README-file?

brentp commented 3 years ago

Hi Anders, yes, I have attempted to implement the formula from the KING paper that you show. It estimates kinship, which is half of relatedness, hence the factor of 2. I'm not sure about the misunderstanding with @fgvieira; perhaps he can clarify. Does this address your concern?
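Spelled out in the notation used earlier in this thread, with \phi_{ij} denoting kinship, the factor of 2 reads as follows. This is a restatement inferred from the comments above, not a transcription of equation (9) from the KING paper:

```latex
\phi_{ij} = \frac{\text{shared-hets}(i,j) - 2\,\text{ibs0}(i,j)}{\text{hets}_{ij}},
\qquad
r_{brent} = 2\,\phi_{ij}
          = \frac{2\,\bigl(\text{shared-hets}(i,j) - 2\,\text{ibs0}(i,j)\bigr)}{\text{hets}_{ij}}
```

Here hets_{ij} is hets_in_common_pos(i) + hets_in_common_pos(j), as defined above.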

I'm certainly interested if there are real-world consequences of any assumptions. Thanks for the careful review.

You're welcome to open a PR to make a note of the change in the README.