Get pairs of individuals with a kinship larger or smaller than a certain cut-off

jorainer commented 3 years ago

For a familial resemblance analysis as defined in chapter 6 of this book, we would need pairs of individuals whose kinship is higher or lower than a certain threshold.

I would thus suggest a function kinshipPairs defined as follows:

    kinshipPairs <- function(x, condition = function(x) x > 0.25)

The function should return a matrix with two columns, each row containing the IDs of a pair of individuals from the pedigree.
The condition would define how the pairs of individuals are identified (e.g. kinship larger than a threshold).
The function has to ensure that we're not returning duplicated IDs (i.e. each ID has to be returned only once).

This would allow us to calculate correlation coefficients between the pairs and then to evaluate whether these correlations are higher between relatives than between unrelated individuals.
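
To make the proposal concrete, here is a minimal sketch in base R. It is not the FamAgg API: `kinship_pairs_sketch` is a hypothetical helper that assumes a plain symmetric kinship matrix with individual IDs as row and column names, and it uses a simple greedy pass to make sure each ID is returned only once.

```r
# Hypothetical helper, not part of FamAgg: 'kin' is assumed to be a
# symmetric kinship matrix with IDs as row and column names.
kinship_pairs_sketch <- function(kin, condition = function(x) x > 0.25) {
    # Look only above the diagonal so that each pair is considered once.
    hits <- which(upper.tri(kin) & condition(kin), arr.ind = TRUE)
    pairs <- cbind(rownames(kin)[hits[, "row"]],
                   colnames(kin)[hits[, "col"]])
    # Greedy pass: keep a pair only if neither ID was used before.
    used <- character()
    keep <- logical(nrow(pairs))
    for (i in seq_len(nrow(pairs))) {
        if (!any(pairs[i, ] %in% used)) {
            keep[i] <- TRUE
            used <- c(used, pairs[i, ])
        }
    }
    pairs[keep, , drop = FALSE]
}

# Toy pedigree: parents "1" and "2", children "3" and "4".
ids <- c("1", "2", "3", "4")
kin <- matrix(c(0.50, 0.00, 0.25, 0.25,
                0.00, 0.50, 0.25, 0.25,
                0.25, 0.25, 0.50, 0.25,
                0.25, 0.25, 0.25, 0.50),
              nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

res <- kinship_pairs_sketch(kin, condition = function(x) x >= 0.25)
res
```

Which pairs survive depends on the order in which candidates are visited, so a real implementation might prefer a random or otherwise documented selection.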

jorainer commented 3 years ago

happy for feedback on that @the-x-at

the-x-at commented 3 years ago

Sounds pretty straightforward. One has to check the upper triangle of the kinship matrix for condition, which avoids reporting duplicates. Maybe add an optional logical argument diag = FALSE to exclude/include the diagonal, which trivially is the kinship of the individual with itself. I think R is also pretty good at float comparisons, such that even x >= 0.25 should work, no?
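
As a small illustration of this idea (a sketch on a toy two-person matrix, with made-up IDs `a` and `b`), base R's `upper.tri()` already has a `diag` argument that switches the diagonal on or off:

```r
# Toy kinship matrix for a parent and a child (IDs are made up).
kin <- matrix(c(0.50, 0.25,
                0.25, 0.50),
              nrow = 2, dimnames = list(c("a", "b"), c("a", "b")))

# Without masking, the pair (a, b) shows up twice and the self-pairs too.
sel_all <- which(kin >= 0.25, arr.ind = TRUE)

# Upper triangle only: each pair once, no self-pairs.
sel_upper <- which(upper.tri(kin, diag = FALSE) & kin >= 0.25,
                   arr.ind = TRUE)

# Upper triangle including the diagonal: self-pairs come back.
sel_diag <- which(upper.tri(kin, diag = TRUE) & kin >= 0.25,
                  arr.ind = TRUE)

c(all = nrow(sel_all), upper = nrow(sel_upper), with_diag = nrow(sel_diag))
```

In this toy case the comparison `>= 0.25` is exact because 0.25 and 0.5 are binary fractions. Thresholds that are not exactly representable (for example a computed 0.1 + 0.2 compared against 0.3) may still need a tolerance.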

Would it be desirable to provide a subset of the kinship matrix by offering id = NULL as an optional argument for subsetting by IDs, since this is supported many times in the interface? The same for one or even more families, i.e. family = NULL?

So the function would be a member of FAData and look like this:

    kinshipPairs <- function(condition, id = NULL, family = NULL, diag = FALSE)

I would define condition as a mandatory argument, underlining its importance. I am not sure what the best way to define the signature of this function is, as it operates on kinship values, i.e. a numeric data type, and reports a data frame.

jorainer commented 3 years ago

upper.tri is actually a very good idea! That way we also avoid the diagonal and hence pairs of an individual with itself.

jorainer commented 3 years ago

For the id, do you think that is really necessary?

the-x-at commented 3 years ago

If we provide family, we also should provide id. Have a look at the other kinship-based functions. They have exactly this type of interface.

jorainer commented 3 years ago

So, the family and id parameters would allow restricting the calculation of the pairs to certain individuals, right?

the-x-at commented 3 years ago

Exactly. I see them as mutually exclusive. No reason to nail down a family and then restrict it by some IDs.
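
One way the mutually exclusive `id` / `family` handling could look, as a rough sketch only: `resolve_subset_sketch` and the `ped` data frame with columns `id` and `family` are hypothetical and do not reflect how FAData stores the pedigree.

```r
# Hypothetical helper: returns the IDs to which the pair search is
# restricted. 'ped' is assumed to be a data.frame with columns 'id'
# and 'family'.
resolve_subset_sketch <- function(ped, id = NULL, family = NULL) {
    if (!is.null(id) && !is.null(family))
        stop("Please provide either 'id' or 'family', not both.")
    if (!is.null(family))
        return(ped$id[ped$family %in% family])
    if (!is.null(id))
        return(intersect(ped$id, id))
    # Neither argument given: use all individuals.
    ped$id
}

ped <- data.frame(id = c("1", "2", "3", "4"),
                  family = c("A", "A", "B", "B"),
                  stringsAsFactors = FALSE)

by_family <- resolve_subset_sketch(ped, family = "A")
by_id <- resolve_subset_sketch(ped, id = c("4", "1"))
both <- try(resolve_subset_sketch(ped, id = "1", family = "A"),
            silent = TRUE)
```

The kinship matrix would then be subset to the returned IDs before the condition is evaluated.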

EuracBiomedicalResearch / FamAgg

Get pairs of individuals with a kinship larger or smaller than a certain cut-off #24