Heterozygosity_blue is a variation of heterozygosity_expected which minimizes bias due to related individuals based on a kniship matrix.
This implementation makes the kinship matrix an optional argument.
If kinship is not specified then "regular" heterozygosity_expected is calculated using the (n / (n-1)) correction
described in #145 (resolving that issue).
This is a sensible 'default' because heterozygosity_blue is equivalent to heterozygosity_expected when the kinship matrix indicates non-related, non-inbred individuals (a matrix of zeros with a diagonal of 1/ploidy).
The implementation is a little more complex than ideal because the inverse of the kinship matrix must be calculated for only the called samples at each variant.
I have tried to do this in a way that minimizes recalculation of the inverse matrix.
Additionally a nan value is returned if the (sub-)kinship matrix has no inverse.
It's worth noting that heterozygosity_blue can return values outside of the interval [0, 1] with some inputs.
Related to #287 there is an optional argument to specify mixed ploidy data
Best linear unbiased estimate of expected heterozygosity following eq. 8 in Harris and DeGiorgio (2017).
Heterozygosity_blue is a variation of heterozygosity_expected which minimizes bias due to related individuals based on a kniship matrix. This implementation makes the kinship matrix an optional argument.
If kinship is not specified then "regular" heterozygosity_expected is calculated using the (n / (n-1)) correction described in #145 (resolving that issue). This is a sensible 'default' because heterozygosity_blue is equivalent to heterozygosity_expected when the kinship matrix indicates non-related, non-inbred individuals (a matrix of zeros with a diagonal of 1/ploidy).
The implementation is a little more complex than ideal because the inverse of the kinship matrix must be calculated for only the called samples at each variant. I have tried to do this in a way that minimizes recalculation of the inverse matrix. Additionally a nan value is returned if the (sub-)kinship matrix has no inverse.
It's worth noting that heterozygosity_blue can return values outside of the interval [0, 1] with some inputs.
Related to #287 there is an optional argument to specify mixed ploidy data