Open timothymillar opened 5 years ago
@alimanfoo I'm having some second thoughts about this PR now.
This `heterozygosity_observed` still requires a `GenotypeArray` and hence a single ploidy level for all samples.
If #287 were to be implemented then `heterozygosity_observed` could be updated for mixed ploidy, but #287 is just a suggestion at this point.
Alternatively, a new function could be implemented that takes a `GenotypeAlleleCountsArray` and assumes the ploidy level at each locus in each sample is equal to the sum of allele counts (i.e. it assumes that all genotypes are complete). This would allow for mixed ploidy levels but would require that the user removes any partial genotypes themselves.
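A rough sketch of what that alternative could look like, using plain NumPy rather than the actual `GenotypeAlleleCountsArray` class. The function name `heterozygosity_from_allele_counts` is hypothetical and not part of scikit-allel; the input is assumed to be an integer array of shape `(n_variants, n_samples, n_alleles)` holding complete genotypes only.

```python
import numpy as np


def heterozygosity_from_allele_counts(ac):
    """Hypothetical helper, not an existing scikit-allel function.

    ac : integer array, shape (n_variants, n_samples, n_alleles)
    Returns the per-call probability that two alleles drawn without
    replacement differ, shape (n_variants, n_samples). Calls with
    fewer than two alleles are returned as NaN.
    """
    ac = np.asarray(ac, dtype=float)
    # Assumption: ploidy of each call equals the sum of its allele counts.
    ploidy = ac.sum(axis=-1)
    # Ordered pairs of identical alleles, and all ordered pairs.
    same = (ac * (ac - 1)).sum(axis=-1)
    pairs = ploidy * (ploidy - 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        het = 1.0 - same / pairs
    het[pairs == 0] = np.nan
    return het


ac = np.array([
    [[2, 0], [1, 1], [2, 2]],  # diploid hom, diploid het, tetraploid 2:2
    [[1, 1], [0, 2], [3, 1]],  # diploid het, diploid hom, tetraploid 3:1
])
het = heterozygosity_from_allele_counts(ac)
# Observed heterozygosity per variant is then the mean over samples.
ho = np.nanmean(het, axis=1)
```

Because the ploidy is inferred from the counts, a partial genotype would silently be treated as a lower-ploidy call, which is why the user would need to filter those out beforehand.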
@alimanfoo I have updated this with the following changes:

- A `heterozygosity_individual` function that calculates heterozygosity per individual (0 or 1 for diploids)
- A `ploidy` argument to `heterozygosity_individual` and `heterozygosity_observed` to allow for mixed ploidy genotypes (along either/both axes)

I think this is the correct approach for supporting mixed ploidy data as it makes it explicit which functions are supported and avoids complicating the base genotype model.
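To illustrate how a `ploidy` argument can vary along either or both axes, here is a minimal NumPy sketch. It is not the code in this PR: the function name `individual_heterozygosity_sketch`, the padded `(n_variants, n_samples, max_ploidy)` layout, and the use of `-1` for unused allele slots are all assumptions made for the example, and genotypes are assumed to be complete.

```python
import numpy as np


def individual_heterozygosity_sketch(g, ploidy):
    """Hypothetical sketch, not the implementation in this PR.

    g      : integer array, shape (n_variants, n_samples, max_ploidy),
             with unused allele slots padded (here with -1).
    ploidy : scalar or array broadcastable to (n_variants, n_samples).
    Returns the fraction of allele pairs within each call that differ.
    """
    g = np.asarray(g)
    n_variants, n_samples, max_ploidy = g.shape
    # Broadcasting lets ploidy vary per sample, per variant, or both.
    k = np.broadcast_to(np.asarray(ploidy), (n_variants, n_samples))
    # Only the first k allele slots of each call are considered.
    valid = np.arange(max_ploidy) < k[..., None]
    # Compare every unordered pair of allele slots within a call.
    i, j = np.triu_indices(max_ploidy, k=1)
    pair_valid = valid[..., i] & valid[..., j]
    differ = (g[..., i] != g[..., j]) & pair_valid
    with np.errstate(divide="ignore", invalid="ignore"):
        return differ.sum(axis=-1) / pair_valid.sum(axis=-1)


g = np.array([
    [[0, 1, -1, -1], [0, 0, 1, 1]],
    [[0, 0, -1, -1], [0, 1, 1, 1]],
])
# One diploid sample and one tetraploid sample (varies along samples only).
hi = individual_heterozygosity_sketch(g, ploidy=[2, 4])
```

A per-variant ploidy would be passed with shape `(n_variants, 1)`, and a full `(n_variants, n_samples)` array covers the case where it varies along both axes.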
See #277 for earlier discussion
This updates `heterozygosity_observed` to use "gametic heterozygosity", which assumes polysomic inheritance (i.e. autopolyploidy). Gametic heterozygosity is identical to the existing calculation (Nei's method) for the diploid case but generalises it to autopolyploids.

This implementation follows Hardy 2016 and Meirmans and Liu 2018.
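For readers unfamiliar with the term, gametic heterozygosity of a single genotype can be read as the probability that two alleles sampled without replacement from that genotype differ. The notation below is my own shorthand rather than anything taken from the code: $k$ is the ploidy of the genotype and $a_i$ is the number of copies of allele $i$ it carries, so that $\sum_i a_i = k$.

$$
H = 1 - \sum_i \frac{a_i (a_i - 1)}{k (k - 1)}
  = \frac{k}{k - 1} \left( 1 - \sum_i \left( \frac{a_i}{k} \right)^2 \right)
$$

For $k = 2$ a heterozygote ($a = (1, 1)$) gives $H = 1$ and a homozygote ($a = (2, 0)$) gives $H = 0$, which is the usual diploid definition. Averaging $H$ over samples then gives the observed heterozygosity at a locus.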
An additional argument `corrected` is added which defaults to `True` to correct for the ploidy level. If this is set to `False`, uncorrected Ho is calculated, which is discussed in Meirmans and Liu 2018 for comparing across ploidy levels.

Note that the existing code is used as a special case for diploids because it is faster, not because it produces a different result.
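A small sketch of the difference between the two settings for a single genotype. This assumes the ploidy correction is the factor k / (k - 1) applied to the uncorrected value, which is my reading of the cited papers rather than a copy of the code in this PR; the function name `call_heterozygosity` is hypothetical.

```python
import numpy as np


def call_heterozygosity(allele_counts, corrected=True):
    """Hypothetical single-genotype helper, not the PR's implementation.

    allele_counts : counts of each allele in one complete genotype.
    corrected     : if True, scale by k / (k - 1) where k is the ploidy
                    (assumed form of the ploidy correction).
    """
    a = np.asarray(allele_counts, dtype=float)
    k = a.sum()
    # Probability that two alleles drawn WITH replacement differ.
    uncorrected = 1.0 - np.sum((a / k) ** 2)
    if corrected:
        # Equivalent to drawing WITHOUT replacement.
        return uncorrected * k / (k - 1)
    return uncorrected


diploid_het_corrected = call_heterozygosity([1, 1], corrected=True)
diploid_het_uncorrected = call_heterozygosity([1, 1], corrected=False)
tetraploid_corrected = call_heterozygosity([3, 1], corrected=True)
tetraploid_uncorrected = call_heterozygosity([3, 1], corrected=False)
```

Under this assumption a diploid heterozygote scores 1 when corrected and 0.5 when uncorrected, so the two settings differ even for diploids; only the corrected value matches the existing diploid calculation.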
I updated the triploid test case, though I'm not entirely sure about the applicability to odd-numbered ploidy levels. (Edit: this method should be fine for odd ploidy levels.)