cggh / scikit-allel

A Python package for exploring and analysing genetic variation data
MIT License

gametic heterozygosity_observed #279

Open timothymillar opened 5 years ago

timothymillar commented 5 years ago

See #277 for earlier discussion

This updates heterozygosity_observed to use "gametic heterozygosity", which assumes polysomic inheritance (i.e. autopolyploidy). Gametic heterozygosity is identical to the existing calculation (Nei's method) in the diploid case, but generalises it to autopolyploids.

This implementation follows Hardy 2016 and Meirmans and Liu 2018.

An additional argument, corrected, is added. It defaults to True, which corrects for the ploidy level. If it is set to False, the uncorrected Ho is calculated instead; Meirmans and Liu 2018 discuss this form for comparing across ploidy levels.

Note that the existing code is used as a special case for diploids because it is faster - not because it produces a different result.
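For readers unfamiliar with the statistic, here is a minimal NumPy sketch of the idea. It is not the code in this PR. It assumes that gametic heterozygosity for a sample is the probability that two alleles drawn from its genotype without replacement differ, and that the uncorrected form corresponds to drawing with replacement (so the two differ by a factor of k / (k - 1) for ploidy k). The function names gametic_het_sketch and ho_sketch are hypothetical, and missing calls are not handled.

```python
import numpy as np


def gametic_het_sketch(genotypes, corrected=True):
    """Per-sample gametic heterozygosity at a single variant.

    genotypes : int array of shape (n_samples, ploidy) holding allele
        indices (0 = reference, 1 = first alternate, ...). This sketch
        assumes there are no missing calls and that ploidy >= 2.
    corrected : assumed meaning of the flag: True draws two alleles
        without replacement, False draws them with replacement.
    """
    g = np.asarray(genotypes)
    ploidy = g.shape[1]
    n_alleles = int(g.max()) + 1
    # counts[i, a] = number of copies of allele a carried by sample i
    counts = np.stack([(g == a).sum(axis=1) for a in range(n_alleles)], axis=1)
    if corrected:
        # probability that two alleles drawn without replacement are identical
        same = (counts * (counts - 1)).sum(axis=1) / (ploidy * (ploidy - 1))
    else:
        # probability that two alleles drawn with replacement are identical
        same = ((counts / ploidy) ** 2).sum(axis=1)
    return 1.0 - same


def ho_sketch(genotypes, corrected=True):
    """Observed heterozygosity: mean of the per-sample values."""
    return gametic_het_sketch(genotypes, corrected).mean()
```

Under these assumptions a diploid sample scores 1 if heterozygous and 0 if homozygous, so the mean is simply the proportion of heterozygotes, which matches the existing diploid calculation. A tetraploid genotype 0/0/0/1 scores 0.5 corrected and 0.375 uncorrected.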

I updated the triploid test case though I'm not entirely sure about the applicability to odd-numbered ploidy levels (Edit: this method should be fine for odd ploidy levels).

pep8speaks commented 5 years ago

Hello @timothymillar! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2020-08-04 07:23:26 UTC

timothymillar commented 5 years ago

@alimanfoo I'm having some second thoughts about this PR now.

This heterozygosity_observed still requires a GenotypeArray and hence a single ploidy level for all samples.

If #287 were to be implemented then heterozygosity_observed could be updated for mixed ploidy, but #287 is just a suggestion at this point.

Alternatively, a new function could be implemented that takes a GenotypeAlleleCountsArray and assumes that the ploidy level at each locus in each sample equals the sum of its allele counts (i.e. it assumes that all genotypes are complete). This would allow for mixed ploidy levels, but would require users to remove any partial genotypes themselves.
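A rough sketch of what such an allele-counts-based function might look like, using a plain NumPy array in place of GenotypeAlleleCountsArray. The name het_from_allele_counts_sketch is hypothetical, and the without-replacement formula is an assumption about how gametic heterozygosity would be computed here.

```python
import numpy as np


def het_from_allele_counts_sketch(counts):
    """Per-sample gametic heterozygosity at a single variant.

    counts : array of shape (n_samples, n_alleles), where counts[i, a]
        is the number of copies of allele a called in sample i.
        Ploidy is inferred per sample as the row sum, so samples of
        different ploidy can be mixed in one array.
    """
    c = np.asarray(counts, dtype=float)
    ploidy = c.sum(axis=1)
    if np.any(ploidy < 2):
        # fewer than two called alleles: the statistic is undefined
        raise ValueError("every sample needs at least two called alleles")
    # probability that two alleles drawn without replacement are identical
    same = (c * (c - 1)).sum(axis=1) / (ploidy * (ploidy - 1))
    return 1.0 - same
```

Because ploidy is taken from the row sum, a tetraploid call with one missing allele (e.g. counts of 2 and 1) would silently be treated as a triploid. That is why partial genotypes would need to be removed before calling a function like this.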

timothymillar commented 4 years ago

@alimanfoo I have updated this with the following changes:

I think this is the correct approach for supporting mixed ploidy data as it makes it explicit which functions are supported and avoids complicating the base genotype model.