PlantandFoodResearch / MCHap

Polyploid micro-haplotype assembly using Markov chain Monte Carlo simulation.
MIT License

Avoid integer overflow when calculating unique number of haplotypes #157

Closed: timothymillar closed this issue 1 year ago

timothymillar commented 1 year ago

There are several places in the code for mchap assemble where the total number of possible unique haplotypes is used. This number is calculated as the product of the number of alleles at each SNP variant within the haplotype window.
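As an illustration of the calculation described above, here is a minimal sketch. The function name n_unique_haplotypes is hypothetical and is not taken from the MCHap code base. It uses Python's arbitrary-precision integers, so the product itself cannot overflow here.

```python
import math


def n_unique_haplotypes(n_alleles):
    """Number of distinct haplotypes in a window (hypothetical helper).

    n_alleles holds the number of alleles at each SNP in the window; every
    combination of alleles is a distinct haplotype, so the count is their product.
    Python ints have arbitrary precision, so this never overflows.
    """
    return math.prod(n_alleles)


# Three SNPs (biallelic, triallelic, biallelic): 2 * 3 * 2 = 12 haplotypes
print(n_unique_haplotypes([2, 3, 2]))  # 12
```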

In windows covering a large number of SNPs (e.g. more than 62 biallelic variants, because 2^63 exceeds the largest signed 64-bit integer), this product can overflow. The overflow silently alters the prior probabilities of genotypes. However, it should have affected the priors uniformly and therefore produced consistent results once normalized.
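To show how the overflow behaves, the sketch below emulates fixed-width signed 64-bit arithmetic in pure Python. It assumes the haplotype count is held in an int64, as it would be in a NumPy or Numba int64 variable; the helper names are hypothetical. With 62 biallelic SNPs the result is still correct. With 63, the product wraps silently to a negative number, and with 64 it wraps to exactly 0.

```python
MASK64 = (1 << 64) - 1


def to_int64(x):
    """Reinterpret the low 64 bits of x as a signed two's-complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >= (1 << 63) else x


def n_unique_haplotypes_int64(n_alleles):
    """Product of allele counts, emulating int64 multiplication that wraps silently.

    Hypothetical helper: each intermediate product is truncated to 64 bits,
    mimicking what a fixed-width integer does on overflow.
    """
    result = 1
    for n in n_alleles:
        result = to_int64(result * n)
    return result


print(n_unique_haplotypes_int64([2] * 62))  # 4611686018427387904 (2**62, still correct)
print(n_unique_haplotypes_int64([2] * 63))  # -9223372036854775808 (wrapped to negative)
print(n_unique_haplotypes_int64([2] * 64))  # 0 (wrapped to exactly zero)
```

A wrapped value can only be exactly 0 when the true product is divisible by 2^64, i.e. when the allele counts contain at least 64 factors of two in total. For example, 32 SNPs with 4 alleles each are enough, but no number of triallelic SNPs is.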

The issue can occasionally result in a ZeroDivisionError when the product happens to overflow to exactly 0 (e.g. with 64 biallelic variants, since 2^64 wraps to 0 in a 64-bit integer).
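One general way to avoid both the silent overflow and the division by zero is to never form the integer product at all and to work in log space. The sketch below assumes the quantity needed is a uniform prior over haplotypes, 1 / prod(n_alleles). It is a hypothetical helper illustrating the technique, not necessarily how the issue was fixed in MCHap.

```python
import math


def log_uniform_haplotype_prior(n_alleles):
    """Log-probability of a single haplotype under a uniform prior (hypothetical helper).

    log(1 / prod(n_alleles)) == -sum(log(n)), so the (possibly enormous)
    integer product is never computed and cannot overflow or become zero.
    """
    return -sum(math.log(n) for n in n_alleles)


# 64 biallelic SNPs: the int64 product would wrap to 0, but the log-prior is finite
print(log_uniform_haplotype_prior([2] * 64))  # approximately -44.36
```

Keeping priors as log-probabilities also fits naturally with normalizing via a log-sum-exp, so no intermediate value ever needs to hold the full haplotype count.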