cggh / scikit-allel

A Python package for exploring and analysing genetic variation data
MIT License

Behaviour of count_alleles() when all genotypes are missing #379

Open alimanfoo opened 2 years ago

alimanfoo commented 2 years ago

If a genotype array contains only missing genotypes, the current behaviour of count_alleles() may not be ideal. Reported by a colleague:

I'm just running one of my scripts and noticed some odd behaviour in the count_alleles() method in scikit-allel (v1.3.5). If some rows contain only missing data, the allele counts for those rows are reported as 0. But if the entire genotype array is missing data, then instead of getting 0 counts for each row, I get an error ("ValueError: zero-size array to reduction operation maximum which has no identity").

PS: the error doesn't seem to be raised until you try to print the array. You can still assign the result to a variable, or query its shape (which shows 0 columns). I can see why returning 0 columns would make sense, since count_alleles() can't know how many alleles there would be, but an error when trying to print seems odd.
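A minimal NumPy sketch of why the two cases differ, assuming that count_alleles() infers the number of columns as (largest allele index in the whole array) + 1 when no maximum is given. `count_alleles_sketch` is a hypothetical stand-in, not the scikit-allel implementation. If any call in the array is non-missing, the inferred maximum is at least 0, so all-missing rows simply get zero counts. If every call is missing (-1), the inferred maximum is -1 and the result has zero columns. Any per-row reduction such as `max` over that empty allele axis then raises the same ValueError. Whether the display code performs exactly this reduction is an assumption.

```python
import numpy as np


def count_alleles_sketch(g, max_allele=None):
    """Hypothetical allele counter for a (n_variants, n_samples, ploidy) int array.

    Missing calls are coded as -1 and are not counted. If max_allele is None it
    is inferred from the data (assumed to mirror the behaviour discussed here).
    """
    g = np.asarray(g)
    if max_allele is None:
        # Largest allele index anywhere in the array; -1 if every call is missing.
        max_allele = int(g.max())
    # One column per allele index 0..max_allele (zero columns when max_allele == -1).
    ac = np.zeros((g.shape[0], max_allele + 1), dtype=int)
    for allele in range(max_allele + 1):
        # Count this allele across all samples and ploidy positions per variant.
        ac[:, allele] = (g == allele).sum(axis=(1, 2))
    return ac


# Two variants, two diploid samples; the second variant is entirely missing.
partly_missing = np.array([[[0, 1], [1, 1]],
                           [[-1, -1], [-1, -1]]])
print(count_alleles_sketch(partly_missing))
# [[1 3]
#  [0 0]]

# Every call missing: the inferred max allele is -1, so there are no columns.
all_missing = np.full((2, 2, 2), -1)
ac = count_alleles_sketch(all_missing)
print(ac.shape)  # (2, 0)

try:
    # Reducing over the empty allele axis has no identity for maximum.
    ac.max(axis=1)
except ValueError as e:
    print(e)  # zero-size array to reduction operation maximum which has no identity
```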

Perhaps the resulting allele counts array should always have at least one column, as we know there will always be a reference allele.
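A sketch of the suggested behaviour, again using a hypothetical NumPy helper rather than scikit-allel itself: clamp the inferred maximum allele at 0 so the reference allele column always exists. An all-missing array then yields a column of zero counts, and row-wise reductions work. With the real library, passing `max_allele` explicitly to count_alleles() should have a similar effect as a workaround, since it fixes the number of columns; check the documentation for your version to confirm.

```python
import numpy as np


def count_alleles_min_one_column(g, max_allele=None):
    """Hypothetical allele counter that always returns at least one column.

    Column 0 (the reference allele) is always present, so an all-missing
    array gives zero counts instead of a zero-width array.
    """
    g = np.asarray(g)
    if max_allele is None:
        # Clamp at 0 so the reference allele column is always created.
        max_allele = max(int(g.max()), 0)
    n_alleles = max_allele + 1
    ac = np.zeros((g.shape[0], n_alleles), dtype=int)
    for allele in range(n_alleles):
        # Missing calls (-1) never match an allele index, so they are skipped.
        ac[:, allele] = (g == allele).sum(axis=(1, 2))
    return ac


all_missing = np.full((3, 2, 2), -1)
ac = count_alleles_min_one_column(all_missing)
print(ac.shape)        # (3, 1)
print(ac.max(axis=1))  # [0 0 0]
```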