N50 function cannot operate on genomes larger than 2.1Gbp

Bioconductor / Biostrings

Efficient manipulation of biological strings

https://bioconductor.org/packages/Biostrings

N50 function cannot operate on genomes larger than 2.1Gbp #28

Closed by oneillkza 5 years ago

oneillkza commented 5 years ago

R's integers are 32-bit signed values, meaning they max out at 2,147,483,647 (about 2.147 x 10^9). Many genomes (e.g. human) and other data sets for which one would want to compute an N50 (e.g. long-read sequence data) have cumulative sizes far in excess of this. For this function to operate on these larger data sets, it needs to leave the data in numeric form (i.e. double precision).

I've created a pull request (#27) that removes the function's cast to integer.
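
For readers who want to see the failure mode, here is a minimal sketch of the overflow described above. The helper n50_sketch and its cast_to_integer argument are hypothetical and written only for this illustration; this is not the Biostrings source, and the contig sizes are made up. It assumes an N50 computed by sorting sizes in decreasing order and taking the first size at which the running total reaches half of the overall total.

```r
# Hypothetical helper for illustration only; NOT the Biostrings implementation.
n50_sketch <- function(csizes, cast_to_integer = FALSE) {
    stopifnot(is.numeric(csizes))
    if (cast_to_integer)
        csizes <- as.integer(csizes)   # 32-bit storage, as with the cast the PR removes
    sorted <- sort(csizes, decreasing = TRUE)
    running <- cumsum(sorted)          # on integer input, overflow yields a warning and NA
    total <- running[length(running)]
    sorted[which(running >= total / 2)[1]]
}

# Three made-up contigs of 1 Gbp each: 3e9 bp in total, above .Machine$integer.max.
contigs <- c(1e9, 1e9, 1e9)

# Left as doubles, the running sum is exact and the result is 1e9.
n50_double <- n50_sketch(contigs)

# Cast to integer, the running sum overflows and the result is NA.
n50_int <- suppressWarnings(n50_sketch(contigs, cast_to_integer = TRUE))
```

Doubles represent every whole number exactly up to 2^53 (about 9 x 10^15), so cumulative sizes of realistic sequence data sets stay exact in numeric form.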

hpages commented 5 years ago

Thx for the PR. I just applied it (commit 2c58f6b7d6a7cb07c7af4f88cba3e63101bef1df).

oneillkza commented 5 years ago

You're most welcome!