CAST-genomics / haptools

Ancestry and haplotype aware simulation of genotypes and phenotypes for complex trait analysis
https://haptools.readthedocs.io
MIT License

parallelize reading from a PGEN file #253

Open aryarm opened 3 months ago

aryarm commented 3 months ago

We should be able to parallelize reading by dividing the file into chunks (e.g. contiguous ranges of variants) and reading each chunk in a separate thread or process.
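Here is a minimal sketch of the chunking idea. It assumes the variants are split into contiguous, roughly equal index ranges and that each range can be read independently. `chunk_bounds`, `read_chunk`, and `read_parallel` are hypothetical names. `read_chunk` is a placeholder for a real range reader (pgenlib's `PgenReader` has range-based methods such as `read_alleles_range`). Here it just fabricates rows so the example runs without a PGEN file. Each thread writes into its own slice of a preallocated output, so no merge step is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n_variants, n_chunks):
    """Split range(n_variants) into at most n_chunks contiguous (start, end) pairs."""
    n_chunks = max(1, min(n_chunks, n_variants))
    size, extra = divmod(n_variants, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        # the first `extra` chunks get one additional variant
        end = start + size + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def read_chunk(start, end):
    """Hypothetical reader: returns one placeholder row per variant in [start, end)."""
    # a real implementation would decode genotypes from the PGEN file here
    return [(idx, idx % 3) for idx in range(start, end)]


def read_parallel(n_variants, n_chunks=4):
    bounds = chunk_bounds(n_variants, n_chunks)
    out = [None] * n_variants  # preallocated result, one slot per variant

    def fill(chunk):
        start, end = chunk
        # equal-length slice assignment: each thread only touches its own slots
        out[start:end] = read_chunk(start, end)

    with ThreadPoolExecutor(max_workers=len(bounds)) as pool:
        # list() drains the iterator so any worker exception is re-raised here
        list(pool.map(fill, bounds))
    return out
```

For example, `chunk_bounds(10, 3)` gives `[(0, 4), (4, 7), (7, 10)]`. Threads only help if the real reader releases the GIL while decoding. Otherwise they just take turns, which is the concern raised in the next paragraph.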

Multi-threading will be harder to implement than multi-processing because of Python's GIL, so it (might?) require a compiled extension to Python. On the other hand, multi-processing will be slower and will probably require copying each chunk from its worker process into the larger array.
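For comparison, here is a sketch of the multi-processing route under the same assumptions. It shows where the extra copying comes from. The worker `read_chunk_bytes` (hypothetical) encodes its chunk as `bytes`, one byte per haplotype. That result is pickled back to the parent process, and the parent then copies it again into the larger preallocated buffer in `assemble`. `n_haplotypes`, the byte layout, and all function names are illustrative assumptions, not haptools code. `bounds` can come from any chunking scheme.

```python
from concurrent.futures import ProcessPoolExecutor


def read_chunk_bytes(args):
    """Hypothetical worker: returns a chunk as bytes, n_haplotypes bytes per variant."""
    start, end, n_haplotypes = args
    # placeholder alleles; a real worker would open the PGEN file and decode [start, end)
    return bytes((v + h) % 2 for v in range(start, end) for h in range(n_haplotypes))


def assemble(chunks, n_variants, n_haplotypes):
    """Copy (start, data) chunks into one preallocated buffer."""
    out = bytearray(n_variants * n_haplotypes)
    with memoryview(out) as view:
        for start, data in chunks:
            offset = start * n_haplotypes
            # this copy into the larger array is the extra cost of using processes
            view[offset:offset + len(data)] = data
    return out


def read_parallel_processes(bounds, n_variants, n_haplotypes, max_workers=4):
    tasks = [(start, end, n_haplotypes) for start, end in bounds]
    # call this under `if __name__ == "__main__":` so it also works with the spawn start method
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(read_chunk_bytes, tasks)  # results come back in task order
        chunks = [(start, data) for (start, _), data in zip(bounds, results)]
    return assemble(chunks, n_variants, n_haplotypes)
```

Sharing one buffer between processes (e.g. via `multiprocessing.shared_memory`) could avoid the second copy, at the cost of more bookkeeping.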

@d-laub and I have had some very productive chats about this and are working on a strategy 💪