Currently, the sparsity of the large SNP files allows us to load the entire matrix into memory on a cluster. UK Biobank requires about 150GB for unphased calldata and about 250GB for phased calldata with ancestry (using the sparse formats given by io.snp_unphased and io.snp_phased_ancestry). In practice, no dataset requires more memory than this, so investigating the use of mmap is very low priority. Still, memory mapping is the most general fallback if we simply run out of memory even with clever representation tricks.
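To make the mmap idea concrete, here is a minimal sketch using Python's standard `mmap` module. It assumes a hypothetical dense, row-major file with one byte per entry, which is not the sparse format produced by io.snp_unphased or io.snp_phased_ancestry; the file name and the helper `column_sums_mmap` are illustrative only. The point is that the operating system pages in only the rows being read, so the whole matrix never has to fit in memory.

```python
import mmap
import os
import tempfile


def column_sums_mmap(path, n_rows, n_cols):
    """Sum each column of a row-major, one-byte-per-entry matrix stored in a raw file."""
    sums = [0] * n_cols
    with open(path, "rb") as f:
        # Map the file read-only; the OS loads pages lazily as they are touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for i in range(n_rows):
                # Slicing copies just this one row out of the mapping.
                row = mm[i * n_cols:(i + 1) * n_cols]
                for j, value in enumerate(row):
                    sums[j] += value
    return sums


# Toy stand-in for a calldata matrix with entries in {0, 1, 2}; not real data.
n_rows, n_cols = 1000, 4
rows = [[(i * (j + 1)) % 3 for j in range(n_cols)] for i in range(n_rows)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "calldata.bin")  # hypothetical file name
    with open(path, "wb") as f:
        for row in rows:
            f.write(bytes(row))
    sums = column_sums_mmap(path, n_rows, n_cols)

# In-memory reference result for comparison.
expected = [sum(col) for col in zip(*rows)]
```

A real implementation would map the existing sparse file layout and do the arithmetic in compiled code; the sketch only illustrates the access pattern.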