bschilder closed this issue 3 years ago.
Both tabular and VCF format sum stats can now be read in using data.table::fread(), which speeds up most processes even when single-threaded.
I also removed every instance where the full data file was read in multiple times just to extract header info. These were replaced with the convenience function read_header(), which reads in only the first 2 lines.
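A minimal sketch of the header-only idea, assuming a plain tab-separated file. read_header_sketch() is a hypothetical stand-in written for illustration, not MungeSumstats' own read_header(); it reads just the first n raw lines, and data.table::fread(nrows = 0) is shown as a related way to get only the column names without loading the rows.

```r
library(data.table)

# Hypothetical helper, NOT MungeSumstats' actual read_header():
# return only the first `n` raw lines instead of loading the whole table.
read_header_sketch <- function(path, n = 2L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  readLines(con, n = n)
}

# Tiny tab-separated example file (placeholder data, not real sum stats).
tmp <- tempfile(fileext = ".tsv")
fwrite(data.table(SNP = c("rs1", "rs2", "rs3"), P = c(0.01, 0.5, 0.9)),
       tmp, sep = "\t")

hdr <- read_header_sketch(tmp)        # header line plus first data line
cols <- names(fread(tmp, nrows = 0))  # column names only, no rows read
print(hdr)
print(cols)
```

Either way, the cost of inspecting the header stays constant regardless of how many millions of SNP rows follow it.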
import_sumstats can now run in parallel across multiple Open GWAS IDs when parallel_across_ids=TRUE. Otherwise, multiple cores can be allocated to processing each dataset.
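The two strategies can be sketched roughly as follows. This is a hedged illustration, not the import_sumstats() implementation: process_one_id() and import_many_sketch() are hypothetical names, the IDs are placeholders, and only parallel_across_ids is taken from the comment above. It uses parallel::mclapply(), which forks and therefore runs on a single core on Windows.

```r
library(parallel)

# Hypothetical per-dataset worker standing in for downloading and munging one
# Open GWAS dataset; `nThread` is how many threads it may use internally.
process_one_id <- function(id, nThread = 1L) {
  paste0(id, ": processed with ", nThread, " thread(s)")
}

# Hypothetical wrapper illustrating the two strategies; NOT import_sumstats().
import_many_sketch <- function(ids, nThread = 1L, parallel_across_ids = FALSE) {
  if (parallel_across_ids) {
    # One dataset per core, each worker single-threaded.
    # mclapply() relies on forking, which Windows lacks, so use 1 core there.
    cores <- if (.Platform$OS.type == "windows") 1L else nThread
    mclapply(ids, process_one_id, nThread = 1L, mc.cores = cores)
  } else {
    # One dataset at a time, with all threads given to that dataset.
    lapply(ids, process_one_id, nThread = nThread)
  }
}

ids <- c("id_1", "id_2", "id_3")  # placeholder IDs, not real Open GWAS IDs
across <- import_many_sketch(ids, nThread = 2L, parallel_across_ids = TRUE)
within <- import_many_sketch(ids, nThread = 2L, parallel_across_ids = FALSE)
print(unlist(across))
print(unlist(within))
```

Splitting across IDs tends to pay off when there are many modest-sized datasets, while giving every core to one dataset suits a single very large file.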
I think this is in a pretty good place now that we've done a lot of optimization. There's always room for improvement, but I think it's justified to close this for now.
Not sure if this is already implemented in some places, but MungeSumstats could really benefit from optimizing the speed of its processes. It's not always an easy task, but here are some ideas:
1. Comb through the code and search for inefficiencies
2. Parallelize across CPUs (import_sumstats).
3. Parallelize across GPU(s), à la cudf, which is part of the RAPIDS suite.