Al-Murphy / MungeSumstats

Rapid standardisation and quality control of GWAS or QTL summary statistics
https://doi.org/doi:10.18129/B9.bioc.MungeSumstats

Munging full summary stats is slow #112

Closed PhoebeGuo97 closed 2 years ago

PhoebeGuo97 commented 2 years ago

I'm trying to munge my full summary stats with: fullSS_path <- MungeSumstats::format_sumstats(path = fullSS_path, ref_genome = "GRCH37"). I left my MacBook on overnight and it's still running. Is there a way to speed up this process?

The following is what I see in the R console:

First line of summary statistics file:
uniqID.a1a2 CHR BP A1 A2 SNP Z P Nsum Neff dir EAF BETA SE
Summary statistics report:
13,367,299 rows
13,354,030 unique variants
2,394 genome-wide significant variants (P<5e-8)
22 chromosomes
Checking for multi-GWAS.
Checking for multiple RSIDs on one row.
Checking SNP RSIDs.
11,599 SNP IDs are not correctly formatted. These will be corrected from the reference genome.
165,861 SNP IDs appear to be made up of chr:bp, these will be replaced by their SNP ID from the reference genome
Checking for merged allele column.
Checking A1 is uppercase
Checking A2 is uppercase
Ensuring all SNPs are on the reference genome.
Loading reference genome data.

Al-Murphy commented 2 years ago

Hi,

Although we have noted an increase in runtime now that we have updated dbSNP to version 155 (which has 9 billion SNPs vs 1 billion), the runtime is nowhere near as long as you have noted. For example, munging large open GWAS sumstats (larger than the sumstats you mention above) is taking 14-25 minutes per sumstat. Note that this is on a machine with 258 GB of RAM, but it gives you an idea of the expected runtime range.

The issue could be insufficient RAM on your machine, so the run just chokes. How much RAM does your MacBook have? Perhaps try monitoring RAM usage during the run with your system's task manager (Activity Monitor on a Mac). Also make sure no other processes are running in the background, as this will slow things down.
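As a rough complement to watching the system monitor, memory held by the R session itself can be checked from R before and after a run. The sketch below is only illustrative: `mem_used_gb` is a hypothetical helper built on base R's `gc()`, the file path is a placeholder, and the `dbSNP = 144` argument is an assumption that your installed MungeSumstats version accepts it and that the matching dbSNP 144 SNPlocs package is installed. Falling back to the smaller dbSNP build may reduce memory pressure, but that is untested here.

```r
# Hypothetical helper (not part of MungeSumstats): memory currently
# held by this R session, in GB, as reported by base R's gc().
mem_used_gb <- function() {
  g <- gc(verbose = FALSE)
  # The second column of gc()'s matrix is the "used" size in Mb
  # (one row for Ncells, one for Vcells).
  sum(g[, 2]) / 1024
}

cat(sprintf("R is using %.2f GB before the run\n", mem_used_gb()))

# Placeholder path: replace with your own summary statistics file.
fullSS_path <- "path/to/sumstats.tsv"

# Only attempt the run if the package and the file are actually available.
if (requireNamespace("MungeSumstats", quietly = TRUE) &&
    file.exists(fullSS_path)) {
  formatted_path <- MungeSumstats::format_sumstats(
    path = fullSS_path,
    ref_genome = "GRCH37",
    # Assumption: this argument exists in your version and the
    # dbSNP 144 reference package is installed.
    dbSNP = 144
  )
  cat(sprintf("R is using %.2f GB after the run\n", mem_used_gb()))
}
```

`gc()` only reports what the R session holds at the moment it is called, not the peak during the run or the total system usage, so the system monitor remains the better tool for spotting swapping on a low-RAM machine.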

Alan.

PhoebeGuo97 commented 2 years ago

Thanks for the information! My MacBook has 8 GB of memory. Perhaps that's the main reason.

Al-Murphy commented 2 years ago

Yeah I think that's probably the issue!