bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

FIXED - Failure to convert summary statistics to .sumstats format: munge_sumstats is taking hours #145

Open alesss78 opened 5 years ago

alesss78 commented 5 years ago

I am trying to reproduce the example provided in: https://github.com/bulik/ldsc/wiki/Heritability-and-Genetic-Correlation

In particular, I downloaded both the summary statistics file: wget www.med.unc.edu/pgc/files/resultfiles/pgc.cross.bip.zip and the list of SNPs: wget https://data.broadinstitute.org/alkesgroup/LDSCORE/w_hm3.snplist.bz

I unzipped both files and then used munge_sumstats.py to start the file conversion as follows: python //munge_sumstats.py --sumstats pgc.cross.BIP11.2013-05.txt --N 17115 --out scz --merge-alleles w_hm3.snplist
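Putting the above together, the full set of commands was roughly the following (the unzip/bunzip2 calls are from memory, and I am assuming munge_sumstats.py sits in the current directory, so treat this as a sketch rather than an exact transcript):

```bash
# Download the PGC cross-disorder bipolar summary statistics and the HapMap3 SNP list
wget www.med.unc.edu/pgc/files/resultfiles/pgc.cross.bip.zip
wget https://data.broadinstitute.org/alkesgroup/LDSCORE/w_hm3.snplist.bz

# Extract both archives (assumed extraction commands)
unzip pgc.cross.bip.zip
bunzip2 w_hm3.snplist.bz

# Convert the summary statistics to .sumstats format
# (assuming munge_sumstats.py is in the current working directory)
python munge_sumstats.py \
    --sumstats pgc.cross.BIP11.2013-05.txt \
    --N 17115 \
    --out scz \
    --merge-alleles w_hm3.snplist
```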

I obtain the following output, which seems correct:

Call:
./munge_sumstats.py \
--out scz \
--merge-alleles w_hm3.snplist \
--N 17115.0 \
--sumstats pgc.cross.BIP11.2013-05.txt

Interpreting column names as follows:
info: INFO score (imputation quality; higher --> better imputation)
snpid: Variant ID (e.g., rs number)
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
or: Odds ratio (1 --> no effect; above 1 --> A1 is risk increasing)

Reading list of SNPs for allele merge from w_hm3.snplist
Read 1217311 SNPs for allele merge.
Reading sumstats from pgc.cross.BIP11.2013-05.txt into memory 5000000 SNPs at a time.

The program then gets stuck after this. It uses 100% of one processor and only a few gigabytes of RAM. The tutorial says this conversion should take about 20 seconds, but I waited for about an hour and the conversion still hadn't finished.

Any hints on why the process is so slow? Any help would be appreciated. Thank you.

alesss78 commented 5 years ago

EDIT: I managed to get munge_sumstats.py to complete the summary statistic conversion by reducing the chunk size: by default, chunksize = 5000000. I reduced it to 500000 by adding the option --chunksize 500000. It worked as intended.
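For completeness, here is a sketch of the full command with the reduced chunk size (same file names and sample size as in my original post; adjust the path to munge_sumstats.py as needed):

```bash
# Same conversion as before, but reading the sumstats file in chunks of 500000 SNPs
python munge_sumstats.py \
    --sumstats pgc.cross.BIP11.2013-05.txt \
    --N 17115 \
    --out scz \
    --merge-alleles w_hm3.snplist \
    --chunksize 500000
```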

ttuowang commented 5 years ago

I encountered the same problem. Thanks for your solution, it saved me a lot of time.

YaoXueming commented 5 years ago

wow, it's really nice, thank you so much!

giuseppe-fanelli commented 4 years ago

thanks a lot

privefl commented 4 years ago

I went from 2 days to 1 minute with this option?!

xsun1229 commented 4 years ago

> EDIT: I managed to get munge_sumstats.py to complete the summary statistic conversion by reducing the chunk size: by default, chunksize = 5000000. I reduced it to 500000 by adding the option --chunksize 500000. It worked as intended.

Great, thanks!

maryellenlynall commented 4 years ago

thanks!

ptn24 commented 3 years ago

+1

vkp3 commented 2 years ago

Incredible tip: hours and hours -> 1m 16s. Thanks!

alicebraun commented 2 years ago

thanks so much for the useful hint! it works fine now :)

What-Ccat commented 1 year ago

Really solved the problem, thank you so much!

wgmao commented 9 months ago

Thanks from 2024!