Open alesss78 opened 5 years ago
EDIT: I managed to make munge_sumstats.py complete the summary statistics conversion by reducing the chunk size. By default, chunksize = 5000000; I reduced it to 500000 by adding the option --chunksize 500000. It worked as intended.
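For anyone who wants to script this, here is a minimal sketch of the same call with the reduced chunk size. The flags are the ones used in this thread; the helper name `build_munge_command`, the `script` path, and the use of a `python` executable that can run ldsc are assumptions for illustration only, so adjust them to your setup.

```python
import shlex


def build_munge_command(sumstats, n, out, merge_alleles,
                        chunksize=500000, script="munge_sumstats.py"):
    """Build the argument list for munge_sumstats.py with a reduced chunk size.

    `script` is a hypothetical path; point it at your own ldsc checkout.
    """
    return [
        "python", script,
        "--sumstats", sumstats,
        "--N", str(n),
        "--out", out,
        "--merge-alleles", merge_alleles,
        "--chunksize", str(chunksize),  # the default reported in the log is 5000000
    ]


cmd = build_munge_command(
    "pgc.cross.BIP11.2013-05.txt", 17115, "scz", "w_hm3.snplist"
)
# Print the command so it can be checked or pasted into a shell.
print(shlex.join(cmd))

# To actually run it (requires ldsc and its dependencies to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a file name contains spaces.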
I encountered the same problem. Thanks for your solution, it saved me a lot of time.
wow, it's really nice, thank you so much!
thanks a lot
I went from 2 days to 1 minute with this option?!
Great thanks
thanks!
+1
Incredible tip, hours and hours -> 1m 16s Thanks!
thanks so much for the useful hint! it works fine now :)
Really solved the problem, thank you so much!
Thanks from 2024!
I am trying to reproduce the example provided in: https://github.com/bulik/ldsc/wiki/Heritability-and-Genetic-Correlation
In particular, I downloaded both the summary statistics file: wget www.med.unc.edu/pgc/files/resultfiles/pgc.cross.bip.zip and the list of SNPs: wget https://data.broadinstitute.org/alkesgroup/LDSCORE/w_hm3.snplist.bz
I unzipped both files and then used munge_sumstats.py to start the file conversion as follows:
python //munge_sumstats.py \
--sumstats pgc.cross.BIP11.2013-05.txt \
--N 17115 \
--out scz \
--merge-alleles w_hm3.snplist
I obtain the following output that seems correct:
Call:
./munge_sumstats.py \
--out scz \
--merge-alleles w_hm3.snplist \
--N 17115.0 \
--sumstats pgc.cross.BIP11.2013-05.txt

Interpreting column names as follows:
info: INFO score (imputation quality; higher --> better imputation)
snpid: Variant ID (e.g., rs number)
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
or: Odds ratio (1 --> no effect; above 1 --> A1 is risk increasing)

Reading list of SNPs for allele merge from w_hm3.snplist
Read 1217311 SNPs for allele merge.
Reading sumstats from pgc.cross.BIP11.2013-05.txt into memory 5000000 SNPs at a time.
The program then gets stuck at this point. It uses 100% of one processor and only a few gigabytes of RAM. The tutorial says this conversion should take about 20 seconds. However, I waited for about 1 hour and the conversion did not finish.
Any hints on why the process is so slow? Any help would be appreciated. Thank you.