bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

munge_sumstats.py never finished, then I had to use AWK #181

Open jielab opened 5 years ago

jielab commented 5 years ago

Hi, there:

Please see the screenshot attached below: this munge_sumstats.py command was still running after 10 hours, so I had to kill it. Can you please let me know what I did wrong?

Since munge_sumstats.py does not work for me at the moment, I wrote a few simple Linux commands to do the data merging and formatting myself. My original GWAS summary statistics file has "BETA" and "SE" but not "Z". I assume that munge_sumstats.py calculates Z (= BETA/SE). But will ldsc.py still work if I use AWK to create an input file that has only BETA and SE, without Z?
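For reference, the conversion assumed above (Z = BETA/SE) can be sketched in a few lines of Python instead of AWK. This is only an illustration of that arithmetic: the function name `add_z_column`, the tab delimiter, and the column names `BETA`, `SE`, and `Z` are assumptions, and nothing in this thread confirms that munge_sumstats.py derives Z the same way or which columns ldsc.py requires.

```python
import csv
import io


def add_z_column(src, dst, beta_col="BETA", se_col="SE", delimiter="\t"):
    """Copy a delimited table from src to dst, appending Z = BETA / SE.

    Hypothetical helper: column names and delimiter are assumptions.
    """
    reader = csv.DictReader(src, delimiter=delimiter)
    fieldnames = list(reader.fieldnames) + ["Z"]
    writer = csv.DictWriter(
        dst, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        try:
            z = float(row[beta_col]) / float(row[se_col])
            row["Z"] = f"{z:.6g}"
        except (ValueError, ZeroDivisionError):
            # Non-numeric BETA/SE or SE == 0: mark Z as missing.
            row["Z"] = "NA"
        writer.writerow(row)


# Tiny in-memory demo with made-up rows (not real GWAS data).
demo = io.StringIO(
    "SNP\tA1\tA2\tBETA\tSE\n"
    "rs1\tA\tG\t0.10\t0.05\n"
    "rs2\tC\tT\t-0.30\t0.10\n"
    "rs3\tC\tT\tNA\t0.10\n"
)
out = io.StringIO()
add_z_column(demo, out)
print(out.getvalue())
```

Reading row by row keeps memory use flat regardless of file size, which is the same reason an AWK one-liner finishes quickly on a large file.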

I don't understand why I need to write "--signed-sumstats BETA,0" for munge_sumstats.py. Shouldn't it be "--signed-sumstats BETA,SE" instead? Will munge_sumstats.py work if my GWAS file has only BETA but not SE? I assume it won't.

BTW, I noticed that munge_sumstats.py will complain if my GWAS file has both "EAF" and "MAF", even when I specified "--freq EAF". I had to rename "MAF" to "MAF1" to make it work. Your clarification is greatly appreciated!

Best regards, Jie


giuseppe-fanelli commented 4 years ago

I have the same problem. munge_sumstats.py stops at the same point as in the screenshot above: "Reading sumstats from *** into memory 5000000SNPs at a time". The log file shows no error, but the job ends after many hours when it hits the time limit (even if I set 48 hours or more). It also happens when I follow the tutorial step by step. How can I solve this? I have the latest ldsc version.

choishingwan commented 4 years ago

I have the same problem. Based on previous issues, using --chunksize 500000 seems to solve the problem though.
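A minimal sketch of what that invocation might look like, built as an argument list in Python. The input file name, output prefix, interpreter name, and the helper `munge_command` are hypothetical; the `--signed-sumstats BETA,0` value is taken from the original post, where, as far as the tool's help text describes it, the part after the comma is the null value of the signed statistic rather than a second column name.

```python
import shlex


def munge_command(sumstats, out_prefix, chunksize=500000, signed_sumstats="BETA,0"):
    """Build a munge_sumstats.py command line with a smaller chunk size.

    Hypothetical helper: paths and the interpreter name are assumptions.
    """
    return [
        "python", "munge_sumstats.py",
        "--sumstats", sumstats,                # input GWAS summary statistics
        "--out", out_prefix,                   # prefix for the output files
        "--signed-sumstats", signed_sumstats,  # column name, then null value
        "--chunksize", str(chunksize),         # rows read into memory at a time
    ]


cmd = munge_command("my_gwas.txt", "my_gwas_munged")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```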

samkleeman1 commented 3 years ago

This fix from @choishingwan also worked for me. Perhaps the default settings can be updated?