bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

./munge_sumstats.py is running for a long time #112

Open BioToolsLeeds opened 6 years ago

BioToolsLeeds commented 6 years ago

Hi there,

I could not proceed with ./ldsc.py because ./munge_sumstats.py never completes; it has been running for a long time.


Interpreting column names as follows:
Effect_allele: Allele 1, interpreted as ref allele for signed sumstat.
MarkerName: Variant ID (e.g., rs number)
Beta: [linear/logistic] regression coefficient (0 --> no effect; above 0 --> A1 is trait/risk increasing)
Pvalue: p-Value
Non_Effect_allele: Allele 2, interpreted as non-ref allele for signed sumstat.

Reading list of SNPs for allele merge from w_hm3.snplist
Read 1217311 SNPs for allele merge.
Reading sumstats from gwas_igap_stage1.txt into memory 5000000 SNPs at a time.

srodri25 commented 6 years ago

This seems to be a version issue. A fix is to use the same library versions that were current when the programs were published last year. On macOS with Anaconda installed, I did this with the following commands in the terminal. Note that I used pip install instead of conda install because conda install gave me errors with these older libraries. It also works with Python 2.7.14, but to be conservative I chose to replicate a conda environment with the latest versions available at the time of publication.

conda create -n ldsc13 python=2.7.13 -y

source activate ldsc13

pip install argparse==1.3.0
pip install bitarray==0.8.1
pip install nose==1.3.4
pip install numpy==1.8.0
pip install pandas==0.17.0
pip install scipy==0.11.0

Please let me know if this works for you.
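To confirm that the activated environment really picked up the pinned versions, a small check along these lines could be run inside it. This is only a sketch: the PINNED table is copied from the pip commands above, the helper names installed_version and report are made up for illustration, and the check assumes each library exposes a __version__ attribute.

```python
from __future__ import print_function
import importlib

# Pins copied from the pip commands above (argparse is left out here).
PINNED = {
    "bitarray": "0.8.1",
    "nose": "1.3.4",
    "numpy": "1.8.0",
    "pandas": "0.17.0",
    "scipy": "0.11.0",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Any import failure is treated as "not available" in this sketch.
        return None
    return getattr(module, "__version__", None)


def report(pinned=PINNED):
    """Return (name, pinned, found, matches) tuples sorted by package name."""
    rows = []
    for name in sorted(pinned):
        found = installed_version(name)
        rows.append((name, pinned[name], found, found == pinned[name]))
    return rows


if __name__ == "__main__":
    for name, want, found, ok in report():
        status = "OK" if ok else "DIFFERS"
        print("%-10s pinned %-8s found %-10s %s" % (name, want, found or "missing", status))
```

Any row marked DIFFERS points at a library that was not installed at the pinned version.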

yingji15 commented 6 years ago

I used a conda environment and got help from the post above, so I also want to add my environment specification here.

My "ldsc_env.yml" file:

name: ldscenv
dependencies:

I didn't have "argparse" in the list since it comes with python2.7 (thanks to tip from the google group!)

To create environment: conda env create -f ldsc_env.yml
To activate that: source activate ldscenv

Amanda2018genetics commented 5 years ago

Hi there,

I could not proceed with ./ldsc.py because ./munge_sumstats.py never completes; it has been running for a long time.

  • LD Score Regression (LDSC)
  • Version 1.0.0
  • (C) 2014-2015 Brendan Bulik-Sullivan and Hilary Finucane
  • Broad Institute of MIT and Harvard / MIT Department of Mathematics
  • GNU General Public License v3

Call:  ./munge_sumstats.py --out igap --merge-alleles w_hm3.snplist --N 54162.0 --sumstats gwas_igap.txt 

Interpreting column names as follows:
Effect_allele: Allele 1, interpreted as ref allele for signed sumstat.
MarkerName: Variant ID (e.g., rs number)
Beta: [linear/logistic] regression coefficient (0 --> no effect; above 0 --> A1 is trait/risk increasing)
Pvalue: p-Value
Non_Effect_allele: Allele 2, interpreted as non-ref allele for signed sumstat.

Reading list of SNPs for allele merge from w_hm3.snplist
Read 1217311 SNPs for allele merge.
Reading sumstats from gwas_igap_stage1.txt into memory 5000000 SNPs at a time.

Hi BioToolsLeeds, I have encountered exactly the same problem as you.

My Python version is 2.7, and the installed package versions are as follows:

[yaoyao@chenlinlab-4103 ldsc]$ pip freeze
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
bitarray==0.8.0
numpy==1.16.2
pandas==0.20.0
python-dateutil==2.8.0
pytz==2019.1
scipy==0.18.0
six==1.12.0

I am totally confused by this situation. Could you give me a hint for solving this? Many thanks.

Truly, Yao Yao
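For comparison, the pip freeze output in the comment above can be diffed against the versions pinned earlier in this thread. The sketch below does that with plain string handling; parse_freeze and mismatches are hypothetical helper names, and nose is left out of the pins because it is a test runner. Whether these differences are what makes munge_sumstats.py hang is not confirmed in this thread; the diff only shows where the two environments diverge.

```python
from __future__ import print_function

# Pins taken from the earlier suggestion in this thread (nose omitted).
PINNED = {
    "bitarray": "0.8.1",
    "numpy": "1.8.0",
    "pandas": "0.17.0",
    "scipy": "0.11.0",
}

# Package lines copied from the pip freeze output quoted above.
FREEZE = (
    "bitarray==0.8.0 numpy==1.16.2 pandas==0.20.0 python-dateutil==2.8.0 "
    "pytz==2019.1 scipy==0.18.0 six==1.12.0"
)


def parse_freeze(text):
    """Map lower-cased package name to version for each 'name==version' token."""
    versions = {}
    for token in text.split():
        if "==" in token:
            name, version = token.split("==", 1)
            versions[name.lower()] = version
    return versions


def mismatches(pinned, installed):
    """Return {name: (pinned, installed_or_None)} for packages that differ."""
    return {
        name: (want, installed.get(name))
        for name, want in pinned.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    diff = mismatches(PINNED, parse_freeze(FREEZE))
    for name, (want, got) in sorted(diff.items()):
        print("%s: pinned %s, installed %s" % (name, want, got or "missing"))
```

With the values above, all four libraries differ from the pinned set, so recreating the environment with the pinned versions would be a reasonable first thing to try.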