bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

munge_sumstats.py throwing error on simulated dataset #150

Open standard-aaron opened 5 years ago

standard-aaron commented 5 years ago

I am calling munge_sumstats.py in the following way on a simulated dataset (made up chromosome, made up rsids):

./munge_sumstats.py \
--out test \
--N 2000.0 \
--sumstats example.sumstats.txt

and it's giving me this error:

Call:
./munge_sumstats.py \
--out test \
--chunksize 50000 \
--N 2000.0 \
--sumstats test.assoc.linear.ldsc

Interpreting column names as follows:
INFO:   INFO score (imputation quality; higher --> better imputation)
snpid:  Variant ID (e.g., rs number)
a1: Allele 1, interpreted as ref allele for signed sumstat.
P:  p-Value
beta:   [linear/logistic] regression coefficient (0 --> no effect; above 0 --> A1 is trait/risk increasing)
a2: Allele 2, interpreted as non-ref allele for signed sumstat.

Reading sumstats from test.assoc.linear.ldsc into memory 5000000 SNPs at a time.
. done

ERROR converting summary statistics:

Traceback (most recent call last):
  File "ldsc/munge_sumstats.py", line 687, in munge_sumstats
    dat = parse_dat(dat_gen, cname_translation, merge_alleles, log, args)
  File "ldsc/munge_sumstats.py", line 302, in parse_dat
    dat = pd.concat(dat_list, axis=0).reset_index(drop=True)
  File "/Users/ajstern/anaconda3/envs/ldsc/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 206, in concat
    copy=copy)
  File "/Users/ajstern/anaconda3/envs/ldsc/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 239, in __init__
    raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate

Conversion finished at Thu Mar 14 11:41:22 2019
Total time elapsed: 0.04s
Traceback (most recent call last):
  File "ldsc/munge_sumstats.py", line 747, in <module>
    munge_sumstats(parser.parse_args(), p=True)
  File "ldsc/munge_sumstats.py", line 687, in munge_sumstats
    dat = parse_dat(dat_gen, cname_translation, merge_alleles, log, args)
  File "ldsc/munge_sumstats.py", line 302, in parse_dat
    dat = pd.concat(dat_list, axis=0).reset_index(drop=True)
  File "/Users/ajstern/anaconda3/envs/ldsc/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 206, in concat
    copy=copy)
  File "/Users/ajstern/anaconda3/envs/ldsc/lib/python2.7/site-packages/pandas/core/reshape/concat.py", line 239, in __init__
    raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate
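
For context, this `ValueError` is what pandas raises when `pd.concat` receives an empty list, so the traceback suggests that `dat_list` had no chunks in it by the time line 302 ran. A minimal reproduction, where the empty `dat_list` is only a stand-in for the list built inside `parse_dat`:

```python
import pandas as pd

# Stand-in for the list built in parse_dat (an assumption for illustration):
# an empty list means no chunk was ever appended to it.
dat_list = []

try:
    pd.concat(dat_list, axis=0)
    message = None
except ValueError as err:
    # pandas refuses to concatenate when it is given nothing at all
    message = str(err)

print(message)
```

This only shows where the message comes from; it does not say why no chunks were collected.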

I've looked at other issues with the same error and I cannot find any problems in the formatting of my files; e.g.

$ head example.sumstats.txt

snpid chr bp a1 a2 beta INFO P
rs1003140       1       2590    T       A       -0.5211 0.999   0.6023
rs1001846       1       5191    T       A       -1.895  0.999   0.05828
rs1005475       1       8509    T       A       -0.6763 0.999   0.4989
rs1002062       1       14709   T       A       -1.208  0.999   0.2271
rs1002645       1       16719   T       A       2.369   0.999   0.01792
rs1000589       1       18244   T       A       -2.646  0.999   0.008213
rs1002321       1       20087   T       A       -0.1907 0.999   0.8488
rs1001430       1       21134   T       A       -1.706  0.999   0.08821
rs1004451       1       23435   T       A       0.2344  0.999   0.8147

and when I rigorously test this, I do not find any missing fields in the rows.
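
A minimal sketch of one such check, assuming the file is whitespace-delimited with a single header line. The function name `field_count_report` is made up for illustration, and the sample rows are copied from the listing above:

```python
from collections import Counter


def field_count_report(lines):
    """Return the header fields and a Counter of field counts per data row."""
    # Split on any whitespace so that tabs and spaces are treated alike
    rows = [line.split() for line in lines if line.strip()]
    header, data = rows[0], rows[1:]
    counts = Counter(len(row) for row in data)
    return header, counts


# Header and first two data rows from the listing above
sample = [
    "snpid chr bp a1 a2 beta INFO P",
    "rs1003140\t1\t2590\tT\tA\t-0.5211\t0.999\t0.6023",
    "rs1001846\t1\t5191\tT\tA\t-1.895\t0.999\t0.05828",
]

header, counts = field_count_report(sample)
# A clean file has exactly one key in counts, equal to len(header)
print(len(header), dict(counts))
```

For the whole file, the same function can be given an open file handle, for example `field_count_report(open("example.sumstats.txt"))`.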

Any idea what is causing the error? Does it have to do with using made-up RSIDs?

Thanks, aaron

alesssia commented 4 years ago

Could this be linked to #166, with the made-up RSIDs not being included in the file specified by --merge_alleles (which I suppose is some default file, since you are not specifying it but it is reported in the log)?
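
A quick way to test that hypothesis is to count how many variant IDs the two files share. The sketch below assumes both files are whitespace-delimited, have one header line, and carry the variant ID in the first column. `read_ids` and the allele-list rows are hypothetical placeholders, not real LDSC data:

```python
def read_ids(lines):
    """Collect the first field of every non-empty data row, skipping the header."""
    rows = (line.split() for line in lines if line.strip())
    next(rows, None)  # drop the header line
    return {fields[0] for fields in rows}


# Two rows from the simulated sumstats shown in the issue
sumstats_lines = [
    "snpid chr bp a1 a2 beta INFO P",
    "rs1003140\t1\t2590\tT\tA\t-0.5211\t0.999\t0.6023",
    "rs1001846\t1\t5191\tT\tA\t-1.895\t0.999\t0.05828",
]

# Hypothetical allele list; these IDs are invented for illustration
allele_lines = [
    "SNP A1 A2",
    "rs0000001 A G",
    "rs0000002 C T",
]

shared = read_ids(sumstats_lines) & read_ids(allele_lines)
# Zero shared IDs would mean a merge that keeps only matching variants leaves nothing
print(len(shared))
```

If the count is zero for the real files, the made-up RSIDs would be a plausible explanation; if it is not, the cause is probably elsewhere.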

Arslan-Zaidi commented 4 years ago

I had the same issue with my simulated dataset. I ended up carrying out the weighted regression independently, which yielded results that made sense. I would still like to know whether this gets resolved at some point, though.