bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0
652 stars 345 forks source link

Partitioned heritability #91

Open sbguarch opened 7 years ago

sbguarch commented 7 years ago

Hi ,

I was trying to perform the partitioned heritability calculations. So I took one of your examples, based on the BMI GIANT meta-analysis. (https://github.com/bulik/ldsc/wiki/Partitioned-Heritability).

The first step works perfectly, to get the sumstats format
python ../ldsc-master/munge_sumstats.py --sumstats GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt --merge-alleles ../test/w_hm3.snplist --out BMI --a1-inc

However, if I try to partition heritability I get this error (I also tried with my own data, and the error is always the same). Do you know exactly what I'm doing wrong? Thanks.

Reading summary statistics from BMI.sumstats.gz ... Read summary statistics for 1040803 SNPs. Reading reference panel LD Score from baseline/baseline.[1-22] ... Read reference panel LD Scores for 1189907 SNPs. Removing partitioned LD Scores with zero variance. Reading regression weight LD Score from weights_hm3_no_hla/weights.[1-22] ... Read regression weight LD Scores for 1242190 SNPs. After merging with reference panel LD, 948813 SNPs remain. After merging with regression SNP LD, 946647 SNPs remain. Removed 17 SNPs with chi^2 > 123.912 (946630 SNPs remain) Traceback (most recent call last): File "../ldsc-master/ldsc.py", line 645, in sumstats.estimate_h2(args, log) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/sumstats.py", line 357, in estimate_h2 hsqhat = reg.Hsq(chisq, ref_ld, s(sumstats[w_ld_cname]), s(sumstats.N),M_annot, n_blocks=n_blocks, intercept=args.intercept_h2,twostep=args.two_step, old_weights=old_weights) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/regressions.py", line 346, in init slow=slow, step1_ii=step1_ii, old_weights=old_weights) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/regressions.py", line 215, in init self.coef, self.coef_cov, self.coef_se = self._coef(jknife, Nbar) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/regressions.py", line 267, in _coef coef_cov = jknife.jknife_cov[0:n_annot, 0:n_annot] / Nbar 2 FloatingPointError: invalid value encountered in true_divide

Analysis finished at Thu Oct 5 12:35:27 2017 Total time elapsed: 40.66s Traceback (most recent call last): File "../ldsc-master/ldsc.py", line 645, in sumstats.estimate_h2(args, log) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/sumstats.py", line 357, in estimate_h2 hsqhat = reg.Hsq(chisq, ref_ld, s(sumstats[w_ld_cname]), s(sumstats.N),M_annot, n_blocks=n_blocks, intercept=args.intercept_h2,twostep=args.two_step, old_weights=old_weights) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/regressions.py", line 346, in init slow=slow, step1_ii=step1_ii, old_weights=old_weights) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/regressions.py", line 215, in init self.coef, self.coef_cov, self.coef_se = self._coef(jknife, Nbar) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/regressions.py", line 267, in _coef coef_cov = jknife.jknife_cov[0:n_annot, 0:n_annot] / Nbar 2 FloatingPointError: invalid value encountered in true_divide

rkwalters commented 7 years ago

Hi, Odd error, not something I’ve seen before. My first guess would be some sort of package versioning issue. Can you check the output of this command? pip show numpy scipy pandas bitarray | egrep "Name|Version"

Might also trying re-downloading the reference LD score files in case they got corrupted somehow, and/or testing whether you can successfully run analysis with a different set of LD scores (e.g. univariate “eur_w_ld_chr” or the “1000GPhase3*” partitioned set).

On Oct 5, 2017, at 6:39 AM, sbguarch notifications@github.com wrote:

Hi ,

I was trying to perform the partitioned heritability calculations. So I took one of your examples, based on the BMI GIANT meta-analysis. (https://github.com/bulik/ldsc/wiki/Partitioned-Heritability https://github.com/bulik/ldsc/wiki/Partitioned-Heritability).

The first step works perfectly, to get the sumstats format python ../ldsc-master/munge_sumstats.py --sumstats GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt --merge-alleles ../test/w_hm3.snplist --out BMI --a1-inc

However, if I try to partition heritability I get this error (I also tried with my own data, and the error is always the same). Do you know exactly what I'm doing wrong? Thanks.

Reading summary statistics from BMI.sumstats.gz ... Read summary statistics for 1040803 SNPs. Reading reference panel LD Score from baseline/baseline.[1-22] ... Read reference panel LD Scores for 1189907 SNPs. Removing partitioned LD Scores with zero variance. Reading regression weight LD Score from weights_hm3_no_hla/weights.[1-22] ... Read regression weight LD Scores for 1242190 SNPs. After merging with reference panel LD, 948813 SNPs remain. After merging with regression SNP LD, 946647 SNPs remain. Removed 17 SNPs with chi^2 > 123.912 (946630 SNPs remain) Traceback (most recent call last): File "../ldsc-master/ldsc.py", line 645, in sumstats.estimate_h2(args, log) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/sumstats.py", line 357, in estimate_h2 hsqhat = reg.Hsq(chisq, ref_ld, s(sumstats[w_ld_cname]), s(sumstats.N),M_annot, n_blocks=n_blocks, intercept=args.intercept_h2,twostep=args.two_step, old_weights=old_weights) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/regressions.py", line 346, in init slow=slow, step1_ii=step1_ii, old_weights=old_weights) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/regressions.py", line 215, in init self.coef, self.coef_cov, self.coef_se = self._coef(jknife, Nbar) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/regressions.py", line 267, in _coef coef_cov = jknife.jknife_cov[0:n_annot, 0:n_annot] / Nbar 2 FloatingPointError: invalid value encountered in true_divide

Analysis finished at Thu Oct 5 12:35:27 2017 Total time elapsed: 40.66s Traceback (most recent call last): File "../ldsc-master/ldsc.py", line 645, in sumstats.estimate_h2(args, log) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/sumstats.py", line 357, in estimate_h2 hsqhat = reg.Hsq(chisq, ref_ld, s(sumstats[w_ld_cname]), s(sumstats.N),M_annot, n_blocks=n_blocks, intercept=args.intercept_h2,twostep=args.two_step, old_weights=old_weights) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/regressions.py", line 346, in init slow=slow, step1_ii=step1_ii, old_weights=old_weights) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/regressions.py", line 215, in init self.coef, self.coef_cov, self.coef_se = self._coef(jknife, Nbar) File "/gpfs/scratch/bsc05/bsc05422/JFERRER_IRENE/LDREG/ldsc-master/ldscore/regressions.py", line 267, in _coef coef_cov = jknife.jknife_cov[0:n_annot, 0:n_annot] / Nbar 2 FloatingPointError: invalid value encountered in true_divide

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/bulik/ldsc/issues/91, or mute the thread https://github.com/notifications/unsubscribe-auth/AILEvdvY14JkhFalm-hGpjWgl5Noouokks5spLHmgaJpZM4Pu4H2.

sbguarch commented 7 years ago

Hi,

This is the output of the command:

Name: numpy Version: 1.13.1 Name: scipy Version: 0.19.0 Name: pandas Version: 0.20.2 Name: bitarray Version: 0.8.1

So basically, for the old eur_w_ld_chr dataset it works (I think that you provide for the ref-ld-chr and for the w-ld-chr flags two sets that only have the LD information and nothing more). For the 1000G_Phase3 I got the same error.

Thanks!

eur_w_ld_chr

rkwalters commented 7 years ago

Cool, so for reference my current default environment is:

Name: numpy Version: 1.11.0 Name: scipy Version: 0.17.1 Name: pandas Version: 0.17.1 Name: bitarray Version: 0.8.1

and it sounds like whatever the issue is only affects the partitioned analysis.

@tpoterba : do you have time to look at this? I don't see anything obvious yet from a quick search of the final error message about "true_divide", so may be something upstream of that in ldsc.

Cheers, Raymond

tpoterba commented 7 years ago

Sounds like this is a problem around handling of inf / nan values. This isn't the first time that newer versions of packages have caused problems -- we may want to look into committing a conda environment that uses the versions that we know work.

I don't think I'll have time to chase down the root cause of this bug until later in the month, so the best thing to do for now is to install the older versions of pandas and numpy.

sbguarch commented 7 years ago

HI

I tried to work with another environment (with older versions of pandas and numpy) and it worked. So everything is running.

Thanks for your help!