bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

Heritability have error #140

Open ameet20 opened 5 years ago

ameet20 commented 5 years ago

It works well when a single trait is analyzed separately. However, it fails when two traits are analyzed together using --rg. Could you help me figure it out? Thank you very much!

The details are as follows.

Call:
./ldsc.py \
--out /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Simul_1/zzz.rg_12 \
--rg /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Simul_1/Simul1.y1.sumstats.gz,/bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Simul_1/Simul1.y2.sumstats.gz \
--w-ld /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Eur_ld_chr/Chr1 \
--ref-ld /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Eur_ld_chr/Chr1

Beginning analysis at Fri Dec 28 08:56:41 2018
Reading summary statistics from /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Simul_1/Simul1.y1.sumstats.gz ...
Read summary statistics for 400984 SNPs.
Reading reference panel LD Score from /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Eur_ld_chr/Chr1 ...
Read reference panel LD Scores for 400984 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Eur_ld_chr/Chr1 ...
Read regression weight LD Scores for 400984 SNPs.
After merging with reference panel LD, 400984 SNPs remain.
After merging with regression SNP LD, 400984 SNPs remain.
Computing rg for phenotype 2/2
Reading summary statistics from /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Simul_1/Simul1.y2.sumstats.gz ...
Read summary statistics for 400984 SNPs.
After merging with summary statistics, 400984 SNPs remain.
340046 SNPs with valid alleles.
ERROR computing rg for phenotype 2/2, from file /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Simul_1/Simul1.y2.sumstats.gz.
Traceback (most recent call last):
  File "/rhome/jwei/ldsc/ldscore/sumstats.py", line 409, in estimate_rg
    loop = _read_other_sumstats(args, log, p2, sumstats, ref_ld_cnames)
  File "/rhome/jwei/ldsc/ldscore/sumstats.py", line 441, in _read_other_sumstats
    loop['Z2'] = _align_alleles(loop.Z2, alleles)
  File "/rhome/jwei/ldsc/ldscore/sumstats.py", line 517, in _align_alleles
    raise KeyError(msg)
KeyError: 'Incompatible alleles in .sumstats files: GCGC. Did you forget to use --merge-alleles with munge_sumstats.py?

JuRaGa commented 5 years ago

Hi, I am getting exactly the same error as in this thread:

KeyError: 'Incompatible alleles in .sumstats files: GAGT. Did you forget to use --merge-alleles with munge_sumstats.py?'

I did not pass "--merge-alleles" to "munge_sumstats.py" because I was losing almost all my SNVs. Instead, I added an "INFO" column, so all SNVs with INFO <= 0.9 are now filtered out. This worked well (no warnings or errors), and I got the output files. I don't understand why it crashes during the genetic correlation computation.

Thank you,

melothemightyone commented 5 years ago

You can write your own munge_sumstats script to sieve out all the "normal" SNPs (like A-T or G-C). At least it works for me.
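A minimal Python sketch of such a pre-filter, under one reading of the suggestion above: strand-ambiguous pairs (A/T, C/G), such as the G/C pair behind the GCGC combination in the original error, are dropped before the files reach --rg. The function names, the toy rows, and the assumed tab-separated layout with SNP, A1, A2, Z and N columns are illustrative only and are not taken from munge_sumstats.py.

```python
import csv
import io

# Watson-Crick complements; anything outside this map is treated as a non-SNP.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_unambiguous_snp(a1, a2):
    """True for biallelic SNPs whose alleles are not each other's complement."""
    a1, a2 = a1.upper(), a2.upper()
    if a1 not in COMPLEMENT or a2 not in COMPLEMENT:
        return False  # indel, missing value, or non-ACGT code
    return a1 != a2 and COMPLEMENT[a1] != a2


def filter_sumstats(text):
    """Return the kept rows of a tab-separated table as a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if is_unambiguous_snp(row["A1"], row["A2"])]


# Toy data (hypothetical SNP IDs and values).
example = (
    "SNP\tA1\tA2\tZ\tN\n"
    "rs1\tA\tG\t1.2\t1000\n"
    "rs2\tG\tC\t-0.4\t1000\n"
    "rs3\tA\tT\t0.3\t1000\n"
    "rs4\tT\tC\t2.1\t1000\n"
)
kept = filter_sumstats(example)
kept_ids = [row["SNP"] for row in kept]  # rs2 (G/C) and rs3 (A/T) are dropped
```

Applying the same filter to both traits' files before running --rg keeps the two SNP sets consistent; it does not by itself fix mismatched alleles between files (such as GAGT).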

Xuemin-Wang commented 4 years ago

I used --merge-alleles when munging the data but still got the same error when estimating rg. @melothemightyone, could you share the code you wrote to exclude incompatible alleles? Many thanks,

ZiqianXie commented 2 years ago

The problem is in ldscore/sumstats.py, line 440: loop is filtered so that only valid allele combinations remain, but alleles itself is not filtered, hence the KeyError. You can change the code at lines 438-439 of sumstats.py to:

li = _filter_alleles(alleles)
loop = _select_and_log(loop, li, log,
                       '{N} SNPs with valid alleles.')
alleles = alleles[li]

This removes the incompatible allele combinations from the alleles table. I guess this repo is not actively maintained, so I am just posting my workaround here.
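To illustrate why filtering only one of the two objects fails, here is a small self-contained Python sketch. It is not ldsc's code: the lookup table, the validity rule, and the helper names (build_flip_table, align) are hypothetical stand-ins that mimic the behaviour described above, where every allele combination is looked up in a table of valid combinations.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
BASES = "ACGT"


def build_flip_table():
    """Map each valid combination A1+A2+A1x+A2x to +1 (same) or -1 (swapped)."""
    table = {}
    for a1, a2, b1, b2 in product(BASES, repeat=4):
        # Skip non-SNPs and strand-ambiguous pairs such as A/T or G/C.
        if a1 == a2 or COMPLEMENT[a1] == a2:
            continue
        same = {(a1, a2), (COMPLEMENT[a1], COMPLEMENT[a2])}
        swapped = {(a2, a1), (COMPLEMENT[a2], COMPLEMENT[a1])}
        if (b1, b2) in same:
            table[a1 + a2 + b1 + b2] = 1
        elif (b1, b2) in swapped:
            table[a1 + a2 + b1 + b2] = -1
    return table


def align(z_scores, combos, table):
    """Flip Z-score signs; every combination in combos is looked up."""
    signs = [table[c] for c in combos]  # KeyError on an unknown combination
    return [z * s for z, s in zip(z_scores, signs)]


FLIP = build_flip_table()
combos = ["AGAG", "AGGA", "GCGC"]  # the last one is not a valid combination
z2 = [1.0, 2.0, 3.0]

keep = [c in FLIP for c in combos]
z2_kept = [z for z, k in zip(z2, keep) if k]

# Pattern before the workaround: Z-scores filtered, combinations not.
try:
    align(z2_kept, combos, FLIP)
    unfiltered_error = None
except KeyError as exc:
    unfiltered_error = exc.args[0]

# Pattern after the workaround: both filtered with the same mask.
combos_kept = [c for c, k in zip(combos, keep) if k]
aligned = align(z2_kept, combos_kept, FLIP)
```

In this toy case the first call fails on the GCGC entry that was never removed from combos, while the second call succeeds because the same mask was applied to both lists.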

DC-Jade commented 1 year ago

Sorry, my version of ldscore/sumstats.py may not be the same, so I am not sure where to insert the code. Could you show the code around where you inserted it? Thanks.

ZiqianXie commented 1 year ago

432 def _read_other_sumstats(args, log, p2, sumstats, ref_ld_cnames):
433     loop = _read_sumstats(args, log, p2, alleles=True, dropna=False)
434     loop = _merge_sumstats_sumstats(args, sumstats, loop, log)
435     loop = loop.dropna(how='any')
436     alleles = loop.A1 + loop.A2 + loop.A1x + loop.A2x
437     if not args.no_check_alleles:
438         li = _filter_alleles(alleles)
439         loop = _select_and_log(loop, li, log,
440                                '{N} SNPs with valid alleles.')
441         alleles = alleles[li]
442         loop['Z2'] = _align_alleles(loop.Z2, alleles)
443
444     loop = loop.drop(['A1', 'A1x', 'A2', 'A2x'], axis=1)
445     _check_ld_condnum(args, log, loop[ref_ld_cnames])
446     _warn_length(log, loop)
447     return loop