ameet20 opened 5 years ago
Hi, I am getting exactly the same error as in this thread:
KeyError: 'Incompatible alleles in .sumstats files: GAGT. Did you forget to use --merge-alleles with munge_sumstats.py?'
I did not include "--merge-alleles" in the munge_sumstats.py step because I was losing almost all of my SNVs. Instead, I added the INFO column, so all SNVs with INFO <= 0.9 are now filtered out. This worked well (no warnings or errors) and I got the output files. I don't understand why it crashes during the genetic correlation computation.
Thank you,
You can write your own munge_sumstats script to sieve out the strand-ambiguous SNPs (e.g. A/T or G/C). At least, that works for me.
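A minimal sketch of such a filter, written in Python like ldsc itself. It assumes the munged file is gzip-compressed and whitespace-delimited, with a header row containing A1 and A2 columns; filter_lines and filter_sumstats are hypothetical helpers, not part of ldsc. It drops every SNP whose two alleles are complementary bases (A/T, C/G), because the strand of such a SNP cannot be resolved.

```python
import gzip

# Complementary bases. A SNP whose alleles are complements (A/T, C/G)
# reads the same on both strands, so its orientation is ambiguous.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(a1, a2):
    """Return True for A/T, T/A, C/G and G/C SNPs (case-insensitive)."""
    return COMPLEMENT.get(a1.upper()) == a2.upper()


def filter_lines(lines):
    """Yield the header, then every data line that is not strand-ambiguous.

    Assumption: whitespace-delimited table whose header names the
    allele columns A1 and A2 (hypothetical helper, not ldsc code).
    """
    it = iter(lines)
    header = next(it, None)
    if header is None:  # empty input: nothing to filter
        return
    cols = header.split()
    i1, i2 = cols.index("A1"), cols.index("A2")
    yield header
    for line in it:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        if not is_strand_ambiguous(fields[i1], fields[i2]):
            yield line


def filter_sumstats(src, dst):
    """Copy a gzipped .sumstats file to dst, dropping strand-ambiguous SNPs."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in filter_lines(fin):
            fout.write(line)
```

For example, filter_sumstats("trait1.sumstats.gz", "trait1.noambig.sumstats.gz"), where both file names are placeholders. Lines are written back unchanged, so the column layout of the munged file is preserved.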
I used --merge-alleles when munging the data, but I still get the same error when estimating rg. @melothemightyone, could you share the code you rewrote to exclude incompatible alleles? Many thanks,
The problem is at ldscore/sumstats.py, line 440: while loop is filtered so that only valid allele combinations remain, alleles itself is not filtered, hence the KeyError. You can change lines 438-439 of sumstats.py to:
li = _filter_alleles(alleles)
loop = _select_and_log(loop, li, log,
                       '{N} SNPs with valid alleles.')
alleles = alleles[li]
This removes the incompatible allele combinations from the alleles table.
I guess this repo is not actively maintained, so I am just posting my workaround here.
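To see why the unpatched code fails, here is a standard-library emulation of the allele-checking logic in _read_other_sumstats and _align_alleles. FLIP, filter_alleles, align and read_other are hypothetical stand-ins: FLIP is a toy subset, not ldsc's real FLIP_ALLELES table, and dicts keyed by SNP play the role of the pandas index. The point it illustrates is that the flip lookup runs over every entry of alleles, including SNPs whose rows were already dropped from loop.

```python
# Toy subset of an "A1+A2+A1x+A2x -> flip the sign of Z2?" table.
# Hypothetical: the real ldsc table covers every valid combination.
FLIP = {"AGAG": False, "AGGA": True, "AGTC": False, "AGCT": True}


def filter_alleles(alleles):
    """Stand-in for _filter_alleles: True where the combination is known."""
    return {snp: combo in FLIP for snp, combo in alleles.items()}


def align(z2, alleles):
    """Stand-in for _align_alleles: flip the sign of Z2 where needed."""
    # Like alleles.apply(...) in pandas, this lookup runs over *every*
    # entry of alleles, even SNPs already dropped from z2, so an unknown
    # combination such as "GCGC" raises KeyError here.
    flip = {snp: FLIP[combo] for snp, combo in alleles.items()}
    return {snp: -z if flip[snp] else z for snp, z in z2.items()}


def read_other(z2, alleles, patched=True):
    """Stand-in for the allele-checking part of _read_other_sumstats."""
    keep = filter_alleles(alleles)                     # li = _filter_alleles(alleles)
    z2 = {s: z for s, z in z2.items() if keep[s]}      # loop = _select_and_log(loop, li, ...)
    if patched:
        # The workaround: subset alleles with the same mask as loop.
        alleles = {s: a for s, a in alleles.items() if keep[s]}  # alleles = alleles[li]
    return align(z2, alleles)
```

With z2 = {"rs1": 1.5, "rs2": -0.5, "rs3": 2.0} and alleles = {"rs1": "AGAG", "rs2": "AGGA", "rs3": "GCGC"}, read_other(z2, alleles, patched=False) raises KeyError('GCGC'), mirroring the error above, while patched=True simply drops rs3 and flips the sign of rs2.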
Sorry, my version of ldscore/sumstats.py may not be the same, so I couldn't work out where to insert the code. Could you show the code around where you inserted it? Thanks.
432 def _read_other_sumstats(args, log, p2, sumstats, ref_ld_cnames):
433     loop = _read_sumstats(args, log, p2, alleles=True, dropna=False)
434     loop = _merge_sumstats_sumstats(args, sumstats, loop, log)
435     loop = loop.dropna(how='any')
436     alleles = loop.A1 + loop.A2 + loop.A1x + loop.A2x
437     if not args.no_check_alleles:
438         li = _filter_alleles(alleles)
439         loop = _select_and_log(loop, li, log,
440                                '{N} SNPs with valid alleles.')
441         alleles = alleles[li]
442         loop['Z2'] = _align_alleles(loop.Z2, alleles)
443
444     loop = loop.drop(['A1', 'A1x', 'A2', 'A2x'], axis=1)
445     _check_ld_condnum(args, log, loop[ref_ld_cnames])
446     _warn_length(log, loop)
447     return loop
It works well when each trait is analyzed separately, but it fails when the two traits are analyzed together with --rg. Could you help me figure it out? Thank you very much!
The details are as follows:
Call:
./ldsc.py \
--out /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Simul_1/zzz.rg_12 \
--rg /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Simul_1/Simul1.y1.sumstats.gz,/bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Simul_1/Simul1.y2.sumstats.gz \
--w-ld /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Eur_ld_chr/Chr1 \
--ref-ld /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Eur_ld_chr/Chr1
Beginning analysis at Fri Dec 28 08:56:41 2018
Reading summary statistics from /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Simul_1/Simul1.y1.sumstats.gz ...
Read summary statistics for 400984 SNPs.
Reading reference panel LD Score from /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Eur_ld_chr/Chr1 ...
Read reference panel LD Scores for 400984 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Eur_ld_chr/Chr1 ...
Read regression weight LD Scores for 400984 SNPs.
After merging with reference panel LD, 400984 SNPs remain.
After merging with regression SNP LD, 400984 SNPs remain.
Computing rg for phenotype 2/2
Reading summary statistics from /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Simul_1/Simul1.y2.sumstats.gz ...
Read summary statistics for 400984 SNPs.
After merging with summary statistics, 400984 SNPs remain.
340046 SNPs with valid alleles.
ERROR computing rg for phenotype 2/2, from file /bigdata/jialab/jwei/Project1_human_imap/2018_11_13/Simul_hapmap3/ldsc/Simul_1/Simul1.y2.sumstats.gz.
Traceback (most recent call last):
  File "/rhome/jwei/ldsc/ldscore/sumstats.py", line 409, in estimate_rg
    loop = _read_other_sumstats(args, log, p2, sumstats, ref_ld_cnames)
  File "/rhome/jwei/ldsc/ldscore/sumstats.py", line 441, in _read_other_sumstats
    loop['Z2'] = _align_alleles(loop.Z2, alleles)
  File "/rhome/jwei/ldsc/ldscore/sumstats.py", line 517, in _align_alleles
    raise KeyError(msg)
KeyError: 'Incompatible alleles in .sumstats files: GCGC. Did you forget to use --merge-alleles with munge_sumstats.py?
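In the log above, 400984 - 340046 = 60938 SNPs were flagged as invalid, yet the lookup still hit GCGC (a strand-ambiguous G/C SNP), which is consistent with loop being filtered while alleles is not. To check two munged files before running --rg, here is a rough sketch that classifies every shared SNP. It assumes ldsc treats a combination as valid only when neither SNP is strand-ambiguous and the alleles agree up to a strand flip and/or a reference-allele swap; classify, read_alleles and allele_report are hypothetical helpers, and the files are assumed to be gzipped and whitespace-delimited with SNP, A1 and A2 header columns.

```python
import gzip
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify(a1, a2, b1, b2):
    """Classify one SNP: trait-1 alleles (a1, a2) vs trait-2 alleles (b1, b2)."""
    a1, a2, b1, b2 = (x.upper() for x in (a1, a2, b1, b2))
    # Non-ACGT codes or identical alleles cannot describe a biallelic SNP.
    if any(x not in COMPLEMENT for x in (a1, a2, b1, b2)) or a1 == a2 or b1 == b2:
        return "invalid"
    # A/T and C/G SNPs read the same on both strands.
    if COMPLEMENT[a1] == a2 or COMPLEMENT[b1] == b2:
        return "ambiguous"
    cb1, cb2 = COMPLEMENT[b1], COMPLEMENT[b2]
    if (a1, a2) in ((b1, b2), (cb1, cb2)):
        return "aligned"   # same reference allele, possibly on the other strand
    if (a1, a2) in ((b2, b1), (cb2, cb1)):
        return "flipped"   # reference allele swapped, so Z2 changes sign
    return "mismatch"      # alleles disagree even after flips


def read_alleles(path):
    """Map SNP -> (A1, A2) from a gzipped, whitespace-delimited .sumstats file."""
    with gzip.open(path, "rt") as f:
        cols = f.readline().split()
        i_snp, i1, i2 = cols.index("SNP"), cols.index("A1"), cols.index("A2")
        out = {}
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines
                out[fields[i_snp]] = (fields[i1], fields[i2])
    return out


def allele_report(alleles1, alleles2):
    """Count allele classes over the SNPs present in both traits."""
    shared = alleles1.keys() & alleles2.keys()
    return Counter(classify(*alleles1[s], *alleles2[s]) for s in shared)
```

For example, print(allele_report(read_alleles(path1), read_alleles(path2))) with the two .sumstats.gz paths. Under the stated assumption, any SNP outside the "aligned" and "flipped" classes is one the unpatched code can fail on, and one the patched code would drop.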