JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0
KeyError: 'GGTGGGTG' #100

Open huangjx1001 opened 4 years ago

huangjx1001 commented 4 years ago

Hi, I am running MTAG with one trait and ran into the following error:

Trait 1: Dropped 1971 SNPs for duplicate values in the "snp_name" column
Dropped 920120 SNPs due to strand ambiguity, 4996132 SNPs remain in intersection after merging trait1
...
Merge of GWAS summary statistics complete. Number of SNPs: 4996132
Using 4996132 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity)
Estimating sigma..
'GGTGGGTG'
Traceback (most recent call last):
  File "/BIGDATA1/gzhmu_jli_1/software/mtag-master/mtag.py", line 1567, in <module>
    mtag(args)
  File "/BIGDATA1/gzhmu_jli_1/software/mtag-master/mtag.py", line 1351, in mtag
    args.sigma_hat = estimate_sigma(DATA[not_SA], args)
  File "/BIGDATA1/gzhmu_jli_1/software/mtag-master/mtag.py", line 468, in estimate_sigma
    rg_results = sumstats_sig.estimate_rg(args_ldsc_rg, Logger_to_Logging())
  File "/BIGDATA1/gzhmu_jli_1/software/mtag-master/ldsc_mod/ldscore/sumstats.py", line 442, in estimate_rg
    loop = _read_other_sumstats(args, log, None, sumstats, ref_ld_cnames,sumstats2=p2)
  File "/BIGDATA1/gzhmu_jli_1/software/mtag-master/ldsc_mod/ldscore/sumstats.py", line 494, in _read_other_sumstats
    loop['Z2'] = _align_alleles(loop.Z2, alleles)
  File "/BIGDATA1/gzhmu_jli_1/software/mtag-master/ldsc_mod/ldscore/sumstats.py", line 567, in _align_alleles
    z *= (-1) ** alleles.apply(lambda y: FLIP_ALLELES[y])
  File "/BIGDATA1/gzhmu_jli_1/.conda/envs/myenv_py2.7/lib/python2.7/site-packages/pandas/core/series.py", line 3591, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2217, in pandas._libs.lib.map_infer
  File "/BIGDATA1/gzhmu_jli_1/software/mtag-master/ldsc_mod/ldscore/sumstats.py", line 567, in <lambda>
    z *= (-1) ** alleles.apply(lambda y: FLIP_ALLELES[y])
KeyError: 'GGTGGGTG'

Can anyone help? Thanks a lot!

paturley commented 4 years ago

Have you verified that the reference and alternate alleles in your data only contain A, T, C, and G for every SNP?

huangjx1001 commented 4 years ago

Sorry for the delayed reply. I checked my data, and the alleles contain only A, T, C, and G for every SNP. However, I found one reference allele that was "AGGGTGGGTGGCGAGGGTCCCCTCACGCG", so I deleted that SNP. When I ran MTAG again, the log showed a similar error: "KeyError: 'GGGTGGGT'". I am still confused by this issue.

paturley commented 4 years ago

This still looks like an error with the reference or alternate alleles in your data. Have you also checked the alternate allele column? Did you look at the reference and alternate allele in your reference data set?

huangjx1001 commented 4 years ago

You are right! I know where the problem is. Thanks a lot!

paturley commented 4 years ago

No problem. Glad to help.

SimonYu668 commented 1 year ago

@huangjx1001 Hi! I have run into the same issue. How did you fix it? Could you tell me, please?

paturley commented 1 year ago

I believe they just searched their summary statistics for SNPs that had anything other than A, T, C, or G as the reference or alternate allele.
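A minimal sketch of that kind of search, using only the Python standard library. It flags every row whose alleles are not a single A, C, G, or T base, which also catches multi-base alleles such as "GGT" that are made up of valid letters. The column names `a1` and `a2`, the tab delimiter, and the helper name `find_non_snv_rows` are assumptions for illustration; adjust them to match your file. Flagging rows where both alleles are identical is an extra check added here, not something stated in this thread.

```python
import csv
import io

VALID_BASES = {"A", "C", "G", "T"}


def find_non_snv_rows(handle, a1_col="a1", a2_col="a2", delimiter="\t"):
    """Return (line_number, row) pairs whose alleles are not single, distinct bases.

    Column names and delimiter are assumptions; change them to match your file.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    flagged = []
    # Line 1 is the header, so the first data row is line 2.
    for line_number, row in enumerate(reader, start=2):
        a1 = row[a1_col].strip().upper()
        a2 = row[a2_col].strip().upper()
        # Multi-base alleles, missing alleles, and identical alleles are flagged.
        if a1 not in VALID_BASES or a2 not in VALID_BASES or a1 == a2:
            flagged.append((line_number, row))
    return flagged


# Tiny made-up example standing in for a summary statistics file.
example = io.StringIO(
    "snpid\ta1\ta2\tz\n"
    "rs1\tA\tG\t0.5\n"
    "rs2\tGGT\tG\t1.2\n"
    "rs3\tA\tA\t-0.3\n"
    "rs4\tc\tt\t0.1\n"
)
flagged = find_non_snv_rows(example)
for line_number, row in flagged:
    print(line_number, row["snpid"], row["a1"], row["a2"])
```

For a real file, pass an open file handle instead of the `io.StringIO` example, for instance `with open(path) as handle: flagged = find_non_snv_rows(handle)`.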

SimonYu668 commented 1 year ago

Thanks for your rapid reply. I have already checked my data and deleted the rows containing anything other than A, C, T, and G, but the same error occurs. The error log is shown below:

2023/03/20/08:30:35 PM Estimating sigma..
2023/03/20/08:31:27 PM 'AAAA'
Traceback (most recent call last):
  File "mtag.py", line 1577, in <module>
    mtag(args)
  File "mtag.py", line 1358, in mtag
    args.sigma_hat = estimate_sigma(DATA[not_SA], args)
  File "mtag.py", line 472, in estimate_sigma
    rg_results = sumstats_sig.estimate_rg(args_ldsc_rg, Logger_to_Logging())
  File "/Users/simon/Documents/01MedWork/2302_Gut_KS/05_Mtag/03_mtag/ldsc_mod/ldscore/sumstats.py", line 442, in estimate_rg
    loop = _read_other_sumstats(args, log, None, sumstats, ref_ld_cnames,sumstats2=p2)
  File "/Users/simon/Documents/01MedWork/2302_Gut_KS/05_Mtag/03_mtag/ldsc_mod/ldscore/sumstats.py", line 494, in _read_other_sumstats
    loop['Z2'] = _align_alleles(loop.Z2, alleles)
  File "/Users/simon/Documents/01MedWork/2302_Gut_KS/05_Mtag/03_mtag/ldsc_mod/ldscore/sumstats.py", line 567, in _align_alleles
    z *= (-1) ** alleles.apply(lambda y: FLIP_ALLELES[y])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/series.py", line 3591, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2217, in pandas._libs.lib.map_infer
  File "/Users/simon/Documents/01MedWork/2302_Gut_KS/05_Mtag/03_mtag/ldsc_mod/ldscore/sumstats.py", line 567, in <lambda>
    z *= (-1) ** alleles.apply(lambda y: FLIP_ALLELES[y])
KeyError: 'AAAA'
2023/03/20/08:31:27 PM Analysis terminated from error at Mon Mar 20 20:31:27 2023
2023/03/20/08:31:27 PM Total time elapsed: 6.0m:1.5s

Could anyone who has solved this problem help? Thanks a lot!

paturley commented 1 year ago

I'm fairly confident that your error implies that some SNP has a reference or alternate allele of 'AAAA'. I would run a grep on your summary statistics for 'AAAA' just to confirm.
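One caveat when searching for the string from the error message: the traceback shows the failing lookup is `FLIP_ALLELES[y]`, and in upstream LDSC (as best I recall; the `ldsc_mod` copy bundled with MTAG may differ) the key `y` is the concatenation of four allele fields, `a1 + a2 + a1x + a2x`, with the table holding only 4-character keys built from valid, non-ambiguous SNP pairs. Under that assumption, 'GGTGGGTG' could come from alleles such as "GGT"/"G" repeated twice, and 'AAAA' could come from a SNP whose two alleles are both "A", so the full string may not appear in any single column. The sketch below rebuilds a simplified version of such a table to illustrate this; the names `FLIP`, `VALID_PAIRS`, and `lookup_key` are hypothetical and this is not the MTAG source.

```python
import itertools

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
BASES = sorted(COMPLEMENT)

# Allele pairs such as "AG": two different bases that are not strand-ambiguous.
VALID_PAIRS = {
    a + b
    for a, b in itertools.product(BASES, BASES)
    if a != b and COMPLEMENT[a] != b
}


def complement_pair(pair):
    """Return the pair as read from the opposite strand, e.g. "AG" -> "TC"."""
    return COMPLEMENT[pair[0]] + COMPLEMENT[pair[1]]


def same_variant(p, q):
    """True if pairs p and q describe one SNP up to ref flip and/or strand flip."""
    comp_q = complement_pair(q)
    return p in (q, q[::-1], comp_q, comp_q[::-1])


# Lookup keyed by 4-character strings a1 + a2 + a1x + a2x. The value is True
# when the second pair has its reference allele flipped relative to the first.
FLIP = {
    p + q: p in (q[::-1], complement_pair(q)[::-1])
    for p, q in itertools.product(VALID_PAIRS, VALID_PAIRS)
    if same_variant(p, q)
}


def lookup_key(a1, a2, a1x, a2x):
    """Build the lookup key the same way the assumed upstream code does."""
    return a1 + a2 + a1x + a2x


cases = [
    ("A", "G", "A", "G"),      # same SNP, same orientation
    ("A", "G", "G", "A"),      # same SNP, reference allele flipped
    ("GGT", "G", "GGT", "G"),  # indel: multi-base allele
    ("A", "A", "A", "A"),      # both alleles identical
]
for alleles in cases:
    key = lookup_key(*alleles)
    try:
        print(key, "->", FLIP[key])
    except KeyError:
        print(key, "-> KeyError")
```

If this reading is right, checking each allele column separately for values that are not a single base, and for rows where both alleles are the same, is more reliable than a grep for the whole key.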

test12138jooh commented 6 months ago

@SimonYu668 Hi! I have run into the same issue. Have you fixed it? Does this mean that MTAG can only be applied to SNVs and not to indels?

paturley commented 6 months ago

The current MTAG software can only handle SNVs, though as you saw in another issue, it sounds like it's not too complicated to edit your local instance of MTAG to accept non-SNV data.
