JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0

Unknown error #82

Open kys21207 opened 4 years ago

kys21207 commented 4 years ago

Hi all, I ran MTAG on two traits; trait 2 is categorical. The run failed with the error below right after the "Merge of GWAS summary statistics complete" step. Please take a look and advise. (The log file is attached for reference.)

Thank you,

Kijoung

OA.log https://github.com/omeed-maghzian/mtag/files/3827554/OA.log

2019/11/08/08:43:28 PM ... Merge of GWAS summary statistics complete. Number of SNPs: 6459385
2019/11/08/08:45:13 PM divide by zero encountered in true_divide
Traceback (most recent call last):
  File "mtag-master/mtag.py", line 1567, in <module>
    mtag(args)
  File "mtag-master/mtag.py", line 1339, in mtag
    Zs , Ns ,Fs, res_temp, DATA, N_raw = extract_gwas_sumstats(DATA,args,list(np.arange(args.P)))
  File "mtag-master/mtag.py", line 590, in extract_gwas_sumstats
    Ns = 1 / np.square(SEs)
FloatingPointError: divide by zero encountered in true_divide
2019/11/08/08:45:13 PM Analysis terminated from error at Fri Nov 8 20:45:13 2019
2019/11/08/08:45:13 PM Total time elapsed: 41.0m:47.66s

paturley commented 4 years ago

This looks like a divide-by-zero error in the step that calculates 1 over the squared standard error. Are you sure that all of the SE entries are non-zero?
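One way to answer that question without loading a very large file into memory is to stream it line by line. The sketch below is only an illustration: the function name `count_bad_se` is made up, and it assumes a whitespace-delimited file with a header row containing a column called `se` (the column name shown in the munging log in this thread). An SE that was rounded to `0.000` when the file was written will parse as exactly zero here.

```python
import io
import math

def count_bad_se(lines, se_col="se"):
    """Count rows whose SE is missing, non-finite, zero, or negative.

    `lines` is any iterable of text lines (an open file works) whose first
    line is a whitespace-delimited header containing `se_col`.
    """
    rows = iter(lines)
    header = next(rows).split()
    idx = header.index(se_col)
    counts = {"rows": 0, "missing_or_nonfinite": 0, "zero": 0, "negative": 0}
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        counts["rows"] += 1
        try:
            se = float(fields[idx])
        except (IndexError, ValueError):
            # short row, or a token such as "NA" that is not a number
            counts["missing_or_nonfinite"] += 1
            continue
        if not math.isfinite(se):
            counts["missing_or_nonfinite"] += 1
        elif se == 0.0:
            counts["zero"] += 1
        elif se < 0.0:
            counts["negative"] += 1
    return counts

# Tiny made-up table standing in for a real summary-statistics file.
sample = io.StringIO(
    "snpid a1 a2 beta se pval n\n"
    "rs1 A G 0.010 0.005 0.04 1000\n"
    "rs2 C T 0.000 0.000 1.00 1000\n"
    "rs3 G A -0.020 NA 0.50 1000\n"
)
print(count_bad_se(sample))
```

For a real file, something like `with open("data1.txt") as fh: print(count_bad_se(fh))` should work, provided the column names match.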


kys21207 commented 4 years ago

Thank you. I got it.

kys21207 commented 4 years ago

Sorry, the same error still occurs. I double-checked that all of the SE entries are non-zero.
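A non-zero SE does not strictly guarantee a non-zero denominator in `1 / np.square(SEs)`: in 64-bit floats, a sufficiently tiny value squares to exactly 0.0. Whether that is what happens with this data is not established anywhere in the thread, and SEs that small would be unusual, so the following is only a sketch of how one might rule it out. The helper names are hypothetical. The traceback shows NumPy raising `FloatingPointError`, which it does only when floating-point errors are configured to raise, so the sketch mimics that with `np.errstate`.

```python
import numpy as np

def inverse_variance(se):
    """Compute 1 / se**2 with division errors raised rather than warned."""
    se = np.asarray(se, dtype=np.float64)
    with np.errstate(divide="raise", under="ignore"):
        return 1.0 / np.square(se)

def squares_to_zero(se):
    """Flag non-zero values whose square is exactly 0.0 in float64."""
    se = np.asarray(se, dtype=np.float64)
    with np.errstate(under="ignore"):
        return (se != 0) & (np.square(se) == 0)

# Made-up values: an ordinary SE and an extremely small but non-zero one.
se = np.array([0.02, 1e-200])
print(squares_to_zero(se))
try:
    inverse_variance(se)
except FloatingPointError as err:
    print("raised:", err)
```

If `squares_to_zero` flags nothing for the merged SE column, this explanation can be set aside.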

kys21207 commented 4 years ago

<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MTAG: Multi-trait Analysis of GWAS
<> Version: 1.0.8
<> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Note: It is recommended to run your own QC on the input before using this program.
<> Software-related correspondence: maghzian@nber.org
<> All other correspondence: paturley@broadinstitute.org
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Calling ./mtag.py \
--force \
--stream-stdout \
--n-min 0.0 \
--use-beta-se \
--sumstats data2.txt,data1.txt \
--out mtag-master/OA

Beginning MTAG analysis...
MTAG will use the provided BETA/SE columns for analyses.
Read in Trait 1 summary statistics (7701538 SNPs) from data2.txt ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
beta: Directional summary statistic as specified by --signed-sumstats.
a2: a2, interpreted as non-ref allele for signed sumstat.
se: Standard errors of BETA coefficients

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 7701538 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
7701538 SNPs remain.
Removed 0 SNPs with duplicated rs numbers (7701538 SNPs remain).
Removed 0 SNPs with N < 0.0 (7701538 SNPs remain).
Median value of SIGNED_SUMSTAT was -0.000300045009002, which seems sensible.
Dropping snps with null values

Metadata:
Mean chi^2 = 1.059
Lambda GC = 1.07
Max chi^2 = 26.599
0 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Mon Nov 11 20:50:44 2019
Total time elapsed: 1.0m:46.1s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 1 complete. SNPs remaining: 7701538
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Read in Trait 2 summary statistics (96083438 SNPs) from data1.txt ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
beta: Directional summary statistic as specified by --signed-sumstats.
a2: a2, interpreted as non-ref allele for signed sumstat.
se: Standard errors of BETA coefficients

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 96083438 SNPs from --sumstats file.
Removed 14831270 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 3801555 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
77450613 SNPs remain.
Removed 262295 SNPs with duplicated rs numbers (77188318 SNPs remain).
Removed 0 SNPs with N < 0.0 (77188318 SNPs remain).
Median value of SIGNED_SUMSTAT was -0.00706316107091, which seems sensible.
Dropping snps with null values

Metadata:
Mean chi^2 = 1.023
Lambda GC = 1.0
Max chi^2 = 75.273
1391 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Mon Nov 11 21:11:19 2019
Total time elapsed: 17.0m:17.77s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 2 complete. SNPs remaining: 77589511
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Trait 2: Dropped 401193 SNPs for duplicate values in the "snp_name" column
Dropped 1190670 SNPs due to strand ambiguity, 6510868 SNPs remain in intersection after merging trait1
Dropped 10721 SNPs due to inconsistent allele pairs from phenotype 2. 6394645 SNPs remain.
Flipped the signs of of 55 SNPs to make them consistent with the effect allele orderings of the first trait.
Dropped 0 SNPs due to strand ambiguity, 6394645 SNPs remain in intersection after merging trait2
... Merge of GWAS summary statistics complete. Number of SNPs: 6394645
divide by zero encountered in true_divide
Traceback (most recent call last):
  File "mtag-master/mtag.py", line 1567, in <module>
    mtag(args)
  File "mtag-master/mtag.py", line 1339, in mtag
    Zs , Ns ,Fs, res_temp, DATA, N_raw = extract_gwas_sumstats(DATA,args,list(np.arange(args.P)))
  File "mtag-master/mtag.py", line 590, in extract_gwas_sumstats
    Ns = 1 / np.square(SEs)
FloatingPointError: divide by zero encountered in true_divide
Analysis terminated from error at Mon Nov 11 21:23:56 2019
Total time elapsed: 35.0m:10.91s

paturley commented 4 years ago

Do you get the same error when you use the Z-N option rather than the beta-se option?


MarioGuCBMR commented 4 years ago

Hi all,

I get the same error under the same conditions. I tried the Z-score option instead, but then MTAG reports that the median of SIGNED_SUMSTAT is not close to 0. Since I didn't have Z-scores, I approximated them as beta/se from the respective GWAS.

This happens when the number of SNPs is very large. I have 27,000,000 SNPs, so I experimented with smaller subsets: at around 300,000 SNPs it works fine, but anything larger always raises this error. I pick the subsets by random sampling, so the sampling itself might also play a role. In any case, none of my SEs are 0.
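The beta/se approximation mentioned above can be sanity-checked before running MTAG by computing the Z-scores and looking at their median, which is the quantity the munging log reports for SIGNED_SUMSTAT. This is a minimal sketch, not MTAG's own code: the function name `z_from_beta_se` and the example values are invented for illustration. Rows with a missing or non-positive SE get a NaN Z-score instead of triggering a division error.

```python
import numpy as np

def z_from_beta_se(beta, se):
    """Return beta / se, with NaN wherever the ratio is not well defined."""
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    z = np.full(beta.shape, np.nan)
    ok = np.isfinite(beta) & np.isfinite(se) & (se > 0)
    z[ok] = beta[ok] / se[ok]
    return z

# Made-up effect sizes and standard errors; the third SE is zero on purpose.
beta = np.array([0.010, -0.020, 0.003, 0.0])
se = np.array([0.005, 0.010, 0.0, 0.004])

z = z_from_beta_se(beta, se)
print("Z-scores:", z)
print("median (ignoring NaN):", np.nanmedian(z))
print("rows without a usable Z:", int(np.isnan(z).sum()))
```

A median far from 0 would suggest that the effect column is not on the expected scale. For a categorical trait, one thing worth checking is whether it holds odds ratios (which centre near 1) rather than log odds.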

Shicheng-Guo commented 3 years ago

Hi All,

It looks like this issue is still present. Has anyone figured out the cause and a fix?

Thanks

Shicheng

paturley commented 3 years ago

Hi Shicheng,

Apologies for how slow we've been at responding to this. We have been teaching an intensive two-week course, and I have let a few things slide. We will try to get to this in the next few days.

Best, Patrick


Shicheng-Guo commented 3 years ago

Thank you, Patrick! That will be very helpful! Shicheng

JonJala commented 3 years ago

I would guess that something with a frequency of 0.0 or 1.0 is somehow getting through the filters, since line 588 runs fine but an error is thrown on line 590 (and both divide by SEs, so the SEs themselves must be fine initially). We'll try to look into what's going on with the filtering, which (to our knowledge) has been working fine. If you want to try something in the meantime, you could double-check whether you have SNPs with frequencies of 0.0 or 1.0 and drop those.

JonJala commented 3 years ago

(Or perhaps it's a NaN in the frequency column; you might want to check for those as well.)
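The two checks suggested above (frequencies of exactly 0.0 or 1.0, and NaN frequencies) can be combined into one boolean mask. The sketch below is an assumption-laden illustration: `usable_freq_mask` is a made-up helper, and the `2 * f * (1 - f)` term is printed only to show why boundary frequencies are a plausible source of a zero denominator. The thread does not confirm that mtag.py uses that exact expression.

```python
import numpy as np

def usable_freq_mask(freq):
    """True where the allele frequency is finite and strictly between 0 and 1."""
    freq = np.asarray(freq, dtype=float)
    with np.errstate(invalid="ignore"):
        return np.isfinite(freq) & (freq > 0.0) & (freq < 1.0)

# Made-up frequencies, including the three problem cases named in the thread.
freq = np.array([0.25, 0.0, 1.0, np.nan, 0.5])
keep = usable_freq_mask(freq)

print("keep:", keep)
print("dropped:", int((~keep).sum()))
# Illustrative only: a term of this form is zero at f = 0 and f = 1.
print("2f(1-f):", 2 * freq * (1 - freq))
```

Applying `keep` to every column of the summary-statistics table before passing it to MTAG would drop the affected SNPs.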

Shicheng-Guo commented 3 years ago

Sure. Thanks, Jonathan. I will double-check the above two situations you mentioned and share feedback with you.

Shicheng