JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0

Mean chi^2 of SNPs used to estimate Omega is low for some SNPsMTAG may not perform well in this situation. #221

Open luciaC-C opened 1 week ago

luciaC-C commented 1 week ago

Hello,

First, thanks for the amazing project and for resolving issues so quickly.

I'm wondering if the message "Mean chi^2 of SNPs used to estimate Omega is low for some SNPsMTAG may not perform well in this situation" indicates a problem when using MTAG if the rest of the output looks reasonable.

Thank you!

2024/10/15/09:50:46 AM
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MTAG: Multi-trait Analysis of GWAS
<> Version: 1.0.8
<> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Note: It is recommended to run your own QC on the input before using this program.
<> Software-related correspondence: maghzian@nber.org
<> All other correspondence: paturley@broadinstitute.org
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Calling ./mtag.py \
--n-name Neff \
--sumstats trait1,trait2 \
--cores 5 \
--out

2024/10/15/09:50:46 AM Beginning MTAG analysis...
2024/10/15/09:50:46 AM MTAG will use the Z column for analyses.
2024/10/15/09:50:53 AM Read in Trait 1 summary statistics (6010326 SNPs) from trait1 ...
2024/10/15/09:50:53 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2024/10/15/09:50:53 AM Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2024/10/15/09:50:53 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2024/10/15/09:50:53 AM Interpreting column names as follows:
2024/10/15/09:50:53 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

2024/10/15/09:50:53 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2024/10/15/09:51:00 AM Read 6010326 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
6010326 SNPs remain.
2024/10/15/09:51:03 AM Removed 0 SNPs with duplicated rs numbers (6010326 SNPs remain).
2024/10/15/09:51:03 AM Removed 331211 SNPs with N < 20523.3333333 (5679115 SNPs remain).
2024/10/15/09:51:55 AM Median value of SIGNED_SUMSTAT was 0.0, which seems sensible.
2024/10/15/09:51:55 AM Dropping snps with null values
2024/10/15/09:51:55 AM Metadata:
2024/10/15/09:51:56 AM Mean chi^2 = 1.102
2024/10/15/09:51:56 AM Lambda GC = 1.097
2024/10/15/09:51:56 AM Max chi^2 = 29.371
2024/10/15/09:51:56 AM 1 Genome-wide significant SNPs (some may have been removed by filtering).
2024/10/15/09:51:56 AM Conversion finished at Tue Oct 15 09:51:56 2024
2024/10/15/09:51:56 AM Total time elapsed: 1.0m:3.43s
2024/10/15/09:52:03 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2024/10/15/09:52:03 AM Munging of Trait 1 complete. SNPs remaining: 5679115
2024/10/15/09:52:03 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2024/10/15/09:52:18 AM Read in Trait 2 summary statistics (5915125 SNPs) from trait2 ...
2024/10/15/09:52:18 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2024/10/15/09:52:18 AM Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2024/10/15/09:52:18 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2024/10/15/09:52:18 AM Interpreting column names as follows:
2024/10/15/09:52:18 AM Neff: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
snpid: Variant ID (e.g., rs number)
z: Directional summary statistic as specified by --signed-sumstats.

2024/10/15/09:52:18 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2024/10/15/09:52:24 AM Read 5915125 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
5915125 SNPs remain.
2024/10/15/09:52:27 AM Removed 0 SNPs with duplicated rs numbers (5915125 SNPs remain).
2024/10/15/09:52:28 AM Removed 0 SNPs with N < 28823.0 (5915125 SNPs remain).
2024/10/15/09:53:22 AM Median value of SIGNED_SUMSTAT was -0.0769731, which seems sensible.
2024/10/15/09:53:22 AM Dropping snps with null values
2024/10/15/09:53:22 AM Metadata:
2024/10/15/09:53:23 AM Mean chi^2 = 1.257
2024/10/15/09:53:23 AM Lambda GC = 1.216
2024/10/15/09:53:23 AM Max chi^2 = 38.418
2024/10/15/09:53:23 AM 128 Genome-wide significant SNPs (some may have been removed by filtering).
2024/10/15/09:53:23 AM Conversion finished at Tue Oct 15 09:53:23 2024
2024/10/15/09:53:23 AM Total time elapsed: 1.0m:5.2s
2024/10/15/09:53:30 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2024/10/15/09:53:30 AM Munging of Trait 2 complete. SNPs remaining: 5915125
2024/10/15/09:53:30 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2024/10/15/09:53:41 AM Dropped 842325 SNPs due to strand ambiguity, 4836790 SNPs remain in intersection after merging trait1
2024/10/15/09:53:51 AM Flipped the signs of of 2187326 SNPs to make them consistent with the effect allele orderings of the first trait.
2024/10/15/09:53:53 AM Dropped 0 SNPs due to strand ambiguity, 4024959 SNPs remain in intersection after merging trait2
2024/10/15/09:53:53 AM ... Merge of GWAS summary statistics complete. Number of SNPs: 4024959
2024/10/15/09:54:01 AM Using 4024959 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity)
2024/10/15/09:54:01 AM Estimating sigma..
2024/10/15/09:54:38 AM Checking for positive definiteness ..
2024/10/15/09:54:38 AM Sigma hat: [[1.084 0.003] [0.003 1.031]]
2024/10/15/09:54:38 AM Mean chi^2 of SNPs used to estimate Omega is low for some SNPsMTAG may not perform well in this situation.
2024/10/15/09:54:38 AM Beginning estimation of Omega ...
2024/10/15/09:54:38 AM Using GMM estimator of Omega ..
2024/10/15/09:54:39 AM Checking for positive definiteness ..
2024/10/15/09:54:39 AM matrix is not positive definite, performing adjustment..
2024/10/15/09:54:39 AM Completed in 0 iterations
2024/10/15/09:54:39 AM Completed estimation of Omega ...
2024/10/15/09:54:39 AM Beginning MTAG calculations...
2024/10/15/09:54:58 AM ... Completed MTAG calculations.
2024/10/15/09:54:58 AM Writing Phenotype 1 to file ...
2024/10/15/09:55:23 AM Writing Phenotype 2 to file ...
2024/10/15/09:55:48 AM Summary of MTAG results:

   Trait      # SNPs used  ...  MTAG mean chi^2  GWAS equiv. (max) N
1  ...trait1      4024959  ...            1.268               476858
2  ...trait2      4024959  ...            1.273                48210

[2 rows x 7 columns]

Estimated Omega: [[6.135e-07 1.895e-06] [1.895e-06 5.970e-06]]

(Correlation): [[1. 0.99] [0.99 1. ]]

Estimated Sigma: [[1.084 0.003] [0.003 1.031]]

(Correlation): [[1. 0.002] [0.002 1. ]]

MTAG weight factors: (average across SNPs) [0.369 1.138]
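
For reference, the "(Correlation)" matrix printed under "Estimated Omega" can be reproduced from the Omega entries in the summary. This is a minimal sketch, assuming NumPy is available; the helper name `cov_to_corr` is illustrative and not part of MTAG, and the inputs are the already-rounded values from the log.

```python
import numpy as np

def cov_to_corr(cov):
    """Convert a covariance matrix to a correlation matrix."""
    sd = np.sqrt(np.diag(cov))       # per-trait standard deviations
    return cov / np.outer(sd, sd)    # divide each entry by sd_i * sd_j

# "Estimated Omega" as printed in the log (rounded values).
omega = np.array([[6.135e-07, 1.895e-06],
                  [1.895e-06, 5.970e-06]])

corr = cov_to_corr(omega)
print(np.round(corr, 2))
```

The off-diagonal entry comes out at roughly 0.99, in line with the correlation reported in the log.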

2024/10/15/09:55:48 AM
2024/10/15/09:55:48 AM MTAG results saved to file.
2024/10/15/09:55:48 AM MTAG complete. Time elapsed: 5.0m:2.1940908432s
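
One rough way to read the warning: in a method-of-moments view, the diagonal of Omega is driven by how far each trait's mean chi^2 sits above the matching diagonal entry of Sigma. The sketch below does that subtraction with numbers from the log. It is an assumption that this resembles what MTAG checks internally, and the mean chi^2 values are the ones reported during munging (before the merge), so they are only approximate for the 4024959 SNPs actually used.

```python
import numpy as np

# Mean chi^2 per trait, as reported during munging (pre-merge SNP sets).
mean_chi2 = np.array([1.102, 1.257])

# Diagonal of "Sigma hat" from the log.
sigma_diag = np.array([1.084, 1.031])

# Excess chi^2: the part of the mean chi^2 not accounted for by Sigma.
excess = mean_chi2 - sigma_diag

for name, value in zip(["trait1", "trait2"], excess):
    print(f"{name}: excess mean chi^2 = {value:.3f}")
```

On these numbers trait1 has only about 0.018 of excess against about 0.226 for trait2, so the trait1 row of Omega rests on very little signal.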

paturley commented 1 week ago

I agree that your estimates of Omega and Sigma seem really good, so if there were ever a case where a low mean chi^2 is not a problem, this would be it. However, the problem with a low mean chi^2 is that it gives you an imprecise estimate of the correlation between phenotypes. So while the estimated correlation looks very high (0.99), my worry would be that the actual correlation could be lower than this, unless you have some prior reason to believe the true correlation is nearly one.
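
The imprecision described above can be illustrated with a toy simulation. This is a sketch under strong simplifying assumptions, not MTAG's estimator: SNPs are independent, both traits have the same sample size, Sigma is the identity, and the correlation is estimated by simple method of moments. The function names and the parameter `excess` (mean chi^2 minus 1) are hypothetical.

```python
import numpy as np

def estimated_correlation(excess, rho, n_snps, rng):
    """Method-of-moments correlation estimate from simulated z-scores."""
    # True signal covariance on the z-score scale for the two traits.
    signal_cov = excess * np.array([[1.0, rho], [rho, 1.0]])
    signal = rng.multivariate_normal(np.zeros(2), signal_cov, size=n_snps)
    noise = rng.standard_normal((n_snps, 2))   # estimation error, Sigma = I
    z = signal + noise
    # E[z z'] = signal_cov + I, so subtract I to recover the signal part.
    omega_like = z.T @ z / n_snps - np.eye(2)
    diag = np.diag(omega_like)
    if np.any(diag <= 0):
        return np.nan                          # correlation undefined
    return omega_like[0, 1] / np.sqrt(diag[0] * diag[1])

def spread(excess, rho=0.9, n_snps=20000, n_reps=200, seed=0):
    """Standard deviation of the estimate across repeated simulations."""
    rng = np.random.default_rng(seed)
    estimates = [estimated_correlation(excess, rho, n_snps, rng)
                 for _ in range(n_reps)]
    return float(np.nanstd(estimates))

low = spread(excess=0.03)    # mean chi^2 around 1.03
high = spread(excess=0.5)    # mean chi^2 around 1.5
print(f"spread of estimated correlation, low signal:  {low:.3f}")
print(f"spread of estimated correlation, high signal: {high:.3f}")
```

With the same true correlation in both settings, the low-signal estimates scatter far more widely, which is why a single estimate of 0.99 carries less weight when the mean chi^2 is close to the Sigma diagonal.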
