JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0
169 stars 54 forks source link

Warning: max number of iterations reached in adjustment procedure. Sigma matrix used is still non-positive-definite #183

Open nadineparker opened 1 year ago

nadineparker commented 1 year ago

Hi, when using MTAG for a particular set of sumstats I continually get the following warning message:

"Warning: max number of iterations reached in adjustment procedure. Sigma matrix used is still non-positive-definite."

I am wondering what aspect of the input summary statistics can cause this warning to occur? Of note, the resulting MTAG summary statistics perform poorly in PRS prediction analyses (as good or worse than the original sumstats). Below is an example log file print out. From this it is clear the OMEGA parameter is not estimated well.

<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> <> MTAG: Multi-trait Analysis of GWAS <> Version: 1.0.8 <> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley <> Harvard University Department of Economics / Broad Institute of MIT and Harvard <> GNU General Public License v3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> Note: It is recommended to run your own QC on the input before using this program. <> Software-related correspondence: jjala.ssgac@gmail.com <> All other correspondence: paturley@broadinstitute.org <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Calling ./mtag.py \ --stream-stdout \ --n-min 0.0 \ --sumstats BIP.mtaginput,SCZ.mtaginput \ --out BIP_vs_SCZ

2023/07/23/07:17:30 PM Beginning MTAG analysis... 2023/07/23/07:17:30 PM MTAG will use the Z column for analyses. 2023/07/23/07:17:53 PM Read in Trait 1 summary statistics (10735487 SNPs) from BIP.mtaginput ... 2023/07/23/07:17:53 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2023/07/23/07:17:53 PM Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< 2023/07/23/07:17:53 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2023/07/23/07:17:53 PM Interpreting column names as follows: 2023/07/23/07:17:53 PM snpid: Variant ID (e.g., rs number) n: Sample size a1: a1, interpreted as ref allele for signed sumstat. pval: p-Value a2: a2, interpreted as non-ref allele for signed sumstat. z: Directional summary statistic as specified by --signed-sumstats. se: Standard errors of BETA coefficients

2023/07/23/07:17:53 PM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. 2023/07/23/07:18:12 PM Read 10735487 SNPs from --sumstats file. Removed 1 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with SE <0 or NaN values. Removed 0 SNPs with out-of-bounds p-values. Removed 1806888 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 8928598 SNPs remain. 2023/07/23/07:18:18 PM Removed 0 SNPs with duplicated rs numbers (8928598 SNPs remain). 2023/07/23/07:18:18 PM Removed 0 SNPs with N < 0.0 (8928598 SNPs remain). 2023/07/23/07:23:18 PM Median value of SIGNED_SUMSTAT was 0.0102773568458, which seems sensible. 2023/07/23/07:23:18 PM Dropping snps with null values 2023/07/23/07:23:18 PM Metadata: 2023/07/23/07:23:19 PM Mean chi^2 = 1.342 2023/07/23/07:23:19 PM Lambda GC = 1.23 2023/07/23/07:23:19 PM Max chi^2 = 76.909 2023/07/23/07:23:19 PM 2842 Genome-wide significant SNPs (some may have been removed by filtering). 2023/07/23/07:23:19 PM Conversion finished at Sun Jul 23 19:23:19 2023 2023/07/23/07:23:19 PM Total time elapsed: 5.0m:26.69s 2023/07/23/07:23:30 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2023/07/23/07:23:30 PM Munging of Trait 1 complete. SNPs remaining: 8928598 2023/07/23/07:23:30 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2023/07/23/07:24:07 PM Read in Trait 2 summary statistics (10988893 SNPs) from SCZ.mtaginput ... 2023/07/23/07:24:07 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2023/07/23/07:24:07 PM Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< 2023/07/23/07:24:07 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2023/07/23/07:24:07 PM Interpreting column names as follows: 2023/07/23/07:24:07 PM snpid: Variant ID (e.g., rs number) n: Sample size a1: a1, interpreted as ref allele for signed sumstat. pval: p-Value a2: a2, interpreted as non-ref allele for signed sumstat. z: Directional summary statistic as specified by --signed-sumstats. se: Standard errors of BETA coefficients

2023/07/23/07:24:07 PM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. 2023/07/23/07:24:26 PM Read 10988893 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with SE <0 or NaN values. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 10988893 SNPs remain. 2023/07/23/07:24:32 PM Removed 0 SNPs with duplicated rs numbers (10988893 SNPs remain). 2023/07/23/07:24:33 PM Removed 0 SNPs with N < 0.0 (10988893 SNPs remain). 2023/07/23/07:30:45 PM Median value of SIGNED_SUMSTAT was 0.00551461015479, which seems sensible. 2023/07/23/07:30:45 PM Dropping snps with null values 2023/07/23/07:30:46 PM Metadata: 2023/07/23/07:30:46 PM Mean chi^2 = 1.589 2023/07/23/07:30:47 PM Lambda GC = 1.352 2023/07/23/07:30:47 PM Max chi^2 = 166.774 2023/07/23/07:30:47 PM 20255 Genome-wide significant SNPs (some may have been removed by filtering). 2023/07/23/07:30:47 PM Conversion finished at Sun Jul 23 19:30:47 2023 2023/07/23/07:30:47 PM Total time elapsed: 6.0m:40.1s 2023/07/23/07:30:58 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2023/07/23/07:30:58 PM Munging of Trait 2 complete. SNPs remaining: 10988893 2023/07/23/07:30:58 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2023/07/23/07:31:22 PM Dropped 0 SNPs due to strand ambiguity, 8928598 SNPs remain in intersection after merging trait1 2023/07/23/07:31:46 PM Dropped 2 SNPs due to inconsistent allele pairs from phenotype 2. 7342191 SNPs remain. 2023/07/23/07:31:52 PM Dropped 0 SNPs due to strand ambiguity, 7342191 SNPs remain in intersection after merging trait2 2023/07/23/07:31:52 PM ... Merge of GWAS summary statistics complete. Number of SNPs: 7342191 2023/07/23/07:32:06 PM Using 7342191 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity) 2023/07/23/07:32:06 PM Estimating sigma.. 2023/07/23/07:33:13 PM Checking for positive definiteness .. 2023/07/23/07:33:13 PM Sigma hat: [[ 1.014 0.178] [ 0.178 1.037]] 2023/07/23/07:33:13 PM Beginning estimation of Omega ... 2023/07/23/07:33:13 PM Using GMM estimator of Omega .. 2023/07/23/07:33:14 PM Checking for positive definiteness .. 2023/07/23/07:33:14 PM matrix is not positive definite, performing adjustment.. 2023/07/23/07:33:14 PM Warning: max number of iterations reached in adjustment procedure. Sigma matrix used is still non-positive-definite. 2023/07/23/07:33:14 PM Completed estimation of Omega ... 2023/07/23/07:33:14 PM Beginning MTAG calculations... 2023/07/23/07:34:39 PM ... Completed MTAG calculations. 2023/07/23/07:34:39 PM Writing Phenotype 1 to file ... 2023/07/23/07:35:31 PM Writing Phenotype 2 to file ... 2023/07/23/07:36:23 PM Summary of MTAG results:

Trait # SNPs used N (max) N (mean) GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N 1 ...ts/BIP.mtaginput 7342191 99336 85219 1.393 1.326 82403
2 ...ts/SCZ.mtaginput 7342191 116561 116561 1.701 1.600 99660

Estimated Omega: [[ -1.290e-06 1.313e-10] [ 1.313e-10 6.240e-06]]

(Correlation): [[ nan nan] [ nan 1.]]

Estimated Sigma: [[ 1.014 0.178] [ 0.178 1.037]]

(Correlation): [[ 1. 0.173] [ 0.173 1. ]]

MTAG weight factors: (average across SNPs) [ 0.843 0.835]

paturley commented 1 year ago

Yeah, this is confusing to me. The diagonal elements of Omega shouldn't be negative. This sometimes can occur when the sample size is small (which doesn't seem to be the case given your large mean chi2 stat) or when there is a large amount of popstrat (which doesn't seem to be the case since the diagonal elements of Sigma are smaller than the mean chi2). When you run LDSC on the trait one summary statistics, what do you get for the h2 and intercept?

On Wed, Aug 16, 2023 at 11:07 AM nadineparker @.***> wrote:

Hi, when using MTAG for a particular set of sumstats I continually get the following warning message:

"Warning: max number of iterations reached in adjustment procedure. Sigma matrix used is still non-positive-definite."

I am wondering what aspect of the input summary statistics can cause this warning to occur? Of note, the resulting MTAG summary statistics perform poorly in PRS prediction analyses (as good or worse than the original sumstats). Below is an example log file print out. From this it is clear the OMEGA parameter is not estimated well.

<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> <> MTAG: Multi-trait Analysis of GWAS <> Version: 1.0.8 <> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley <> Harvard University Department of Economics / Broad Institute of MIT and Harvard <> GNU General Public License v3

<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> <> Note: It is recommended to run your own QC on the input before using this program. <> Software-related correspondence: @. <> All other correspondence: @.

<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Calling ./mtag.py --stream-stdout --n-min 0.0 --sumstats BIP.mtaginput,SCZ.mtaginput --out BIP_vs_SCZ

2023/07/23/07:17:30 PM Beginning MTAG analysis... 2023/07/23/07:17:30 PM MTAG will use the Z column for analyses. 2023/07/23/07:17:53 PM Read in Trait 1 summary statistics (10735487 SNPs) from BIP.mtaginput ... 2023/07/23/07:17:53 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2023/07/23/07:17:53 PM Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< 2023/07/23/07:17:53 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2023/07/23/07:17:53 PM Interpreting column names as follows: 2023/07/23/07:17:53 PM snpid: Variant ID (e.g., rs number) n: Sample size a1: a1, interpreted as ref allele for signed sumstat. pval: p-Value a2: a2, interpreted as non-ref allele for signed sumstat. z: Directional summary statistic as specified by --signed-sumstats. se: Standard errors of BETA coefficients

2023/07/23/07:17:53 PM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. 2023/07/23/07:18:12 PM Read 10735487 SNPs from --sumstats file. Removed 1 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with SE <0 or NaN values. Removed 0 SNPs with out-of-bounds p-values. Removed 1806888 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 8928598 SNPs remain. 2023/07/23/07:18:18 PM Removed 0 SNPs with duplicated rs numbers (8928598 SNPs remain). 2023/07/23/07:18:18 PM Removed 0 SNPs with N < 0.0 (8928598 SNPs remain). 2023/07/23/07:23:18 PM Median value of SIGNED_SUMSTAT was 0.0102773568458, which seems sensible. 2023/07/23/07:23:18 PM Dropping snps with null values 2023/07/23/07:23:18 PM Metadata: 2023/07/23/07:23:19 PM Mean chi^2 = 1.342 2023/07/23/07:23:19 PM Lambda GC = 1.23 2023/07/23/07:23:19 PM Max chi^2 = 76.909 2023/07/23/07:23:19 PM 2842 Genome-wide significant SNPs (some may have been removed by filtering). 2023/07/23/07:23:19 PM Conversion finished at Sun Jul 23 19:23:19 2023 2023/07/23/07:23:19 PM Total time elapsed: 5.0m:26.69s 2023/07/23/07:23:30 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2023/07/23/07:23:30 PM Munging of Trait 1 complete. SNPs remaining: 8928598 2023/07/23/07:23:30 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2023/07/23/07:24:07 PM Read in Trait 2 summary statistics (10988893 SNPs) from SCZ.mtaginput ... 2023/07/23/07:24:07 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2023/07/23/07:24:07 PM Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< 2023/07/23/07:24:07 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2023/07/23/07:24:07 PM Interpreting column names as follows: 2023/07/23/07:24:07 PM snpid: Variant ID (e.g., rs number) n: Sample size a1: a1, interpreted as ref allele for signed sumstat. pval: p-Value a2: a2, interpreted as non-ref allele for signed sumstat. z: Directional summary statistic as specified by --signed-sumstats. se: Standard errors of BETA coefficients

2023/07/23/07:24:07 PM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. 2023/07/23/07:24:26 PM Read 10988893 SNPs from --sumstats file. Removed 0 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with SE <0 or NaN values. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 10988893 SNPs remain. 2023/07/23/07:24:32 PM Removed 0 SNPs with duplicated rs numbers (10988893 SNPs remain). 2023/07/23/07:24:33 PM Removed 0 SNPs with N < 0.0 (10988893 SNPs remain). 2023/07/23/07:30:45 PM Median value of SIGNED_SUMSTAT was 0.00551461015479, which seems sensible. 2023/07/23/07:30:45 PM Dropping snps with null values 2023/07/23/07:30:46 PM Metadata: 2023/07/23/07:30:46 PM Mean chi^2 = 1.589 2023/07/23/07:30:47 PM Lambda GC = 1.352 2023/07/23/07:30:47 PM Max chi^2 = 166.774 2023/07/23/07:30:47 PM 20255 Genome-wide significant SNPs (some may have been removed by filtering). 2023/07/23/07:30:47 PM Conversion finished at Sun Jul 23 19:30:47 2023 2023/07/23/07:30:47 PM Total time elapsed: 6.0m:40.1s 2023/07/23/07:30:58 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2023/07/23/07:30:58 PM Munging of Trait 2 complete. SNPs remaining: 10988893 2023/07/23/07:30:58 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> 2023/07/23/07:31:22 PM Dropped 0 SNPs due to strand ambiguity, 8928598 SNPs remain in intersection after merging trait1 2023/07/23/07:31:46 PM Dropped 2 SNPs due to inconsistent allele pairs from phenotype 2. 7342191 SNPs remain. 2023/07/23/07:31:52 PM Dropped 0 SNPs due to strand ambiguity, 7342191 SNPs remain in intersection after merging trait2 2023/07/23/07:31:52 PM ... Merge of GWAS summary statistics complete. Number of SNPs: 7342191 2023/07/23/07:32:06 PM Using 7342191 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity) 2023/07/23/07:32:06 PM Estimating sigma.. 2023/07/23/07:33:13 PM Checking for positive definiteness .. 2023/07/23/07:33:13 PM Sigma hat: [[ 1.014 0.178] [ 0.178 1.037]] 2023/07/23/07:33:13 PM Beginning estimation of Omega ... 2023/07/23/07:33:13 PM Using GMM estimator of Omega .. 2023/07/23/07:33:14 PM Checking for positive definiteness .. 2023/07/23/07:33:14 PM matrix is not positive definite, performing adjustment.. 2023/07/23/07:33:14 PM Warning: max number of iterations reached in adjustment procedure. Sigma matrix used is still non-positive-definite. 2023/07/23/07:33:14 PM Completed estimation of Omega ... 2023/07/23/07:33:14 PM Beginning MTAG calculations... 2023/07/23/07:34:39 PM ... Completed MTAG calculations. 2023/07/23/07:34:39 PM Writing Phenotype 1 to file ... 2023/07/23/07:35:31 PM Writing Phenotype 2 to file ... 2023/07/23/07:36:23 PM Summary of MTAG results:

Trait # SNPs used N (max) N (mean) GWAS mean chi^2 MTAG mean chi^2 GWAS equiv. (max) N 1 ...ts/BIP.mtaginput 7342191 99336 85219 1.393 1.326 82403 2 ...ts/SCZ.mtaginput 7342191 116561 116561 1.701 1.600 99660

Estimated Omega: [[ -1.290e-06 1.313e-10] [ 1.313e-10 6.240e-06]]

(Correlation): [[ nan nan] [ nan 1.]]

Estimated Sigma: [[ 1.014 0.178] [ 0.178 1.037]]

(Correlation): [[ 1. 0.173] [ 0.173 1. ]]

MTAG weight factors: (average across SNPs) [ 0.843 0.835]

— Reply to this email directly, view it on GitHub https://github.com/JonJala/mtag/issues/183, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFBUB5PAKBPK5FG6ZYAEHSTXVTO2BANCNFSM6AAAAAA3SV5ETM . You are receiving this because you are subscribed to this thread.Message ID: @.***>

nadineparker commented 1 year ago

Hi, agreed this is puzzling. The LDSC h2 = 0.2918 (0.0112) and the intercept = 1.0173 (0.0086).

paturley commented 1 year ago

Hmm. The problem seems to be in the Omega calculation. How comfortable are you with Python? In your local copy, if you can temporarily modify the gmm_omega function on line 659 so that it prints out the Z_outer and sigma_LD, that might give us a hint about what is going on.

On Mon, Aug 21, 2023 at 5:34 AM nadineparker @.***> wrote:

Hi, agreed this is puzzling. The LDSC h2 = 0.2918 (0.0112) and the intercept = 1.0173 (0.0086).

— Reply to this email directly, view it on GitHub https://github.com/JonJala/mtag/issues/183#issuecomment-1685984946, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFBUB5JTO4JAPDHAFOVFIPTXWMTS5ANCNFSM6AAAAAA3SV5ETM . You are receiving this because you commented.Message ID: @.***>

samreenzafer commented 10 months ago

@nadineparker Any luck here? My runs give me similar Warnings for sigma

2023/11/02/02:55:58 PM Checking for positive definiteness ..
2023/11/02/02:56:04 PM Checking for positive definiteness ..
2023/11/02/02:56:04 PM matrix is not positive definite, performing adjustment..
2023/11/02/02:56:04 PM Warning: max number of iterations reached in adjustment procedure. Sigma matrix used is still non-positive-definite.
paturley commented 10 months ago

This would be easier to diagnose if you shared the full log file.

Best, Patrick

On Fri, Nov 3, 2023 at 1:58 PM samreenzafer @.***> wrote:

@nadineparker https://github.com/nadineparker Any luck here? My runs give me similar Warnings for sigma

2023/11/02/02:55:58 PM Checking for positive definiteness .. 2023/11/02/02:56:04 PM Checking for positive definiteness .. 2023/11/02/02:56:04 PM matrix is not positive definite, performing adjustment.. 2023/11/02/02:56:04 PM Warning: max number of iterations reached in adjustment procedure. Sigma matrix used is still non-positive-definite.

— Reply to this email directly, view it on GitHub https://github.com/JonJala/mtag/issues/183#issuecomment-1792892567, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFBUB5LIFFSYCXSPSZ2NSQLYCUWCTAVCNFSM6AAAAAA3SV5ETOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOJSHA4TENJWG4 . You are receiving this because you commented.Message ID: @.***>

nadineparker commented 10 months ago

Apologies for the long delayed follow-up. It turned out that there were a number of duplicated SNPs, one with rsid and the other with a variant ID (chr:pos:A1:A2), in the original (meta-analyzed) GWAS. Our standard pipeline for cleaning summary statistics would choose only one version of the SNP identifier and update dbSNP rsid among other things. My thoughts are that this may have resulted in the selection of SNPs with low sample size in many cases ultimately effecting MTAG estimates due to the additional filter on SNPs (i.e., sample size and outlying estimates). Luckily we were able to re-run the GWAS meta-analysis with harmonized SNP identifiers and this seems to have resolved the issue. The final MTAG run had no warnings in the log file and PRS predictions are typically improved over the GWAS predictions.