JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0

Interpretation of results with total sample overlap #123

Open annelundager opened 3 years ago

annelundager commented 3 years ago

Dear MTAG-knowledged

I'm investigating the genetics behind 4 similar traits through 4 GWAS. The 4 traits measure much the same thing but are calculated with different formulas. My hypothesis is that the 4 traits describe different aspects of an "over-trait"/"main trait".

I have used MTAG as a meta-analysis tool to get one output. I expected that MTAG would boost almost-significant results, perhaps point to new interesting loci, and give a combined view of this "over-trait".

That is not the case. Only some of my genome-wide significant loci remain genome-wide significant in the MTAG output: the loci that are consistent across the 4 trait GWASes. So the output makes some sense, but no new loci appear.

I have been reading through all the issues and found that a mean chi2 < 1.1 might be troublesome. That is the case here, even though the values reported during munging are higher than 1.1.
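For readers comparing the numbers in the log below: mean chi^2 and Lambda GC can be recomputed from the BETA and SE columns. This is a minimal sketch, not MTAG's own code. The helper name `chi2_summary` and the toy values are illustrative assumptions; only the definitions (chi^2 = (beta/se)^2, Lambda GC = median chi^2 divided by the median of a 1-df chi-square) are standard.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def chi2_summary(betas, ses):
    """Return (mean chi^2, lambda GC) from per-SNP effect sizes and standard errors."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    mean_chi2 = sum(chi2) / len(chi2)
    lambda_gc = statistics.median(chi2) / CHI2_1DF_MEDIAN
    return mean_chi2, lambda_gc


# Toy values for illustration only; they are not taken from the thread's data.
betas = [0.02, -0.01, 0.03, 0.00, -0.04]
ses = [0.02, 0.02, 0.02, 0.02, 0.02]
mean_chi2, lambda_gc = chi2_summary(betas, ses)
```

Because both statistics depend on which SNPs are included, recomputing them on a different SNP set (for example, before and after merging traits) will generally give different values.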

I have some thoughts on this, and hope that you can guide me.

1) I was wondering if my true problem is sample overlap. The GWASes are based on the same population, so there is total sample overlap. I understand that MTAG copes with sample overlap, but I do not understand how. Is it not possible to use MTAG with total sample overlap?

2) You recommend QC prior to MTAG. I have not applied genomic control beforehand because LD Score Regression advises against it when using their tool. Should I apply genomic control when using MTAG? Could that improve the mean chi2?

3) Is it correct that I cannot perform maxFDR on these results because no omega file is produced when --equal-h2 is used? Am I actually using the tool wrong, and should I not use --equal-h2 to get one output?

Thanks!

Anne

LOG:

Calling ./mtag.py \
--equal-h2 \
--perfect-gencov \
--use-beta-se \
--sumstats t1.sumstats,t2.sumstats,t3.sumstats,t4.sumstats \
--out results.unified/mtagout.clusteret1

2021/01/04/09:37:32 AM Beginning MTAG analysis...
2021/01/04/09:37:32 AM MTAG will use the provided BETA/SE columns for analyses.
2021/01/04/09:38:07 AM Read in Trait 1 summary statistics (8673127 SNPs) from cir.sumstats ...

2021/01/04/09:38:07 AM Munging Trait 1
2021/01/04/09:38:07 AM Interpreting column names as follows:
2021/01/04/09:38:07 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
beta: Directional summary statistic as specified by --signed-sumstats.
a2: a2, interpreted as non-ref allele for signed sumstat.
se: Standard errors of BETA coefficients
2021/01/04/09:38:07 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2021/01/04/09:38:40 AM Read 8673127 SNPs from --sumstats file.
Removed 566 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
8672561 SNPs remain.
2021/01/04/09:38:54 AM Removed 0 SNPs with duplicated rs numbers (8672561 SNPs remain).
2021/01/04/09:38:57 AM Removed 3347280 SNPs with N < 9956.66666667 (5325281 SNPs remain).
2021/01/04/09:40:34 AM Median value of SIGNED_SUMSTAT was 0.0, which seems sensible.
2021/01/04/09:40:34 AM Dropping snps with null values
2021/01/04/09:40:36 AM Metadata:
2021/01/04/09:40:37 AM Mean chi^2 = 1.106
2021/01/04/09:40:38 AM Lambda GC = 1.091
2021/01/04/09:40:38 AM Max chi^2 = 152.664
2021/01/04/09:40:38 AM 845 Genome-wide significant SNPs (some may have been removed by filtering).
2021/01/04/09:40:38 AM Conversion finished at Mon Jan 4 09:40:38 2021
2021/01/04/09:40:38 AM Total time elapsed: 2.0m:31.03s
2021/01/04/09:41:06 AM Munging of Trait 1 complete. SNPs remaining: 5325281
2021/01/04/09:42:05 AM Read in Trait 2 summary statistics (8675438 SNPs) from stumvoll.sumstats ...

2021/01/04/09:42:05 AM Munging Trait 2
2021/01/04/09:42:05 AM Interpreting column names as follows:
2021/01/04/09:42:05 AM snpid: Variant ID (e.g., rs number)
2021/01/04/09:42:06 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2021/01/04/09:42:38 AM Read 8675438 SNPs from --sumstats file.
Removed 568 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
8674870 SNPs remain.
2021/01/04/09:42:52 AM Removed 0 SNPs with duplicated rs numbers (8674870 SNPs remain).
2021/01/04/09:42:54 AM Removed 3349663 SNPs with N < 9956.66666667 (5325207 SNPs remain).
2021/01/04/09:44:32 AM Median value of SIGNED_SUMSTAT was 0.0, which seems sensible.
2021/01/04/09:44:32 AM Dropping snps with null values
2021/01/04/09:44:34 AM Metadata:
2021/01/04/09:44:35 AM Mean chi^2 = 1.101
2021/01/04/09:44:35 AM Lambda GC = 1.09
2021/01/04/09:44:35 AM Max chi^2 = 115.473
2021/01/04/09:44:35 AM 560 Genome-wide significant SNPs (some may have been removed by filtering).
2021/01/04/09:44:35 AM Conversion finished at Mon Jan 4 09:44:35 2021
2021/01/04/09:44:35 AM Total time elapsed: 2.0m:30.57s
2021/01/04/09:45:08 AM Munging of Trait 2 complete. SNPs remaining: 5325207
2021/01/04/09:46:28 AM Read in Trait 3 summary statistics (8671308 SNPs) from xinsdG30.sumstats ...

2021/01/04/09:46:28 AM Munging Trait 3
2021/01/04/09:46:28 AM Interpreting column names as follows:
2021/01/04/09:46:28 AM snpid: Variant ID (e.g., rs number)
2021/01/04/09:46:28 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2021/01/04/09:47:05 AM Read 8671308 SNPs from --sumstats file.
Removed 561 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
8670747 SNPs remain.
2021/01/04/09:47:21 AM Removed 0 SNPs with duplicated rs numbers (8670747 SNPs remain).
2021/01/04/09:47:24 AM Removed 3345522 SNPs with N < 9956.66666667 (5325225 SNPs remain).
2021/01/04/09:49:06 AM Median value of SIGNED_SUMSTAT was 0.0, which seems sensible.
2021/01/04/09:49:06 AM Dropping snps with null values
2021/01/04/09:49:08 AM Metadata:
2021/01/04/09:49:10 AM Mean chi^2 = 1.102
2021/01/04/09:49:10 AM Lambda GC = 1.081
2021/01/04/09:49:10 AM Max chi^2 = 152.904
2021/01/04/09:49:10 AM 899 Genome-wide significant SNPs (some may have been removed by filtering).
2021/01/04/09:49:10 AM Conversion finished at Mon Jan 4 09:49:10 2021
2021/01/04/09:49:10 AM Total time elapsed: 2.0m:42.41s
2021/01/04/09:50:04 AM Munging of Trait 3 complete. SNPs remaining: 5325225
2021/01/04/09:52:18 AM Read in Trait 4 summary statistics (8671941 SNPs) from xinsG30.sumstats

2021/01/04/09:52:18 AM Munging Trait 4
2021/01/04/09:52:18 AM Interpreting column names as follows:
2021/01/04/09:52:18 AM snpid: Variant ID (e.g., rs number)
2021/01/04/09:52:19 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2021/01/04/09:54:13 AM Read 8671941 SNPs from --sumstats file.
Removed 562 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
8671379 SNPs remain.
2021/01/04/09:54:29 AM Removed 0 SNPs with duplicated rs numbers (8671379 SNPs remain).
2021/01/04/09:54:32 AM Removed 3346166 SNPs with N < 9956.66666667 (5325213 SNPs remain).
2021/01/04/09:56:16 AM Median value of SIGNED_SUMSTAT was 0.0, which seems sensible.
2021/01/04/09:56:16 AM Dropping snps with null values
2021/01/04/09:56:21 AM Metadata:
2021/01/04/09:56:27 AM Mean chi^2 = 1.117
2021/01/04/09:56:27 AM Lambda GC = 1.097
2021/01/04/09:56:28 AM Max chi^2 = 144.709
2021/01/04/09:56:28 AM 787 Genome-wide significant SNPs (some may have been removed by filtering).
2021/01/04/09:56:28 AM Conversion finished at Mon Jan 4 09:56:28 2021
2021/01/04/09:56:28 AM Total time elapsed: 4.0m:9.47s
2021/01/04/09:57:45 AM Munging of Trait 4 complete. SNPs remaining: 5325213

2021/01/04/09:59:08 AM Dropped 809567 SNPs due to strand ambiguity, 4515714 SNPs remain in intersection after merging trait1
2021/01/04/10:01:04 AM Dropped 0 SNPs due to strand ambiguity, 4515259 SNPs remain in intersection after merging trait2
2021/01/04/10:03:17 AM Dropped 0 SNPs due to strand ambiguity, 4514847 SNPs remain in intersection after merging trait3
2021/01/04/10:05:24 AM Dropped 0 SNPs due to strand ambiguity, 4514405 SNPs remain in intersection after merging trait4
2021/01/04/10:05:24 AM ... Merge of GWAS summary statistics complete. Number of SNPs: 4514405
2021/01/04/10:06:16 AM Using 4514405 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity)
2021/01/04/10:06:16 AM Estimating sigma..
2021/01/04/10:13:00 AM Checking for positive definiteness ..
2021/01/04/10:13:00 AM Sigma hat:
[[1.029 0.923 0.951 0.865]
 [0.923 1.034 0.871 0.882]
 [0.951 0.871 1.035 0.872]
 [0.865 0.882 0.872 1.036]]
2021/01/04/10:13:00 AM Mean chi^2 of SNPs used to estimate Omega is low for some SNPsMTAG may not perform well in this situation.
2021/01/04/10:13:00 AM Beginning estimation of Omega ...
2021/01/04/10:13:01 AM --perfect_gencov and --equal_h2 option used
2021/01/04/10:13:01 AM Completed estimation of Omega ...
2021/01/04/10:13:01 AM Beginning MTAG calculations...
2021/01/04/10:13:32 AM ... Completed MTAG calculations.
2021/01/04/10:13:32 AM With meta-analysis mode, MTAG produces a single set of sumstats, where betas are unstandardized using 2p(1-p) where p is the average allele frequencies across traits.
2021/01/04/10:13:33 AM Writing Meta-analysis results to file ...
2021/01/04/10:14:52 AM Summary of MTAG results:

Trait               # SNPs used  N (max)  N (mean)  GWAS mean chi^2  MTAG mean chi^2  GWAS equiv. (max) N
1  trait1.sumstats  4514405      14935    14922     1.075            1.071            14101
2  trait2.sumstats  4514405      14935    14922     1.065            1.071            16270
3  trait3.sumstats  4514405      14935    14922     1.065            1.071            16106
4  trait4.sumstats  4514405      14935    14922     1.078            1.071            13519

Omega hat not computed because --equal_h2 was used.

Estimated Sigma:
[[1.029 0.923 0.951 0.865]
 [0.923 1.034 0.871 0.882]
 [0.951 0.871 1.035 0.872]
 [0.865 0.882 0.872 1.036]]

(Correlation):
[[1.    0.895 0.921 0.838]
 [0.895 1.    0.842 0.852]
 [0.921 0.842 1.    0.842]
 [0.838 0.852 0.842 1.   ]]

MTAG weight factors: (average across SNPs) [1. 1. 1. 1.]

2021/01/04/10:14:52 AM
2021/01/04/10:14:52 AM MTAG results saved to file.
2021/01/04/10:14:52 AM MTAG complete. Time elapsed: 37.0m:19.5231349468s
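A note on the "Removed ... SNPs with N < 9956.66666667" lines in the log: the threshold equals 14935 / 1.5, where 14935 is the N (max) in the summary table. That matches the default minimum-N rule in LDSC's munge_sumstats (90th percentile of N, divided by 1.5). That MTAG's munging step applies the same rule is an assumption here, not something stated in the thread. A small sketch of the arithmetic, with a hypothetical helper `n_min_threshold` and a simplified nearest-rank percentile:

```python
import math


def n_min_threshold(sample_sizes, quantile=0.9, divisor=1.5):
    """Assumed default rule: (90th percentile of per-SNP N) / 1.5.

    Uses a nearest-rank percentile for simplicity; a real implementation
    may interpolate between neighbouring values.
    """
    ordered = sorted(sample_sizes)
    # Small epsilon guards against floating-point noise in quantile * len.
    rank = max(1, math.ceil(quantile * len(ordered) - 1e-9))
    return ordered[rank - 1] / divisor


# Toy per-SNP sample sizes: most SNPs at the full N, one measured in fewer people.
threshold = n_min_threshold([14935] * 9 + [5000])
```

SNPs whose per-SNP N falls below this threshold are dropped before the traits are merged, which is where the roughly 3.3 million removed SNPs per trait come from.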

paturley commented 3 years ago

Hello,

Thanks for your detailed question:

1) MTAG should be just fine if your samples are overlapping, even if they are perfectly overlapping. The interpretation of the results is the same as if there were no overlap.

2) No need to do genomic control before sending results into MTAG. MTAG does an LD score intercept correction already by default. By QC, I mostly meant things like verifying that the columns contain the right information (e.g., allele frequencies between 0 and 1, positive p-values, etc.)

3) It looks like you have assumed both equal heritability and perfect genetic correlation. I don't know that this is consistent with the scientific question you are asking since if both of these assumptions are true, then that means every SNP must have the exact same effect size for each of your phenotypes. In this case, maxFDR doesn't make sense since the FDR is only inflated due to SNPs that are associated with one phenotype but not the others (and such SNPs won't exist if every SNP has the same effect size for each phenotype).

If you make these assumptions in the software but they don't actually hold, what you will get is something very similar to what you would get if you had measured your 4 phenotypes in non-overlapping samples and performed a standard meta-analysis. I think that genome-wide significant loci in such a case would have the interpretation of being associated with at least one phenotype in your set, but I don't know that you can say much more than that.

Best, Patrick
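For readers wondering how overlap enters the calculation: in the MTAG framework it shows up in the off-diagonal entries of the estimated Sigma matrix, the sampling covariance of the GWAS estimates. The following is a minimal sketch, not MTAG's own code. It assumes that with --equal-h2 and --perfect-gencov, and with equal sample sizes across traits, pooling behaves like generalized least squares with weights proportional to Sigma^{-1} 1. The function names `solve` and `gls_pool` are illustrative; the matrix is the Sigma hat printed in the log above.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def gls_pool(estimates, sigma):
    """Pool estimates of one common effect whose errors have covariance sigma."""
    raw = solve(sigma, [1.0] * len(estimates))   # Sigma^{-1} 1
    total = sum(raw)                             # 1' Sigma^{-1} 1
    weights = [w / total for w in raw]
    pooled = sum(w * e for w, e in zip(weights, estimates))
    return pooled, 1.0 / total, weights


# Sigma hat as printed in the log (z-score scale).
SIGMA_HAT = [
    [1.029, 0.923, 0.951, 0.865],
    [0.923, 1.034, 0.871, 0.882],
    [0.951, 0.871, 1.035, 0.872],
    [0.865, 0.882, 0.872, 1.036],
]

pooled, pooled_var, weights = gls_pool([1.0, 1.0, 1.0, 1.0], SIGMA_HAT)

# Same diagonal but no overlap (off-diagonals set to zero), for comparison.
independent_var = 1.0 / sum(1.0 / SIGMA_HAT[i][i] for i in range(4))
```

Under these assumptions, `pooled_var` stays only modestly below the single-trait variances on the diagonal, whereas `independent_var` is roughly a quarter of them. With sampling errors this strongly correlated, pooling four fully overlapping GWAS adds little information, which is consistent with the absence of new loci reported above.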


annelundager commented 3 years ago

Hi Patrick,

Thanks for a similarly detailed answer. Many things are much clearer now regarding MTAG as a meta-analysis tool and maxFDR. I think I must change my setup so that it is not a meta-analysis.

New questions arise:

1) My mean chi2 is still a bit too low. Can I trust my results? If not, can I do anything to improve the mean chi2? (The log is at the end.)

2) I'm trying to run maxFDR without having to run MTAG again. I'm not sure from your tutorial how I should define the path to the MTAG output files. Right now I'm giving the path to the MTAG result files through --out. It results in an error:

python ~/mtag/mtag.py --fdr --skip_mtag --out results/mtagout.clusteret1

Traceback (most recent call last):
  File "/home/hvl544/mtag/mtag.py", line 1563, in <module>
    fdr(args, N_mat, Z_mat)
NameError: name 'N_mat' is not defined
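A NameError of this shape is what Python raises when a variable is bound only on a code path that was skipped. Whether that is what happens inside mtag.py when --skip_mtag is set is an assumption; the thread does not confirm it. The sketch below is a generic illustration with hypothetical names (`run_pipeline`, `safe_run`), not MTAG's code.

```python
def run_pipeline(skip_main, run_followup):
    """Hypothetical driver: the follow-up step reuses a variable from the main step."""
    if not skip_main:
        n_mat = [[14935]]      # bound only when the main step runs
    if run_followup:
        return len(n_mat)      # fails when the main step was skipped
    return None


def safe_run(skip_main, run_followup):
    """Run the driver and report the exception type instead of crashing."""
    try:
        return run_pipeline(skip_main, run_followup)
    except NameError as err:   # UnboundLocalError is a subclass of NameError
        return "failed: " + type(err).__name__


ok = safe_run(skip_main=False, run_followup=True)
broken = safe_run(skip_main=True, run_followup=True)
```

Inside a function the error surfaces as UnboundLocalError; at module level, as in the traceback above, the same pattern produces a plain NameError.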

Thanks again!

Anne

LOG (I have added a trait 5)

Calling ./mtag.py \
--perfect-gencov \
--use-beta-se \
--sumstats xinsG30.sumstats,bigair.sumstats,cir.sumstats,stumvoll.sumstats,xinsdG30.sumstats \
--out results/mtagout.clusteret

2020/12/21/10:39:37 AM Beginning MTAG analysis... 2020/12/21/10:39:37 AM MTAG will use the provided BETA/SE columns for analyses.

2020/12/21/10:40:10 AM Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><>< 2020/12/21/10:40:10 AM Interpreting column names as follows: 2020/12/21/10:40:10 AM snpid: Variant ID (e.g., rs number) n: Sample size a1: a1, interpreted as ref allele for signed sumstat. pval: p-Value beta: Directional summary statistic as specified by --signed-sumstats. a2: a2, interpreted as non-ref allele for signed sumstat. se: Standard errors of BETA coefficients 2020/12/21/10:40:40 AM Read 8671941 SNPs from --sumstats file. Removed 562 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with SE <0 or NaN values. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 8671379 SNPs remain. 2020/12/21/10:40:53 AM Removed 0 SNPs with duplicated rs numbers (8671379 SNPs remain). 2020/12/21/10:40:55 AM Removed 3346166 SNPs with N < 9956.66666667 (5325213 SNPs remain). 2020/12/21/10:42:33 AM Median value of SIGNED_SUMSTAT was 0.0, which seems sensible. 2020/12/21/10:42:33 AM Dropping snps with null values Metadata: 2020/12/21/10:42:36 AM Mean chi^2 = 1.117 2020/12/21/10:42:36 AM Lambda GC = 1.097 2020/12/21/10:42:36 AM Max chi^2 = 144.709 2020/12/21/10:42:36 AM 787 Genome-wide significant SNPs (some may have been removed by filtering).

2020/12/21/10:44:02 AM Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><> 2020/12/21/10:44:02 AM Interpreting column names as follows: 2020/12/21/10:44:02 AM snpid: Variant ID (e.g., rs number) n: Sample size a1: a1, interpreted as ref allele for signed sumstat. pval: p-Value beta: Directional summary statistic as specified by --signed-sumstats. a2: a2, interpreted as non-ref allele for signed sumstat. se: Standard errors of BETA coefficients 2020/12/21/10:44:29 AM Read 8855363 SNPs from --sumstats file. Removed 639 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with SE <0 or NaN values. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 8854724 SNPs remain. 2020/12/21/10:44:40 AM Removed 0 SNPs with duplicated rs numbers (8854724 SNPs remain). 2020/12/21/10:44:41 AM Removed 3487545 SNPs with N < 9956.66666667 (5367179 SNPs remain). 2020/12/21/10:46:19 AM Median value of SIGNED_SUMSTAT was 0.0, which seems sensible. 2020/12/21/10:46:20 AM Dropping snps with null values Metadata: 2020/12/21/10:46:21 AM Mean chi^2 = 1.086 2020/12/21/10:46:21 AM Lambda GC = 1.071 2020/12/21/10:46:22 AM Max chi^2 = 243.51 2020/12/21/10:46:22 AM 647 Genome-wide significant SNPs (some may have been removed by filtering).

2020/12/21/10:47:41 AM Munging Trait 3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>2020/12/21/10:47:41 AM Interpreting column names as follows: 2020/12/21/10:47:41 AM snpid: Variant ID (e.g., rs number) n: Sample size a1: a1, interpreted as ref allele for signed sumstat. pval: p-Value beta: Directional summary statistic as specified by --signed-sumstats. a2: a2, interpreted as non-ref allele for signed sumstat. se: Standard errors of BETA coefficients 2020/12/21/10:48:27 AM Read 8673127 SNPs from --sumstats file. Removed 566 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with SE <0 or NaN values. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 8672561 SNPs remain. 2020/12/21/10:48:41 AM Removed 0 SNPs with duplicated rs numbers (8672561 SNPs remain). 2020/12/21/10:48:43 AM Removed 3347280 SNPs with N < 9956.66666667 (5325281 SNPs remain). 2020/12/21/10:50:21 AM Median value of SIGNED_SUMSTAT was 0.0, which seems sensible. 2020/12/21/10:50:21 AM Dropping snps with null values Metadata: 2020/12/21/10:50:23 AM Mean chi^2 = 1.106 2020/12/21/10:50:23 AM Lambda GC = 1.091 2020/12/21/10:50:23 AM Max chi^2 = 152.664 2020/12/21/10:50:24 AM 845 Genome-wide significant SNPs (some may have been removed by filtering).

2020/12/21/10:51:59 AM Munging Trait 4 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><>< 2020/12/21/10:51:59 AM Interpreting column names as follows: 2020/12/21/10:51:59 AM snpid: Variant ID (e.g., rs number) n: Sample size a1: a1, interpreted as ref allele for signed sumstat. pval: p-Value beta: Directional summary statistic as specified by --signed-sumstats. a2: a2, interpreted as non-ref allele for signed sumstat. se: Standard errors of BETA coefficients 2020/12/21/10:52:28 AM Read 8675438 SNPs from --sumstats file. Removed 568 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.01. Removed 0 SNPs with SE <0 or NaN values. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 8674870 SNPs remain. 2020/12/21/10:52:41 AM Removed 0 SNPs with duplicated rs numbers (8674870 SNPs remain). 2020/12/21/10:52:43 AM Removed 3349663 SNPs with N < 9956.66666667 (5325207 SNPs remain). 2020/12/21/10:54:21 AM Median value of SIGNED_SUMSTAT was 0.0, which seems sensible. 2020/12/21/10:54:21 AM Dropping snps with null values Metadata: 2020/12/21/10:54:23 AM Mean chi^2 = 1.101 2020/12/21/10:54:23 AM Lambda GC = 1.09 2020/12/21/10:54:23 AM Max chi^2 = 115.473 2020/12/21/10:54:24 AM 560 Genome-wide significant SNPs (some may have been removed by filtering).

2020/12/21/10:55:51 AM Munging Trait 5 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><
2020/12/21/10:55:51 AM Interpreting column names as follows:
2020/12/21/10:55:51 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
beta: Directional summary statistic as specified by --signed-sumstats.
a2: a2, interpreted as non-ref allele for signed sumstat.
se: Standard errors of BETA coefficients
2020/12/21/10:56:23 AM Read 8671308 SNPs from --sumstats file.
Removed 561 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
8670747 SNPs remain.
2020/12/21/10:56:36 AM Removed 0 SNPs with duplicated rs numbers (8670747 SNPs remain).
2020/12/21/10:56:39 AM Removed 3345522 SNPs with N < 9956.66666667 (5325225 SNPs remain).
2020/12/21/10:58:16 AM Median value of SIGNED_SUMSTAT was 0.0, which seems sensible.
2020/12/21/10:58:16 AM Dropping snps with null values
Metadata:
2020/12/21/10:58:18 AM Mean chi^2 = 1.102
2020/12/21/10:58:18 AM Lambda GC = 1.081
2020/12/21/10:58:18 AM Max chi^2 = 152.904
2020/12/21/10:58:18 AM 899 Genome-wide significant SNPs (some may have been removed by filtering).

2020/12/21/10:58:44 AM Munging of Trait 5 complete. SNPs remaining: 5325225
2020/12/21/10:59:20 AM Dropped 809549 SNPs due to strand ambiguity, 4515664 SNPs remain in intersection after merging trait1
2020/12/21/10:59:56 AM Dropped 0 SNPs due to strand ambiguity, 4512524 SNPs remain in intersection after merging trait2
2020/12/21/11:00:39 AM Dropped 0 SNPs due to strand ambiguity, 4512051 SNPs remain in intersection after merging trait3
2020/12/21/11:01:27 AM Dropped 0 SNPs due to strand ambiguity, 4511672 SNPs remain in intersection after merging trait4
2020/12/21/11:02:26 AM Dropped 0 SNPs due to strand ambiguity, 4511308 SNPs remain in intersection after merging trait5

2020/12/21/11:02:26 AM ... Merge of GWAS summary statistics complete. Number of SNPs: 4511308
2020/12/21/11:03:25 AM Using 4511308 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity)
2020/12/21/11:03:25 AM Estimating sigma..
2020/12/21/11:11:34 AM Checking for positive definiteness ..
2020/12/21/11:11:34 AM Sigma hat:
[[1.036 0.743 0.865 0.882 0.872]
 [0.743 1.027 0.783 0.803 0.69 ]
 [0.865 0.783 1.029 0.923 0.951]
 [0.882 0.803 0.923 1.034 0.871]
 [0.872 0.69 0.951 0.871 1.035]]
2020/12/21/11:11:34 AM Mean chi^2 of SNPs used to estimate Omega is low for some SNPsMTAG may not perform well in this situation.
2020/12/21/11:11:35 AM Beginning estimation of Omega ...
2020/12/21/11:11:35 AM Using GMM estimator of Omega ..
2020/12/21/11:11:38 AM Checking for positive definiteness ..
2020/12/21/11:11:38 AM Completed estimation of Omega ...
2020/12/21/11:11:38 AM Beginning MTAG calculations...
2020/12/21/11:12:28 AM ... Completed MTAG calculations.

2020/12/21/11:19:45 AM Summary of MTAG results:

   Trait            # SNPs used  N (max)  N (mean)  GWAS mean chi^2  MTAG mean chi^2  GWAS equiv. (max) N
1  trait1.sumstats  4511308      14935    14923     1.078            1.08             15322
2  trait2.sumstats  4511308      14935    14923     1.057            1.08             20971
3  trait3.sumstats  4511308      14935    14923     1.075            1.08             15991
4  trait4.sumstats  4511308      14935    14923     1.065            1.08             18459
5  trait5.sumstats  4511308      14935    14923     1.065            1.08             18248

Estimated Omega:
[[4.711e-06 3.944e-06 4.638e-06 3.976e-06 4.448e-06]
 [3.944e-06 3.302e-06 3.883e-06 3.329e-06 3.724e-06]
 [4.638e-06 3.883e-06 4.566e-06 3.915e-06 4.379e-06]
 [3.976e-06 3.329e-06 3.915e-06 3.356e-06 3.754e-06]
 [4.448e-06 3.724e-06 4.379e-06 3.754e-06 4.199e-06]]

(Correlation):
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]

Estimated Sigma:
[[1.036 0.743 0.865 0.882 0.872]
 [0.743 1.027 0.783 0.803 0.69 ]
 [0.865 0.783 1.029 0.923 0.951]
 [0.882 0.803 0.923 1.034 0.871]
 [0.872 0.69 0.951 0.871 1.035]]

(Correlation):
[[1. 0.72 0.838 0.852 0.842]
 [0.72 1. 0.761 0.779 0.669]
 [0.838 0.761 1. 0.895 0.921]
 [0.852 0.779 0.895 1. 0.842]
 [0.842 0.669 0.921 0.842 1. ]]

MTAG weight factors: (average across SNPs) [0.981 0.821 0.966 0.828 0.926]

2020/12/21/11:19:45 AM
2020/12/21/11:19:45 AM MTAG results saved to file.
2020/12/21/11:19:45 AM MTAG complete. Time elapsed: 40.0m:8.06567716599s
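
A note for anyone reading the log above: the "(Correlation)" matrix printed under "Estimated Sigma" looks like Sigma rescaled by its own diagonal. Below is a minimal sketch that checks this with NumPy, using the rounded Sigma values copied from the log. The helper name `cov_to_corr` is made up for illustration and is not part of MTAG, so treat this as a reading aid rather than MTAG's own code.

```python
import numpy as np

def cov_to_corr(cov):
    """Rescale a covariance matrix so that its diagonal becomes 1."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)

# "Estimated Sigma" as printed in the log (already rounded to 3 decimals).
sigma = np.array([
    [1.036, 0.743, 0.865, 0.882, 0.872],
    [0.743, 1.027, 0.783, 0.803, 0.690],
    [0.865, 0.783, 1.029, 0.923, 0.951],
    [0.882, 0.803, 0.923, 1.034, 0.871],
    [0.872, 0.690, 0.951, 0.871, 1.035],
])

sigma_corr = cov_to_corr(sigma)
print(np.round(sigma_corr, 3))
```

Because the inputs are rounded, the result agrees with the log only to about the third decimal. The large off-diagonal entries are what one would expect when every trait is measured in the same individuals, and Sigma is the piece MTAG uses to account for that overlap.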

paturley commented 3 years ago

Hi Anne,

  1. The mean chi2 is a function of the sample size and the heritability of your phenotype. So I don't know that you can increase the mean chi2 without gathering more data.
  2. I think that you still can't run maxFDR if you have used the perfect genetic correlation flag, which it looks like you have. A pair of traits will only have a perfect genetic correlation with each other if the effect of each SNP on one trait is a constant multiple of the effect of that SNP on the other trait. So you can't have situations where the effect is zero for one trait and non-zero for another. Does that make sense?
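
To make the "constant multiple" point concrete, here is a small simulation sketch. The effect sizes are simulated and the variable names are invented for illustration; nothing here comes from the actual data or from MTAG's code. It shows that per-SNP effects that are an exact multiple of each other have correlation 1, and that setting the effect to zero for some SNPs in one trait only pulls the correlation below 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_snps = 10_000

# Simulated per-SNP effects on trait A (purely illustrative).
beta_a = rng.normal(size=n_snps)

# Trait B: every SNP effect is the same constant multiple of trait A's effect.
beta_b = 0.8 * beta_a
r_perfect = np.corrcoef(beta_a, beta_b)[0, 1]

# Trait C: same as B, except half of the SNPs have no effect at all.
beta_c = beta_b.copy()
beta_c[: n_snps // 2] = 0.0
r_broken = np.corrcoef(beta_a, beta_c)[0, 1]

print(f"constant multiple: r = {r_perfect:.4f}")
print(f"zero effect for half the SNPs in one trait: r = {r_broken:.4f}")
```

So under the perfect genetic correlation assumption there is no room for a SNP that matters for one trait and not for another, which is the kind of case maxFDR is meant to quantify.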

best, Patrick

On Tue, Jan 5, 2021 at 6:25 AM annelundager notifications@github.com wrote:

Hi Patrick,

Thanks for a similarly detailed answer. Many things are much clearer now regarding MTAG as a meta-analysis tool and maxFDR. I think that I must change my setup so that it is not a meta-analysis.

New questions arise:

1. My mean chi2 is still a bit too low. Can I trust my results? If not, can I do anything to improve the mean chi2? (Log is at the end)

2. I'm trying to run maxFDR without having to run MTAG again. I'm not sure from your tutorial how I should define the path to the mtag-file. Right now I'm giving the path to the mtag-result-files through --out. See below

python ~/mtag/mtag.py --fdr --skip_mtag --out results/mtagout.clusteret1

Traceback (most recent call last):
File "/home/hvl544/mtag/mtag.py", line 1563, in
fdr(args, N_mat, Z_mat)
NameError: name 'N_mat' is not defined

Thanks again!

Anne

LOG (I have added a trait 5)

Calling ./mtag.py --perfect-gencov --use-beta-se --sumstats xinsG30.sumstats,bigair.sumstats,cir.sumstats,stumvoll.sumstats,xinsdG30.sumstats --out results/mtagout.clusteret

2020/12/21/10:39:37 AM Beginning MTAG analysis...
2020/12/21/10:39:37 AM MTAG will use the provided BETA/SE columns for analyses.

2020/12/21/10:40:10 AM Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><
2020/12/21/10:40:10 AM Interpreting column names as follows:
2020/12/21/10:40:10 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
beta: Directional summary statistic as specified by --signed-sumstats.
a2: a2, interpreted as non-ref allele for signed sumstat.
se: Standard errors of BETA coefficients
2020/12/21/10:40:40 AM Read 8671941 SNPs from --sumstats file.
Removed 562 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
8671379 SNPs remain.
2020/12/21/10:40:53 AM Removed 0 SNPs with duplicated rs numbers (8671379 SNPs remain).
2020/12/21/10:40:55 AM Removed 3346166 SNPs with N < 9956.66666667 (5325213 SNPs remain).
2020/12/21/10:42:33 AM Median value of SIGNED_SUMSTAT was 0.0, which seems sensible.
2020/12/21/10:42:33 AM Dropping snps with null values
Metadata:
2020/12/21/10:42:36 AM Mean chi^2 = 1.117
2020/12/21/10:42:36 AM Lambda GC = 1.097
2020/12/21/10:42:36 AM Max chi^2 = 144.709
2020/12/21/10:42:36 AM 787 Genome-wide significant SNPs (some may have been removed by filtering).

2020/12/21/10:44:02 AM Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><>
2020/12/21/10:44:02 AM Interpreting column names as follows:
2020/12/21/10:44:02 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
beta: Directional summary statistic as specified by --signed-sumstats.
a2: a2, interpreted as non-ref allele for signed sumstat.
se: Standard errors of BETA coefficients
2020/12/21/10:44:29 AM Read 8855363 SNPs from --sumstats file.
Removed 639 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
8854724 SNPs remain.
2020/12/21/10:44:40 AM Removed 0 SNPs with duplicated rs numbers (8854724 SNPs remain).
2020/12/21/10:44:41 AM Removed 3487545 SNPs with N < 9956.66666667 (5367179 SNPs remain).
2020/12/21/10:46:19 AM Median value of SIGNED_SUMSTAT was 0.0, which seems sensible.
2020/12/21/10:46:20 AM Dropping snps with null values
Metadata:
2020/12/21/10:46:21 AM Mean chi^2 = 1.086
2020/12/21/10:46:21 AM Lambda GC = 1.071
2020/12/21/10:46:22 AM Max chi^2 = 243.51
2020/12/21/10:46:22 AM 647 Genome-wide significant SNPs (some may have been removed by filtering).


annelundager commented 3 years ago

Hi Patrick,

  1. That makes sense. Now, speaking of numbers: my original model was a linear mixed model with 25,000 individuals. We used a mixed model so that we could correct for relatedness. Prior to MTAG, I investigated the genetic correlations between the phenotypes using LD Score regression. According to the LD Score regression wiki (https://github.com/bulik/ldsc/wiki), a mixed model is fine for genetic correlation but not for heritability. However, judging from the correlations and the logs, the mixed model did not give reliable results. I think that makes sense, because the genetic correlation is also based on heritability, and there are cases in the ldsc issues that indicate the same: the genetic correlation is also influenced by the mixed model. Therefore, we decided to use a linear model and exclude related individuals, leaving 14,000 individuals. The LD Score regression-based genetic correlations were then more reliable. I have tried opening an issue with LDSC, but without luck. So I was wondering whether you have any knowledge of this? Do you not recommend linear mixed models for MTAG?

  2. That definitely makes sense. After your guidance I have realised that the perfect correlation flag does not fit my hypothesis.

Thank you very much for your amazing help.

Kind regards, Anne

annelundager commented 3 years ago

Number 2 continued:

I was focused on the perfect correlation and equal heritability flag because I wanted to assemble my correlated traits in one GWAS. It does not make sense to choose one of my four traits for presentation of MTAG results, because they equally represent the before-mentioned "over-trait" - they are different aspects of the "over-trait". Would you suggest that I present all 4 traits for MTAG or do you have any smart features to assemble MTAG results for 4 traits?

paturley commented 3 years ago
  1. I think MTAG should be fine if you use mixed models though to be honest I haven't tested it. The key assumption of LDSC that makes h2 estimates wrong if you use mixed models doesn't apply to MTAG.

  2. It really just depends on your research question. If you want results that are roughly equivalent to a meta-analysis that accounts for sample overlap, then using the perfect genetic correlation and equal h2 flags could make sense. If you want summary statistics for each trait separately that are roughly equivalent to just running a GWAS in a larger sample for each of them, then you wouldn't want to use those flags.
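
As a rough illustration of the "GWAS in a larger sample" reading, the "GWAS equiv. (max) N" column in the summary table above can be approximately recovered from the mean chi^2 columns. The sketch below assumes the usual scaling N_equiv = N_max * (MTAG mean chi^2 - 1) / (GWAS mean chi^2 - 1); that formula is my understanding of how the column is defined, not something stated in this thread, and the function name `gwas_equivalent_n` is invented for the example. The inputs are the rounded values from the log, so the output only matches the table to within roughly one percent.

```python
# Values copied from the "Summary of MTAG results" table in the log above.
n_max = 14935
mtag_mean_chi2 = 1.08
gwas_mean_chi2 = {
    "trait1": 1.078,
    "trait2": 1.057,
    "trait3": 1.075,
    "trait4": 1.065,
    "trait5": 1.065,
}

def gwas_equivalent_n(n, chi2_gwas, chi2_mtag):
    """Sample size a single-trait GWAS would need to reach the MTAG mean chi^2.

    Assumes (mean chi^2 - 1) grows linearly with sample size.
    """
    return n * (chi2_mtag - 1.0) / (chi2_gwas - 1.0)

equiv_n = {
    trait: gwas_equivalent_n(n_max, chi2, mtag_mean_chi2)
    for trait, chi2 in gwas_mean_chi2.items()
}

for trait, n in equiv_n.items():
    print(trait, round(n))
```

Read this way, the gain from MTAG in this run is modest for trait 1 and larger for trait 2, which had the lowest GWAS mean chi^2 to begin with.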


annelundager commented 3 years ago

Thank you very much for your time and all your nice answers. I will try out the mixed models.

ATB Anne