JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0

Clarifications on MTAG Results Interpretation and Methodology for Multiple Traits Analysis #197

Closed nvice111 closed 2 months ago

nvice111 commented 8 months ago

Hello,

I have some questions regarding MTAG results and its operation:

1. For example, inputting two traits, such as trait1 being diabetes (100 significant SNPs) and trait2 being MDD (Major Depressive Disorder, 100 significant SNPs). These traits seem unrelated initially, but LDSC results show a strong association. Can MTAG output two new GWAS results, where GWAS1 is an updated result for diabetes (120 significant SNPs) and GWAS2 for MDD (120 significant SNPs)? Are these GWAS results considered meta-analysis outcomes? Do they have a sequence or precedence, or should they be understood as still representing two separate traits of diabetes and MDD?

2. Can the newly identified loci in GWAS1 or GWAS2 (assuming 20 new = 120-100) be understood as new loci shared between MDD and diabetes (as many published papers have interpreted)? How should these additional loci be interpreted? I've seen many responses but still don't quite understand.

3. Can the multiple traits input (GWAS1, GWAS2, GWAS3, etc.) all come from a single consortium, like all GWAS from UKB with almost 100% sample overlap, and the sample size is approximately 400,000? (This has been done in many published papers.) The number of input traits could be from 2 to several.

4. For multiple trait meta-analysis (GWAS1, GWAS2, GWAS3, etc.), if GWAS1 is the primary target for meta-analysis and sample size gain, and it comes from one consortium, while all other GWAS (2, 3, ...) come from UKB with similar sample sizes of about 400,000, is this approach acceptable?

5. Is it necessary to reverse the Z values in the GWAS files inputted into MTAG? For instance, if trait1 is longevity (>90th percentile), the higher the beta, the longer the lifespan, and trait2 is epigenetic aging acceleration, where a higher beta means faster aging. If the original rg is negative, should one trait's beta be reversed to make rg positive in the meta-analysis?

I would appreciate it if you could answer each question individually. Thank you! Chen Lou

paturley commented 8 months ago

Hello Chen,

A few responses:

1) In the MTAG paper, we describe each set of summary statistics as corresponding to its own trait. In practice, MTAG will be better powered to identify variants that are shared among the traits analyzed jointly. If there is a large difference in power between the two sets of GWAS summary statistics, there is a risk that some SNPs will show a significant association for the lower-powered trait when that association is driven entirely by the higher-powered trait. The discussion of maxFDR in the MTAG paper digs into this more deeply.

2) I think my response to question 1 answers this as well.

3) Yes. As described in the paper, MTAG should be robust to even 100% overlap.

4) Yes. It shouldn't matter whether the data come from the same or different cohorts.

5) No. MTAG is able to account for negative genetic correlations between the different traits. The example case in the MTAG paper is subjective well-being, depressive symptoms, and neuroticism, where SWB has a negative genetic correlation with DS and NEUR.

Best, Patrick
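For readers who want to check answer 5 numerically, here is a minimal single-SNP sketch in plain Python. It assumes the two-trait estimator for trait 1 has the form (y' A^-1 b) / (y' A^-1 y), with y = w1 / w11, w1 the first column of Omega, and A = Omega - w1 w1' / w11 + Sigma, which is my reading of the MTAG paper. The function names and every number below are made up for illustration, and this is not the mtag code itself.

```python
def inv2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mtag_beta_trait1(beta_hat, omega, sigma):
    """Hypothetical two-trait, single-SNP MTAG-style estimate for trait 1.

    beta_hat: [b1, b2] GWAS effect estimates for the SNP
    omega:    2x2 covariance of true effects across traits (assumed known)
    sigma:    2x2 covariance of estimation error (assumed known)
    """
    w = [omega[0][0], omega[1][0]]  # first column of omega
    w11 = omega[0][0]
    # A = omega - w w' / w11 + sigma
    a = [[omega[i][j] - w[i] * w[j] / w11 + sigma[i][j] for j in range(2)]
         for i in range(2)]
    a_inv = inv2(a)
    y = [w[0] / w11, w[1] / w11]
    # Row vector y' A^-1
    row = [y[0] * a_inv[0][j] + y[1] * a_inv[1][j] for j in range(2)]
    numerator = row[0] * beta_hat[0] + row[1] * beta_hat[1]
    denominator = row[0] * y[0] + row[1] * y[1]
    return numerator / denominator


# Illustrative inputs with a negative genetic covariance between the traits.
original = mtag_beta_trait1(
    beta_hat=[0.8, -0.9],
    omega=[[1.0, -0.6], [-0.6, 1.0]],
    sigma=[[0.5, 0.1], [0.1, 0.5]],
)

# Same data after reversing the sign of trait 2: its estimate flips, and so do
# the off-diagonal entries of omega and sigma.
flipped = mtag_beta_trait1(
    beta_hat=[0.8, 0.9],
    omega=[[1.0, 0.6], [0.6, 1.0]],
    sigma=[[0.5, -0.1], [-0.1, 0.5]],
)

print(original, flipped)
```

In this sketch the two trait 1 estimates agree, because reversing trait 2 changes the sign of its effect estimate and of the off-diagonal covariance terms together. That is consistent with the answer that no manual sign reversal is needed.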


nvice111 commented 8 months ago

@paturley Hello Patrick, I still have some doubts about your first and second responses.

In the hypothetical scenario I described, MTAG outputs two new GWAS results, where GWAS1new is an updated result for diabetes with 120 significant SNPs and GWAS2new for MDD (Major Depressive Disorder) also with 120 significant SNPs. There are 20 new SNPs identified in either GWAS1 or GWAS2.

1. Will the order in which GWAS1 and GWAS2 are input into MTAG affect the output results (I know the output order will change)?

2. Regarding the newly discovered SNPs (let's assume 20) in GWAS1new, I am still unclear about how to interpret them. In the Nature Communications paper by Yang Y et al., "Investigating the shared genetic architecture between multiple sclerosis and inflammatory bowel diseases" (Nat Commun. 2021 Sep 24;12(1):5641. doi: 10.1038/s41467-021-25768-0. PMID: 34561436), the newly discovered SNPs in GWAS1new (assuming there are 20) were filtered in two steps. First, SNPs that overlap with genome-wide significant SNPs in the respective single-trait GWAS were removed. Second, the remaining SNPs were required to be independent (i.e., LD r2 < 0.05 within 1,000-kb windows) of genome-wide significant SNPs in the respective single-trait GWAS. The SNPs that survived both steps (let's assume 2) were identified as new cross-trait SNPs. Do you think it is appropriate to understand the newly discovered SNPs in MTAG in this way?

3. Additionally, the paper by Maina JG et al. (Bidirectional Mendelian Randomization and Multiphenotype GWAS Show Causality and Shared Pathophysiology Between Depression and Type 2 Diabetes. Diabetes Care. 2023 Sep 1;46(9):1707-1714. doi: 10.2337/dc22-2373 PMID: 37494602) first identified loci from the two separate GWAS outputs of MTAG (let's assume GWASnew1 with 120 loci and GWASnew2 with 120 loci). Then they took the intersection of significant loci from both GWAS (assuming there are 7 intersecting SNPs). These 7 SNPs are considered to be shared between the two traits. Do you think this approach is feasible?
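To make the two procedures in questions 2 and 3 concrete, here is a minimal sketch in plain Python. It only illustrates the filtering logic as described above and is not code from either paper. The SNP IDs, positions, and r2 values are invented, the function names are hypothetical, and treating a SNP pair with no r2 entry as unlinked is an assumption of the sketch.

```python
def novel_snps(mtag_hits, gwas_hits, r2, window_bp=1_000_000, r2_max=0.05):
    """Keep MTAG hits that are neither single-trait hits nor in LD with one.

    mtag_hits, gwas_hits: dict of snp_id -> (chromosome, position)
    r2: dict of frozenset({snp_a, snp_b}) -> LD r2; missing pairs count as 0.0
    """
    novel = []
    for snp, (chrom, pos) in mtag_hits.items():
        if snp in gwas_hits:
            continue  # step 1: already genome-wide significant in the GWAS
        linked = any(
            chrom == g_chrom
            and abs(pos - g_pos) <= window_bp
            and r2.get(frozenset((snp, g)), 0.0) >= r2_max
            for g, (g_chrom, g_pos) in gwas_hits.items()
        )
        if not linked:
            novel.append(snp)  # step 2: independent of every single-trait hit
    return novel


def shared_snps(hits_a, hits_b):
    """Intersection of two sets of significant SNPs (question 3)."""
    return sorted(set(hits_a) & set(hits_b))


# Invented example data.
gwas1_hits = {"rsA": (1, 1_000_000), "rsB": (2, 5_000_000)}
mtag1_hits = {
    "rsA": (1, 1_000_000),
    "rsC": (1, 1_200_000),
    "rsD": (3, 7_000_000),
    "rsE": (2, 5_400_000),
}
mtag2_hits = {"rsD": (3, 7_000_000), "rsF": (4, 100)}
ld_r2 = {frozenset(("rsC", "rsA")): 0.4, frozenset(("rsE", "rsB")): 0.01}

novel = novel_snps(mtag1_hits, gwas1_hits, ld_r2)
shared = shared_snps(mtag1_hits, mtag2_hits)
print(novel, shared)
```

Here rsA is dropped as an existing hit, rsC is dropped because it is in LD with rsA, and rsD and rsE remain. Whether the surviving SNPs can be called "cross-trait" is the interpretive question, and the code does not settle it.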

paturley commented 8 months ago
  1. The order of the input GWAS files should not matter.

  2. MTAG summary statistics are (in expectation) the same as what you would obtain if you had just increased the sample size in a pure GWAS of the input phenotype. I don't know what "cross-trait SNPs" means, but it is the case that MTAG is better powered to identify SNPs that have true causal effects on both traits. This does not necessarily mean that the new SNPs are definitely associated with both traits. I have not systematically tested the ability of MTAG to find such SNPs and I'm not aware of any paper that has.

  3. This approach makes as much sense to me as taking the intersection of the significant SNPs from two non-MTAG GWAS and calling those SNPs shared.
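One way to put a number on the "increased sample size" reading in point 2 is the GWAS-equivalent sample size. The sketch below assumes the definition N_equiv = N_GWAS * (mean chi2 of MTAG - 1) / (mean chi2 of GWAS - 1), which is how I recall it from the MTAG paper, so please check it against the paper before relying on it. The function name and the input values are hypothetical.

```python
def gwas_equivalent_n(n_gwas, mean_chi2_gwas, mean_chi2_mtag):
    """Sample size a single-trait GWAS would need to match the MTAG mean chi2.

    Assumes the excess mean chi-square (mean chi2 - 1) scales linearly with
    sample size, which is the premise of the assumed formula.
    """
    if mean_chi2_gwas <= 1.0:
        raise ValueError("mean GWAS chi2 must exceed 1 for the ratio to be meaningful")
    return n_gwas * (mean_chi2_mtag - 1.0) / (mean_chi2_gwas - 1.0)


# Invented numbers: a GWAS of 100,000 people with mean chi2 1.20,
# and MTAG output for the same trait with mean chi2 1.30.
n_equiv = gwas_equivalent_n(100_000, 1.20, 1.30)
print(round(n_equiv))  # about 150,000
```

Under these made-up inputs, the MTAG results for the trait would be read as roughly what a GWAS of 150,000 people would give in expectation. This says nothing about whether any individual new SNP is associated with both traits.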
