JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0

cannot run maxFDR calculations #84

Open poonphy opened 4 years ago

poonphy commented 4 years ago

Hi! I have run an MTAG analysis with maxFDR calculations for 10 traits. The MTAG results came out successfully. However, the maxFDR results were still not generated after more than 1000 hours of run time. The log for my analysis is as follows:

2019/10/16/09:39:37 PM MTAG results saved to file.
2019/10/16/09:39:37 PM Beginning maxFDR calculations. Depending on the number of grid points specified, this might take some time...
2019/10/16/09:39:37 PM T=10

Am I running too many traits at the same time? Is there an upper limit on the number of traits for the maxFDR calculations? Could there be any other causes?

Thank you!

paturley commented 4 years ago

By default, maxFDR does a grid search across different combinations of possible architectures. The number of points it tests is roughly 10^T, where T is the number of traits. So I would guess that in your particular case, maxFDR is still just chugging along but will take much longer to complete than you would want to wait.
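To put a rough number on that growth, here is a minimal sketch that assumes the grid has exactly `intervals ** T` points (the "roughly 10^T" figure above, with 10 intervals). The function names and the per-point timing are hypothetical, and the grid MTAG actually builds may differ in detail.

```python
def grid_points(n_traits: int, intervals: int = 10) -> int:
    """Approximate number of grid points, assuming intervals ** n_traits."""
    if n_traits < 1 or intervals < 1:
        raise ValueError("n_traits and intervals must be positive")
    return intervals ** n_traits


def estimated_hours(n_traits: int, seconds_per_point: float, intervals: int = 10) -> float:
    """Rough serial run time in hours; seconds_per_point is a guess you supply."""
    return grid_points(n_traits, intervals) * seconds_per_point / 3600.0


if __name__ == "__main__":
    # Show how quickly the assumed grid size grows with the number of traits.
    for t in (2, 4, 7, 10):
        print(f"T={t:2d}: {grid_points(t):,} grid points")
```

Under this assumption, T=10 gives 10,000,000,000 points, so even a very small cost per point adds up to an impractical run time.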

We did implement a few options in the maxFDR algorithm for these sorts of cases. For example, you could just estimate the FDR at the maximum likelihood value for the architecture. That option is --fit_ss. I'm not sure how well that would work with 10 traits, but it might be worth trying. You could also hand-select a number of grid points you think are most likely (or most likely to be problematic) using the --grid_file option. The --n_approx option should speed up calculations as well. You may also consider reducing the --intervals option to a number smaller than 10. Check out the tutorial for how to use any of these options.

All that said, any of these options will mean that your maxFDR value will be less reliable than if you do a complete finer grid, but it's better than nothing, of course.
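As an illustration of combining these options, the sketch below appends them to an existing command line. It uses only the flag names mentioned in this reply. Treating `--n_approx` and `--fit_ss` as boolean switches and `--grid_file` as taking a file path are assumptions, and the helper name is hypothetical, so check the tutorial before relying on it.

```python
from typing import List, Optional


def with_maxfdr_shortcuts(
    base_cmd: List[str],
    intervals: Optional[int] = None,
    n_approx: bool = False,
    fit_ss: bool = False,
    grid_file: Optional[str] = None,
) -> List[str]:
    """Return a copy of base_cmd with the requested maxFDR options appended."""
    cmd = list(base_cmd)  # copy so the caller's list is not modified
    if intervals is not None:
        if intervals < 1:
            raise ValueError("intervals must be a positive integer")
        cmd += ["--intervals", str(intervals)]
    if n_approx:
        cmd.append("--n_approx")  # assumed to be a boolean switch
    if fit_ss:
        cmd.append("--fit_ss")  # assumed to be a boolean switch
    if grid_file is not None:
        cmd += ["--grid_file", grid_file]  # assumed to take a file path
    return cmd
```

Here `base_cmd` stands for whatever MTAG invocation already works for you, split into a list of strings; for example, `with_maxfdr_shortcuts(base_cmd, intervals=5, n_approx=True)` returns that list followed by `--intervals 5 --n_approx`.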


pjordab commented 4 years ago

Hi to developers and users,

First of all, thanks again for this tool and for your time.

I am currently using MTAG for analyses that include between T=7 and T=18 traits.

In each group of analyses, only the MTAG results for trait1 are of interest.

In the MTAG results, I obtain SNPs that reach significance even though their p-value in the original single-trait GWAS did not reach 0.05 or 0.01.

I assume that these SNPs are mainly false positives, so we are considering restricting the MTAG results to SNPs with a p-value <0.05 or <0.01 in the original GWAS. Do you consider this appropriate? Would you recommend another p-value cut-off? (1)

On the other hand, following the recommendations in your manuscript, we are going to calculate the FDR.

I have some concerns about the FDR option.

(2) Do you think it is feasible to calculate the FDR for MTAG analyses of T=7 to T=18?

For these cases, I've seen that you suggest using the --n_approx option. (3) From what I understand from the updated instructions, --n_approx is now applied by default in the FDR calculation; is that correct?

You also mention the options --fit_ss, --grid_file and --intervals. Do you recommend them in my case? What do they do exactly? Should I use them all together?

If the analysis is not feasible with the number of traits we are including [up to how many traits should the FDR calculation be feasible? (4)], does it make sense to do the FDR calculation by paired analysis?

  Trait1 Trait2 Trait3 Trait4 Trait5 Trait6 Trait7
Trait1 fdr1 fdr2 fdr3 fdr4 fdr5 fdr6 fdr7
Trait2 fdr8 fdr9 fdr10 fdr11 fdr12 fdr13 fdr14
Trait3 fdr15 fdr16 fdr17 fdr18 fdr19 fdr20 fdr21
Trait4 fdr22 fdr23 fdr24 fdr25 fdr26 fdr27 fdr28
Trait5 fdr29 fdr30 fdr31 fdr32 fdr33 fdr34 fdr35
Trait6 fdr36 fdr37 fdr38 fdr39 fdr40 fdr41 fdr42
Trait7 fdr43 fdr44 fdr45 fdr46 fdr47 fdr48 fdr49

So in this case, as I am only interested in the MTAG_trait1.txt results, should I calculate only fdr1 to fdr7, or all of them? (5) To identify the summary statistics most likely to lead to a high FDR, which parameters should I check? (6) As you have pointed out before, differences in power between the samples are one of the issues that lead to a greater FDR. Is there a cut-off for an acceptable mean chi^2 that avoids a high false discovery rate? (7)

Trait # SNPs used ... MTAG mean chi^2
1 ...trait1 3570295 ... 1.034
2 ...trait2 3570295 ... 1.150
3 ...trait3 3570295 ... 2.929
4 ...trait4 3570295 ... 1.439
5 ...trait5 3570295 ... 1.034
6 ...trait6 3570295 ... 2.961
7 ...trait7 3570295 ... 2.721

Do you see any issues in this example?

Many thanks in advance! I'd be very happy if you could help me with some of these issues.

Paloma

paturley commented 4 years ago

Hello Paloma,

It's a bit difficult for me to comment on all your questions since I haven't exhaustively tested the software in those scenarios.

With respect to whether you can trust SNPs that have large p-values in the GWAS but small p-values in MTAG, I don't know why these SNPs would be more likely to be false positives as long as MTAG's assumptions hold. It looks like you have substantially less power for trait1 than for the other traits, so I would expect some of them to be quite insignificant in the original GWAS even if they were true positives. That said, if MTAG's assumptions are false, then it's also possible that MTAG is at high risk of a substantial FDR.

Calculating the maxFDR as described in the paper for more than just a few traits seems like it would be computationally infeasible. The --grid_file and --intervals options are manual ways of making the search space less dense, but by doing that, you can't be sure that you are actually getting close to the point that maximizes the FDR. The --fit_ss option first fits the data to a spike-and-slab distribution and calculates what the FDR would be at that point, but I don't think the software will be able to solve the maximum-likelihood problem needed to fit the spike-and-slab for more than 3 or 4 traits either.
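For readers unfamiliar with the term, a spike-and-slab distribution mixes a point mass at zero (the spike) with a continuous distribution (the slab). The single-trait sketch below is only an illustration: the normal slab, the parameter names `pi` and `sigma`, and the function itself are assumptions, and it is not MTAG's multi-trait fitting code.

```python
import random
from typing import List


def draw_spike_and_slab(n: int, pi: float, sigma: float, seed: int = 0) -> List[float]:
    """Draw n effect sizes: 0 with probability 1 - pi, else Normal(mean 0, sd sigma)."""
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    rng = random.Random(seed)  # private generator so results are reproducible
    # Each draw is non-null (from the slab) with probability pi, else exactly 0.
    return [rng.gauss(0.0, sigma) if rng.random() < pi else 0.0 for _ in range(n)]


if __name__ == "__main__":
    effects = draw_spike_and_slab(n=10_000, pi=0.1, sigma=0.05)
    share_nonnull = sum(e != 0.0 for e in effects) / len(effects)
    print(f"share of non-null effects: {share_nonnull:.3f}")
```

Fitting such a model means estimating `pi` and `sigma` (and, across several traits, how the non-null effects overlap and correlate), which is the maximum-likelihood problem referred to above.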

So I'm not sure what to tell you. You could calculate the maxFDR for each trait paired with just trait 1 like you said, but I don't have a good intuition for whether the maxFDR of any particular pair is larger or smaller than the maxFDR of MTAGing everything together. You could also MTAG all but the first trait, pick the resulting summary statistics with the highest power, and MTAG trait one with that MTAG trait. Again, I'm just throwing out random ideas though. I've not tested MTAG in such a setting, so I don't know what the performance gains and FDR risks are for any of these approaches.
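To see why the pairwise route is at least computationally tractable, the sketch below compares grid sizes under the same `intervals ** T` assumption as the rough 10^T estimate earlier in this thread. The function names are hypothetical, and the comparison says nothing about whether pairwise maxFDR values are statistically comparable to the joint one.

```python
from typing import List, Tuple


def pairs_with_focal(traits: List[str], focal: str) -> List[Tuple[str, str]]:
    """All two-trait analyses that pair the focal trait with one other trait."""
    if focal not in traits:
        raise ValueError(f"{focal!r} is not in the trait list")
    return [(focal, t) for t in traits if t != focal]


def pairwise_grid_points(n_traits: int, intervals: int = 10) -> int:
    """Total grid points over all (focal, other) pairs, assuming intervals ** 2 each."""
    return (n_traits - 1) * intervals ** 2


def joint_grid_points(n_traits: int, intervals: int = 10) -> int:
    """Grid points for one joint analysis, assuming intervals ** n_traits."""
    return intervals ** n_traits


if __name__ == "__main__":
    traits = [f"trait{i}" for i in range(1, 8)]
    print(pairs_with_focal(traits, "trait1"))
    print(pairwise_grid_points(len(traits)), "vs", joint_grid_points(len(traits)))
```

For 7 traits this is 6 pairs of 100 points each (600 in total) against 10,000,000 points for the joint grid.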

Best of luck. Sorry I've not been a lot of help.


pjordab commented 3 years ago

Hello developers and users,

First of all, many thanks for your help and previous answers.

To calculate the maxFDR in an analysis of more than 4 traits, say 9, I am planning to MTAG the first 4 traits, then MTAG the resulting MTAG result1 with the next 3 traits, and finally MTAG the resulting MTAG result2 with the remaining 2:

Trait1+trait2+trait3+trait4 -> MTAG result1 & maxFDR1 (e.g. 0.01)
MTAG result1+trait5+trait6+trait7 -> MTAG result2 & maxFDR2 (e.g. 0.05)
MTAG result2+trait8+trait9 -> MTAG result3 & maxFDR3 (e.g. 0.03)

I will get a maxFDR1 from the first analysis, and a maxFDR2 and maxFDR3 from the subsequent analyses.

What do you think my maximum FDR would be: the final result (maxFDR3), or the sum of the three maxFDRs obtained in each analysis (maxFDR1+maxFDR2+maxFDR3)?

Many thanks!

Paloma

paturley commented 3 years ago

Oh, wow! This is a pretty complicated procedure. I'm not sure there is a simple way to calculate the maxFDR as a function of the maxFDR for each sub-analysis. A quick question though: maxFDR is a trait-specific number, so I don't know what maxFDR1 means exactly. Is that the maxFDR for trait1?


pjordab commented 3 years ago

Yes, I'm only using the maxFDRs for trait1.

pjordab commented 3 years ago

Hi again, so what would be your recommendation for calculating the FDR when more than 4 traits are included? Thank you very much! Paloma

paturley commented 3 years ago

Sorry for the slow response here. You just have me a bit stumped. The maxFDR from just the third analysis (maxFDR3) is definitely going to be too small, and the sum of the different maxFDRs may also be too small. But the sum is a really confusing quantity, so I'm not sure. If I were you, I would maybe do the following:

Trait1+trait2+trait3+trait4 -> MTAG result1 & maxFDR1 (e.g. 0.01)
Trait1+trait5+trait6+trait7 -> MTAG result2 & maxFDR2 (e.g. 0.05)
Trait1+trait8+trait9 -> MTAG result3 & maxFDR3 (e.g. 0.03)

And use the sum of the maxFDRs from these three analyses to approximate how bad things might be. That would probably be conservative, since you are looking at the risk of including the different traits on the unbiased GWAS summary statistics rather than on (potentially contaminated) MTAG coefficients.

To be 100% transparent, I'm not totally sure this works, and I haven't tested MTAG to see if this produces reliable results, so take this advice at your own peril. :)
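The arithmetic of that suggestion, using the illustrative values above, could be sketched as follows. The function name is hypothetical, and capping the sum at 1.0 is an added assumption (a false discovery rate cannot exceed 1); as noted, the approach itself is untested.

```python
from typing import Iterable


def summed_maxfdr(values: Iterable[float]) -> float:
    """Sum per-analysis maxFDR values for one trait, capped at 1.0."""
    vals = list(values)
    if any(v < 0.0 or v > 1.0 for v in vals):
        raise ValueError("each maxFDR must lie in [0, 1]")
    return min(1.0, sum(vals))


if __name__ == "__main__":
    # Illustrative trait1 maxFDR values from the three sub-analyses above.
    rough_bound = summed_maxfdr([0.01, 0.05, 0.03])
    print(f"rough summed maxFDR for trait1: {rough_bound:.2f}")
```

With the example values this gives a rough figure of 0.09 for trait1.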
