JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0

Memory Error with many traits #48

Open Robbie90 opened 5 years ago

Robbie90 commented 5 years ago

Hi, I'm trying to run MTAG on 93 traits simultaneously...

After Munging and Merging the log was the following:

.....
2018/10/02/02:28:26 PM Dropped 0 SNPs due to strand ambiguity, 1915103 SNPs remain in intersection after merging trait93
2018/10/02/02:28:26 PM ... Merge of GWAS summary statistics complete. Number of SNPs: 2258212
2018/10/02/02:30:42 PM Using 1915103 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity)
2018/10/02/02:30:42 PM Estimating sigma..
2018/10/03/12:43:54 AM Checking for positive definiteness ..
2018/10/03/12:43:55 AM Sigma hat:
[[ 1.052 0.41 0.022 ... 0.039 -0.024 -0.135]
 [ 0.41 1.048 0.034 ... 0.102 -0.064 -0.048]
 [ 0.022 0.034 1.043 ... -0.025 0.801 0.059]
 ...
 [ 0.039 0.102 -0.025 ... 1.018 -0.006 0.055]
 [-0.024 -0.064 0.801 ... -0.006 1.045 0.031]
 [-0.135 -0.048 0.059 ... 0.055 0.031 1.003]]
2018/10/03/12:43:55 AM Mean chi^2 of SNPs used to estimate Omega is low for some SNPsMTAG may not perform well in this situation.
2018/10/03/12:43:56 AM Beginning estimation of Omega ...
2018/10/03/12:50:34 AM Using GMM estimator of Omega ..
2018/10/03/12:51:30 AM Traceback (most recent call last):
  File "/home/project/mtag/mtag.py", line 1514, in <module>
    mtag(args)
  File "/home/project/mtag/mtag.py", line 1307, in mtag
    args.omega_hat = estimate_omega(args, Zs[not_SA], Ns[not_SA], args.sigma_hat)
  File "/home/project/mtag/mtag.py", line 715, in estimate_omega
    return _posDef_adjustment(gmm_omega(Zs,Ns,sigma_LD))
  File "/home/project/mtag/mtag.py", line 607, in gmm_omega
    N_mats = np.sqrt(np.einsum('mp,mq->mpq', Ns,Ns))
MemoryError
2018/10/03/12:51:30 AM Analysis terminated from error at Wed Oct 3 00:51:30 2018
2018/10/03/12:51:30 AM Total time elapsed: 12.0h:51.0m:0.25s

So I suppose some of the matrices used in the calculation of Omega with the GMM estimator become incredibly big. This also happens quite fast (about 1 minute between "Using GMM estimator of Omega .." and the error message).
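As a rough back-of-the-envelope check on that supposition: the `'mp,mq->mpq'` einsum in the traceback produces a dense array of shape (M, T, T), where M is the number of SNPs and T the number of traits. The sketch below estimates its size, assuming 8-byte float64 elements (the actual dtype of `Ns` is not shown in the log, so this is an assumption). The helper name `dense_mtt_bytes` is made up for illustration.

```python
def dense_mtt_bytes(n_snps, n_traits, bytes_per_element=8):
    """Bytes needed for one dense (M, T, T) array; 8 bytes assumes float64."""
    return n_snps * n_traits * n_traits * bytes_per_element


n_snps = 1915103  # SNPs used to estimate Omega, taken from the log
for n_traits in (10, 50, 93):
    gib = dense_mtt_bytes(n_snps, n_traits) / 2**30
    print(f"T={n_traits:3d}: {gib:7.1f} GiB")
```

Under these assumptions, T = 93 comes to roughly 123 GiB for a single array, and the requirement grows with the square of the number of traits. `np.sqrt` applied to the einsum result also allocates a second array of the same shape unless it is done in place.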

Do you have any suggestions on how to proceed from this?

Would it make sense to first estimate the genetic correlations, divide the traits into correlation blocks, and then run MTAG including only the traits within each block? If yes, is there a tool you would suggest for that?
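To make the blocking idea concrete, here is a minimal sketch. It assumes a symmetric T x T matrix of pairwise genetic correlation estimates is already available from some other tool. The function name `correlation_blocks`, the toy trait names, the toy correlations and the 0.5 cutoff are all made up for illustration. Traits are grouped into connected components: two traits land in the same block if they are linked by a chain of pairs with |rg| at or above the cutoff.

```python
def correlation_blocks(rg, names, threshold=0.5):
    """Group traits into connected components of the graph whose edges link
    pairs with |rg| >= threshold. `rg` is a symmetric list of lists."""
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(rg[i][j]) >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    blocks = {}
    for i, name in enumerate(names):
        blocks.setdefault(find(i), []).append(name)
    return sorted(blocks.values(), key=len, reverse=True)


# Toy, made-up genetic correlations for four hypothetical traits.
names = ["traitA", "traitB", "traitC", "traitD"]
rg = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, -0.7],
    [0.0, 0.1, -0.7, 1.0],
]
blocks = correlation_blocks(rg, names)
print(blocks)
```

Connected components are only one possible grouping rule. A chain of moderately correlated traits can pull many traits into one large block, so the cutoff (or a proper clustering method) would need to be chosen with the memory limit in mind.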

Thank you so much for your help!

Cheers, Robbie

paturley commented 5 years ago

Hi Robbie,

Wow! I don't think I ever envisioned running MTAG on 93 traits simultaneously. I do worry a bit about inflation of the test statistics if you include that many (see Figure 1 of the manuscript). Are these traits that you expect to be highly correlated genetically, or just a random set of 93?

In terms of where things break down, my guess is that it's in the storage of the Omega and Sigma matrices. If I recall correctly, MTAG stores a TxTxM hypermatrix of the Sigma matrices for each SNP, where T is the number of traits and M is the number of SNPs. That would be a very large object in your case. Unless you want to dig into the guts of the software and recode it so that it calculates each matrix serially rather than all at once, I'm not sure of a great way around this. Your idea of splitting traits into sets of closely related outcomes doesn't sound bad, especially if there are some logical splits. How many traits can you have before you run into the memory problem?
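For anyone who does want to try the serial route, the general pattern is to loop over blocks of SNPs and accumulate a TxT result, so that the full M x T x T array never exists at once. The sketch below is not MTAG's `gmm_omega`: the traceback only shows the line that builds `N_mats`, so the mean over SNPs used here is a stand-in reduction chosen purely to illustrate the pattern. The function name `mean_sqrt_outer_blockwise`, the `block_size` parameter and the toy data are hypothetical.

```python
import numpy as np


def mean_sqrt_outer_blockwise(Ns, block_size=100_000):
    """Mean over SNPs of sqrt(N_p * N_q), computed one block of SNPs at a time.

    Ns has shape (M, T). Only a (block_size, T, T) temporary exists at any
    moment, instead of the full (M, T, T) array.
    """
    n_snps, n_traits = Ns.shape
    total = np.zeros((n_traits, n_traits))
    for start in range(0, n_snps, block_size):
        block = Ns[start:start + block_size]
        # Same einsum as in the traceback, restricted to this block of SNPs.
        total += np.sqrt(np.einsum('mp,mq->mpq', block, block)).sum(axis=0)
    return total / n_snps


# Toy sample sizes: 1000 SNPs, 5 traits (made-up values).
rng = np.random.default_rng(0)
Ns = rng.integers(1_000, 100_000, size=(1_000, 5)).astype(float)

full = np.sqrt(np.einsum('mp,mq->mpq', Ns, Ns)).mean(axis=0)
blockwise = mean_sqrt_outer_blockwise(Ns, block_size=128)
print(np.allclose(full, blockwise))
```

Peak memory for the temporary is then set by `block_size` rather than by M. Whether the later steps of the estimator can be rewritten as reductions of this kind would have to be checked against the actual code.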

Robbie90 commented 5 years ago

Hi Patrick, I don't think I will be the only one trying something on this scale, especially now in the UK Biobank era :) I also thought that might create a problem, especially since the mean chi^2 for each study is not that big. However, I still wanted to give it a go. I expect them all to be correlated: every trait should share at least some genetics with another trait.

Not sure how many I can try before crashing it. I'll try running a set of parallel analyses with, say, 20, 40, 60, and 80 traits and let you know which one stops first.

Cheers, Robbie

chenyan53535 commented 5 years ago

Hi Robbie, I also hit the MemoryError when running with 78 traits. Did you manage to solve this problem, and if so, how? Thanks.

Cheers, Yan