JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0
169 stars, 54 forks

mtag MemoryError #93

Open grazyoshida opened 4 years ago

grazyoshida commented 4 years ago

Hi there,

I am running MTAG on two traits with ~1 million markers and hit the following error:

2020/05/05/09:38:55 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MTAG: Multi-trait Analysis of GWAS
<> Version: 1.0.8
<> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Note: It is recommended to run your own QC on the input before using this program.
<> Software-related correspondence: maghzian@nber.org
<> All other correspondence: paturley@broadinstitute.org
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Calling ./mtag.py \
--force \
--stream-stdout \
--n-min 0.0 \
--sumstats HW1.txt,HL1.txt \
--out ./resumen

2020/05/05/09:38:55 AM Beginning MTAG analysis...
2020/05/05/09:38:55 AM MTAG will use the Z column for analyses.
2020/05/05/09:39:01 AM Read in Trait 1 summary statistics (1060504 SNPs) from HW1.txt ...
2020/05/05/09:39:01 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2020/05/05/09:39:01 AM Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><
2020/05/05/09:39:01 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2020/05/05/09:39:01 AM Interpreting column names as follows:
2020/05/05/09:39:01 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

2020/05/05/09:39:01 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2020/05/05/09:39:06 AM Read 1060504 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 8686 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
1051818 SNPs remain.
2020/05/05/09:39:06 AM Removed 0 SNPs with duplicated rs numbers (1051818 SNPs remain).
2020/05/05/09:39:07 AM Removed 0 SNPs with N < 0.0 (1051818 SNPs remain).
2020/05/05/09:39:14 AM Median value of SIGNED_SUMSTAT was -0.005398063, which seems sensible.
2020/05/05/09:39:14 AM Dropping snps with null values
2020/05/05/09:39:14 AM Metadata:
2020/05/05/09:39:14 AM Mean chi^2 = 0.36
2020/05/05/09:39:14 AM WARNING: mean chi^2 may be too small.
2020/05/05/09:39:15 AM Lambda GC = 0.35
2020/05/05/09:39:15 AM Max chi^2 = 7.408
2020/05/05/09:39:15 AM 0 Genome-wide significant SNPs (some may have been removed by filtering).
2020/05/05/09:39:15 AM Conversion finished at Tue May 5 09:39:15 2020
2020/05/05/09:39:15 AM Total time elapsed: 13.58s
2020/05/05/09:39:18 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2020/05/05/09:39:18 AM Munging of Trait 1 complete. SNPs remaining: 1051818
2020/05/05/09:39:18 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2020/05/05/09:39:18 AM Warning: The mean chi2 statistic of trait 1 is less 1.02 - MTAG estimates may be unstable.
2020/05/05/09:39:27 AM Read in Trait 2 summary statistics (1060504 SNPs) from HL1.txt ...
2020/05/05/09:39:27 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2020/05/05/09:39:27 AM Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><
2020/05/05/09:39:27 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2020/05/05/09:39:27 AM Interpreting column names as follows:
2020/05/05/09:39:27 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: a1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: a2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

2020/05/05/09:39:27 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2020/05/05/09:39:31 AM Read 1060504 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 8686 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
1051818 SNPs remain.
2020/05/05/09:39:31 AM Removed 0 SNPs with duplicated rs numbers (1051818 SNPs remain).
2020/05/05/09:39:32 AM Removed 0 SNPs with N < 0.0 (1051818 SNPs remain).
2020/05/05/09:39:46 AM Median value of SIGNED_SUMSTAT was 0.01791777, which seems sensible.
2020/05/05/09:39:46 AM Dropping snps with null values
2020/05/05/09:39:46 AM Metadata:
2020/05/05/09:39:46 AM Mean chi^2 = 0.018
2020/05/05/09:39:46 AM WARNING: mean chi^2 may be too small.
2020/05/05/09:39:46 AM Lambda GC = 0.016
2020/05/05/09:39:46 AM Max chi^2 = 0.416
2020/05/05/09:39:46 AM 0 Genome-wide significant SNPs (some may have been removed by filtering).
2020/05/05/09:39:46 AM Conversion finished at Tue May 5 09:39:46 2020
2020/05/05/09:39:46 AM Total time elapsed: 19.77s
2020/05/05/09:39:49 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2020/05/05/09:39:49 AM Munging of Trait 2 complete. SNPs remaining: 1051818
2020/05/05/09:39:49 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2020/05/05/09:39:49 AM Warning: The mean chi2 statistic of trait 2 is less 1.02 - MTAG estimates may be unstable.
2020/05/05/09:39:55 AM Dropped 187316 SNPs due to strand ambiguity, 864502 SNPs remain in intersection after merging trait1
2020/05/05/09:40:00 AM Dropped 0 SNPs due to strand ambiguity, 864502 SNPs remain in intersection after merging trait2
2020/05/05/09:40:00 AM ... Merge of GWAS summary statistics complete. Number of SNPs: 864502
2020/05/05/09:40:04 AM Using 864502 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity)
2020/05/05/09:40:04 AM Estimating sigma..
2020/05/05/09:41:08 AM Checking for positive definiteness ..
2020/05/05/09:41:08 AM Sigma hat:
[[ 0.811 0.014]
 [ 0.014 1.148]]
2020/05/05/09:41:08 AM Mean chi^2 of SNPs used to estimate Omega is low for some SNPsMTAG may not perform well in this situation.
2020/05/05/09:41:08 AM Beginning estimation of Omega ...
2020/05/05/09:41:08 AM Using GMM estimator of Omega ..
2020/05/05/09:41:08 AM Checking for positive definiteness ..
2020/05/05/09:41:08 AM matrix is not positive definite, performing adjustment..
2020/05/05/09:41:09 AM Warning: max number of iterations reached in adjustment procedure. Sigma matrix used is still non-positive-definite.
2020/05/05/09:41:09 AM Completed estimation of Omega ...
2020/05/05/09:41:09 AM Beginning MTAG calculations...
2020/05/05/09:41:09 AM Traceback (most recent call last):
  File "./mtag.py", line 1567, in <module>
    mtag(args)
  File "./mtag.py", line 1443, in mtag
    mtag_betas, mtag_se, mtag_factor = mtag_analysis(Zs, Ns, args.omega_hat, args.sigma_hat)
  File "./mtag.py", line 785, in mtag_analysis
    W_N_inv = np.linalg.inv(W_N)
  File "/usr/lib64/python2.7/site-packages/numpy/linalg/linalg.py", line 445, in inv
    return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
  File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line 1913, in identity
    return eye(n, dtype=dtype)
  File "/usr/lib64/python2.7/site-packages/numpy/lib/twodim_base.py", line 210, in eye
    m = zeros((N, M), dtype=dtype)
MemoryError
2020/05/05/09:41:09 AM Analysis terminated from error at Tue May 5 09:41:09 2020
2020/05/05/09:41:09 AM Total time elapsed: 2.0m:13.34s

Furthermore, the mean chi^2 is very low for both traits. Can anyone help me? Thank you!

GMY
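The "Mean chi^2" and "Lambda GC" lines in the munging log are the low chi^2 values the post refers to. A minimal sketch of how these summaries are conventionally computed from a Z column, using the textbook definitions (chi^2 = Z^2; Lambda GC = median chi^2 divided by the median of a 1-df chi-square, about 0.455). This is not necessarily MTAG's exact code, and `chi2_summary` is a hypothetical helper. For correctly scaled Z statistics, chi^2 averages about 1 under the null (and more with polygenic signal), which is why values of 0.36 and 0.018 trigger the warnings.

```python
from statistics import NormalDist, median

# Median of a chi-square with 1 df equals (Phi^-1(0.75))^2, roughly 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def chi2_summary(z_scores):
    """Return (mean chi^2, Lambda GC, max chi^2) for a sequence of Z statistics."""
    chi2 = [z * z for z in z_scores]
    mean_chi2 = sum(chi2) / len(chi2)
    lambda_gc = median(chi2) / CHI2_1DF_MEDIAN
    return mean_chi2, lambda_gc, max(chi2)


# Toy Z scores (hypothetical data, not from this issue).
z = [1.2, -0.8, 2.5, -1.9, 0.3, 0.7, -1.1]
print(chi2_summary(z))
# Shrinking every Z by 10x divides mean chi^2 and Lambda GC by 100.
print(chi2_summary([v / 10 for v in z]))
```

Because both metrics are quadratic in Z, any uniform shrinkage of the Z column deflates them sharply, so comparing the Z column against BETA/SE in the input files can be a quick sanity check.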

paturley commented 4 years ago

Hmm. It looks like your machine is running out of memory. I would think that you would be OK with just 1M SNPs though. Can you send the complete log file?
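The traceback ends in `np.linalg.inv(W_N)`, and the NumPy frame shows `identity(a.shape[0])`, which allocates an n x n matrix. A back-of-the-envelope sketch of why that can exhaust memory even with ~1M SNPs, assuming (not confirmed by the log) that `W_N` is a stack of per-SNP T x T matrices with shape `(M, T, T)`, so that `a.shape[0]` equals the 864502 merged SNPs. The helpers `identity_bytes` and `stack_bytes` are hypothetical.

```python
# Rough memory arithmetic for the failing call in the traceback.
# ASSUMPTION: W_N has shape (M, T, T), so a.shape[0] == M (the SNP count).
FLOAT64_BYTES = 8


def identity_bytes(n: int) -> int:
    """Bytes for the n x n float64 identity built by identity(a.shape[0])."""
    return n * n * FLOAT64_BYTES


def stack_bytes(m: int, t: int) -> int:
    """Bytes for an (m, t, t) float64 stack of per-SNP matrices."""
    return m * t * t * FLOAT64_BYTES


M = 864502  # SNPs remaining after the merge, from the log
T = 2       # number of traits in this run

print(f"identity({M}): {identity_bytes(M) / 1e12:.2f} TB")
print(f"(M, T, T) stack: {stack_bytes(M, T) / 1e6:.1f} MB")
```

If that assumption holds, the array being inverted is only tens of MB, while the identity requested inside `inv` would need roughly 6 TB. The `identity(a.shape[0])` line matches how older NumPy releases implemented `inv`; NumPy 1.8 and later invert stacked `(..., T, T)` arrays batch-wise, so checking the NumPy version installed under the Python 2.7 path shown in the traceback may be a reasonable first step.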

On Tue, May 5, 2020 at 9:30 AM grazyoshida notifications@github.com wrote:

Hi there,

I am running MTAG on two traits with ~1 million markers and hit the following error:

Traceback (most recent call last):
  File "./mtag.py", line 1567, in <module>
    mtag(args)
  File "./mtag.py", line 1443, in mtag
    mtag_betas, mtag_se, mtag_factor = mtag_analysis(Zs, Ns, args.omega_hat, args.sigma_hat)
  File "./mtag.py", line 785, in mtag_analysis
    W_N_inv = np.linalg.inv(W_N)
  File "/usr/lib64/python2.7/site-packages/numpy/linalg/linalg.py", line 445, in inv
    return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
  File "/usr/lib64/python2.7/site-packages/numpy/core/numeric.py", line 1913, in identity
    return eye(n, dtype=dtype)
  File "/usr/lib64/python2.7/site-packages/numpy/lib/twodim_base.py", line 210, in eye
    m = zeros((N, M), dtype=dtype)
MemoryError
Analysis terminated from error at Tue May 5 09:21:35 2020
Total time elapsed: 1.0m:27.91s

Can anyone help me?! Thank you!

GMY


bnj50 commented 2 years ago

Hi, I have the same issue. Exactly how much memory is needed for 95 sumstats with 7 million markers?
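A rough lower bound for this question, assuming (not confirmed in this thread) that MTAG materializes per-SNP T x T matrices as float64 arrays of shape `(M, T, T)`. One such array grows with M·T², and a full run holds several arrays (and possibly temporary copies) at once, so the real peak will be higher than this figure. The helper `stacked_array_gb` is hypothetical.

```python
FLOAT64_BYTES = 8


def stacked_array_gb(n_snps: int, n_traits: int) -> float:
    """GB (10^9 bytes) for ONE float64 array of shape (n_snps, n_traits, n_traits)."""
    return n_snps * n_traits * n_traits * FLOAT64_BYTES / 1e9


# 95 traits x 7 million SNPs, versus the original 2-trait run in this issue.
print(f"{stacked_array_gb(7_000_000, 95):.1f} GB per (M, T, T) array")
print(f"{stacked_array_gb(864_502, 2):.3f} GB for the original 2-trait run")
```

Because the per-SNP cost scales with T², going from 2 to 95 traits multiplies each per-SNP matrix by about 2,256x, so a single array of this shape is already around 505 GB before counting any other intermediates.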