JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0
171 stars 54 forks source link

ValueError: On entry to DLASCL parameter number 4 had an illegal value #166

Open CharlesLambert70 opened 2 years ago

CharlesLambert70 commented 2 years ago

hi,

I have encountered a problem and had no idea why this happen.

Calling ./mtag.py \ --p-name pval \ --maf-min 0 \ --stream-stdout \ --n-min 0.0 \ --sumstats diabetes_new2.tsv,scz_new2.tsv \ --out ./new

Beginning MTAG analysis... MTAG will use the Z column for analyses. Read in Trait 1 summary statistics (10894596 SNPs) from diabetes_new2.tsv ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: a1, interpreted as ref allele for signed sumstat. pval: p-Value a2: a2, interpreted as non-ref allele for signed sumstat. z: Directional summary statistic as specified by --signed-sumstats.

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. Read 10894596 SNPs from --sumstats file. Removed 3672 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.0. Removed 0 SNPs with SE <0 or NaN values. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 10890924 SNPs remain. Removed 7665 SNPs with duplicated rs numbers (10883259 SNPs remain). Removed 0 SNPs with N < 0.0 (10883259 SNPs remain). Median value of SIGNED_SUMSTAT was -0.0125334695081, which seems sensible. Dropping snps with null values

Metadata: Mean chi^2 = 1.215 Lambda GC = 1.147 Max chi^2 = 685.138 4840 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Sun Sep 4 10:07:36 2022 Total time elapsed: 2.0m:13.48s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 1 complete. SNPs remaining: 10890924 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Trait 1: Dropped 7665 SNPs for duplicate values in the "snp_name" column Read in Trait 2 summary statistics (10894596 SNPs) from scz_new2.tsv ... <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><>< <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Interpreting column names as follows: snpid: Variant ID (e.g., rs number) n: Sample size a1: a1, interpreted as ref allele for signed sumstat. pval: p-Value a2: a2, interpreted as non-ref allele for signed sumstat. z: Directional summary statistic as specified by --signed-sumstats.

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time. Read 10894596 SNPs from --sumstats file. Removed 3672 SNPs with missing values. Removed 0 SNPs with INFO <= None. Removed 0 SNPs with MAF <= 0.0. Removed 0 SNPs with SE <0 or NaN values. Removed 0 SNPs with out-of-bounds p-values. Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped. 10890924 SNPs remain. Removed 7665 SNPs with duplicated rs numbers (10883259 SNPs remain). Removed 0 SNPs with N < 0.0 (10883259 SNPs remain). Median value of SIGNED_SUMSTAT was -0.0752698620998, which seems sensible. Dropping snps with null values

Metadata: Mean chi^2 = 1.004 WARNING: mean chi^2 may be too small. Lambda GC = 1.097 Max chi^2 = 70.185 548 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Sun Sep 4 10:11:08 2022 Total time elapsed: 2.0m:12.19s <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><> Munging of Trait 2 complete. SNPs remaining: 10890924 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Trait 2: Dropped 7665 SNPs for duplicate values in the "snp_name" column Dropped 1637431 SNPs due to strand ambiguity, 9245828 SNPs remain in intersection after merging trait1 Dropped 0 SNPs due to strand ambiguity, 9245828 SNPs remain in intersection after merging trait2 ... Merge of GWAS summary statistics complete. Number of SNPs: 9245828 Using 9245828 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity) Estimating sigma.. On entry to DLASCL parameter number 4 had an illegal value Traceback (most recent call last): File "/home/gyu/mtag/mtag.py", line 1577, in mtag(args) File "/home/gyu/mtag/mtag.py", line 1358, in mtag args.sigma_hat = estimate_sigma(DATA[not_SA], args) File "/home/gyu/mtag/mtag.py", line 472, in estimate_sigma rg_results = sumstats_sig.estimate_rg(args_ldsc_rg, Logger_to_Logging()) File "/home/gyu/mtag/ldsc_mod/ldscore/sumstats.py", line 445, in estimate_rg rghat = _rg(loop, args, log, M_annot, ref_ld_cnames, w_ld_cname, k, i) File "/home/gyu/mtag/ldsc_mod/ldscore/sumstats.py", line 593, in _rg intercept_gencov=intercepts[2], n_blocks=n_blocks, twostep=args.two_step) File "/home/gyu/mtag/ldsc_mod/ldscore/regressions.py", line 689, in init slow=slow, twostep=twostep) File "/home/gyu/mtag/ldsc_mod/ldscore/regressions.py", line 347, in init slow=slow, step1_ii=step1_ii, old_weights=old_weights) File "/home/gyu/mtag/ldsc_mod/ldscore/regressions.py", line 214, in init x, yp, update_func, n_blocks, slow=slow, w=initial_w) File "/home/gyu/mtag/ldsc_mod/ldscore/irwls.py", line 67, in init x, y, update_func, n_blocks, w, slow=slow, separators=separators) File "/home/gyu/mtag/ldsc_mod/ldscore/irwls.py", line 114, in irwls new_w = np.sqrt(update_func(cls.wls(x, y, w))) File "/home/gyu/mtag/ldsc_mod/ldscore/irwls.py", line 162, in wls coef = np.linalg.lstsq(x, y) File "/home/gyu/anaconda3/envs/python27/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 2236, in lstsq x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj) ValueError: On entry to DLASCL parameter number 4 had an illegal value Analysis terminated from error at Sun Sep 4 10:13:57 2022 Total time elapsed: 8.0m:57.46s

My summary data are downloaded from UK Biobank (https://pheweb.org/UKB-Neale/phenotypes). However, it does not include z-score and MAF. So I calculated the z-score based on beta and p-value. And I specified the MAF as 0.5 for all SNPs.

Summary data are like this:

chr bpos a1 a2 snpid nearest_genes pval beta ac z n freq 1 693731 A G rs12238997 AL669831.1 0.7 -0.00033 78029.2 -0.385320466407568 337199 0.5 1 717587 G A rs144155419 AL669831.1 0.54 0.0014 10541.4 0.612812991016627 337199 0.5 1 730087 T C rs148120343 AL669831.1 0.98 3.1e-05 38010.4 0.0250689082587111 337199 0.5 1 731718 T C rs58276399 AL669831.1 0.72 -0.00029 81978.9 -0.358458793251194 337199 0.5 1 734349 T C rs141242758 AL669831.1 0.79 -0.00022 81456.1 -0.266310613204095 337199 0.5 1 740284 C T rs61770167 AL669831.1 0.45 -0.0028 3927.13 -0.755415026360469 337199 0.5 1 742813 C T rs112573343 AL669831.1 0.83 -0.0015 1269.37 -0.214701568001744 337199 0.5 1 753405 C A rs3115860 AL669831.1 0.57 0.00044 585994 0.568051498338983 337199 0.5 1 753541 G A rs2073813 AL669831.1 0.52 -5e-04 86622.8 -0.643345405392917 337199 0.5

And the code is below:

python /home/gyu/mtag/mtag.py \ --sumstats diabetes_new2.tsv,scz_new2.tsv \ --snp_name snpid \ --out ./new \ --beta_name beta \ --a1_name a1 \ --a2_name a2 \ --p_name pval \ --chr_name chr \ --bpos_name bpos \ --n_name n \ --z_name z \ --se_name se\ --maf_min 0\ --n_min 0.0 \ --stream_stdout

Many thanks!

JonJala commented 2 years ago

Looks like this is breaking on the LD score regression step. If you pass your summary statistics into LDSC to calculate genetic correlation, do you see this same error?

CharlesLambert70 commented 2 years ago

The result of genetic correlation is below:

Call: ./ldsc.py \ --ref-ld-chr /mnt/dm001/raid5a/gyu/ldsc/eur_w_ld_chr/ \ --out diabetes_scz \ --rg /mnt/dm001/raid5a/gyu/summary/diabetes_new2.sumstats.gz,/mnt/dm001/raid5a/gyu/summary/scz_new2.sumstats.gz \ --w-ld-chr /mnt/dm001/raid5a/gyu/ldsc/eur_w_ld_chr/

Beginning analysis at Thu Sep 8 15:50:05 2022 Reading summary statistics from /mnt/dm001/raid5a/gyu/summary/diabetes_new2.sumstats.gz ... Read summary statistics for 1200113 SNPs. Reading reference panel LD Score from /mnt/dm001/raid5a/gyu/ldsc/eur_w_ld_chr/[1-22] ... (ldscore_fromlist) Read reference panel LD Scores for 1290028 SNPs. Removing partitioned LD Scores with zero variance. Reading regression weight LD Score from /mnt/dm001/raid5a/gyu/ldsc/eur_w_ld_chr/[1-22] ... (ldscore_fromlist) Read regression weight LD Scores for 1290028 SNPs. After merging with reference panel LD, 1177699 SNPs remain. After merging with regression SNP LD, 1177699 SNPs remain. Computing rg for phenotype 2/2 Reading summary statistics from /mnt/dm001/raid5a/gyu/summary/scz_new2.sumstats.gz ... Read summary statistics for 1217311 SNPs. After merging with summary statistics, 1177699 SNPs remain. 1177699 SNPs with valid alleles. /mnt/dm001/raid5a/gyu/ldsc/ldscore/sumstats.py:532: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead. ref_ld = sumstats.as_matrix(columns=ref_ld_cnames) /mnt/dm001/raid5a/gyu/ldsc/ldscore/irwls.py:161: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions. To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1. coef = np.linalg.lstsq(x, y)

Heritability of phenotype 1

Total Observed scale h2: 0.0441 (0.0028) Lambda GC: 1.2531 Mean Chi^2: 1.32 Intercept: 1.0339 (0.0098) Ratio: 0.106 (0.0306)

Heritability of phenotype 2/2

Total Observed scale h2: 0.0018 (0.0012) Lambda GC: 0.9986 Mean Chi^2: 1.009 Intercept: 0.9971 (0.0057) Ratio < 0 (usually indicates GC correction).

Genetic Covariance

Total Observed scale gencov: -4.4126e-05 (0.0011) Mean z1*z2: 0.0009 Intercept: 0.0021 (0.0047)

Genetic Correlation

Genetic Correlation: -0.005 (0.1277) Z-score: -0.0389 P: 0.969

Summary of Genetic Correlation Results p1 p2 rg se z p h2_obs h2_obs_se h2_int h2_int_se gcov_int gcov_int_se /mnt/dm001/raid5a/gyu/summary/diabetes_new2.sumstats.gz /mnt/dm001/raid5a/gyu/summary/scz_new2.sumstats.gz -0.005 0.1277 -0.0389 0.969 0.0018 0.0012 0.9971 0.0057 0.0021 0.0047

It seems that it works for genetic correlation in LDSC.

JonJala commented 2 years ago

Interesting, ok. I don't believe we've seen the error before, and from trying to do a little online research it sounds like it's an error bubbling up from farther down the stack (like in the MKL). There's probably a nan or inf in there somewhere that's introduced (or not filtered out properly) before it tries to run the regression. MTAG isn't doing exactly the same things as LDSC, so maybe that will help track down where that's happening, but it's not anything immediately apparent.

lhl-0419 commented 1 year ago

Hi, I have the same problem when running mtag. Have you solved it at this time? What is the cause of this problem? Thank you so much!