getian107 / PRScs

Polygenic prediction via continuous shrinkage priors
MIT License

ValueError: array must not contain infs or NaNs #40

Closed: daniel-hui closed this issue 2 years ago

daniel-hui commented 2 years ago

Hi Tian,

Hope you enjoyed the holidays. I am getting the following error when running PRScs; it occurs at iter-900:

/project/ritchie/tools/PRScs/mcmc_gtb.py:55: RuntimeWarning: divide by zero encountered in true_divide
  dinvt = ld_blk[kk]+sp.diag(1.0/psi[idx_blk].T[0])
Traceback (most recent call last):
  File "/project/ritchie/tools/PRScs/PRScs.py", line 172, in <module>
    main()
  File "/project/ritchie/tools/PRScs/PRScs.py", line 165, in main
    mcmc_gtb.mcmc(param_dict['a'], param_dict['b'], param_dict['phi'], sst_dict, param_dict['n_gwas'], ld_blk, blk_size,
  File "/project/ritchie/tools/PRScs/mcmc_gtb.py", line 56, in mcmc
    dinvt_chol = linalg.cholesky(dinvt)
  File "/appl/python-3.8/lib/python3.8/site-packages/scipy/linalg/decomp_cholesky.py", line 88, in cholesky
    c, lower = _cholesky(a, lower=lower, overwrite_a=overwrite_a, clean=True,
  File "/appl/python-3.8/lib/python3.8/site-packages/scipy/linalg/decomp_cholesky.py", line 17, in _cholesky
    a1 = asarray_chkfinite(a) if check_finite else asarray(a)
  File "/appl/python-3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 488, in asarray_chkfinite
    raise ValueError(
ValueError: array must not contain infs or NaNs

I have run PRScs with other summary statistics without this issue. I saw past issues where this error was mentioned and looked at the summary statistics file, but I am still not sure what the problem is (I have now removed all SNPs with "NA" for any value, as well as SNPs with a p-value of 1.0). Here is the full command:

python3 /project/ritchie/tools/PRScs/PRScs.py --ref_dir=../tools/PRScsx-master/ldblks/ldblk_ukbb_eur --chrom=22 --bim_prefix=prscsx_bim --sst_file=tmp.txt --n_gwas=420035 --seed=123456789 --out_dir=prscs/townsend_chr22

tmp.txt.gz

For the bim_prefix file, I just reformatted snpinfo_mult_ukbb_hm3 into a .bim file. Would you happen to know what the issue is? I have attached the gzipped summary statistics file (it is just the Neale Lab European-ancestry GWAS of the Townsend deprivation index, restricted to the SNPs in snpinfo_mult_ukbb_hm3). Thank you.
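For anyone repeating the file check described above (no "NA" values, nothing non-numeric), here is a minimal sketch of a scan for missing, non-numeric, NaN, or infinite entries. It assumes a whitespace-delimited file with a header row and numeric columns named BETA and P; the function name and defaults are hypothetical, and a clean result does not rule out other causes of the error.

```python
import math


def find_bad_rows(path, numeric_cols=("BETA", "P")):
    """Return (line_number, column, raw_value) for every entry in the
    given columns that is missing, non-numeric, NaN, or infinite.

    Assumes a whitespace-delimited file whose first line is a header.
    """
    bad = []
    with open(path) as fh:
        header = fh.readline().split()
        for line_no, line in enumerate(fh, start=2):  # line 1 is the header
            if not line.strip():
                continue  # ignore blank lines
            fields = dict(zip(header, line.split()))
            for col in numeric_cols:
                raw = fields.get(col)
                try:
                    value = float(raw)
                except (TypeError, ValueError):
                    # Column missing on this line, or text such as "NA".
                    bad.append((line_no, col, raw))
                    continue
                if not math.isfinite(value):
                    # Parsed as a float, but it is nan, inf, or -inf.
                    bad.append((line_no, col, raw))
    return bad
```

Calling `find_bad_rows("tmp.txt")` on the file from the command above would list any offending lines; an empty list means every BETA and P entry parsed as a finite number.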

getian107 commented 2 years ago

Hi Daniel- I have never seen this issue after a large number of iterations; usually if the summary statistics are not appropriately formatted, errors would occur early in the MCMC iteration. Is this a recurrent issue or transient? If you rerun the model with a different seed and don't see this error, it might be a very rare numerical issue. I can try to reproduce the error on my end if the issue happens every time you run PRScs on this set of summary statistics.
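One way to tell a recurrent failure from a transient one is to rerun the same command with several seeds and compare exit codes. The sketch below reuses the flags and paths from the command in this thread; the helper names and the per-seed output prefix are hypothetical, and it is only an illustration, not part of PRScs.

```python
import subprocess

# Script path as it appears in the traceback in this thread.
PRSCS = "/project/ritchie/tools/PRScs/PRScs.py"


def build_command(seed, out_dir):
    """Build the argument list for one PRScs run with the given seed."""
    return [
        "python3", PRSCS,
        "--ref_dir=../tools/PRScsx-master/ldblks/ldblk_ukbb_eur",
        "--chrom=22",
        "--bim_prefix=prscsx_bim",
        "--sst_file=tmp.txt",
        "--n_gwas=420035",
        f"--seed={seed}",
        f"--out_dir={out_dir}",
    ]


def run_with_seeds(seeds, runner=subprocess.run):
    """Run once per seed and return {seed: exit code}.

    A non-zero exit code for every seed suggests a recurrent problem;
    a failure for a single seed points to a rare numerical issue.
    """
    results = {}
    for seed in seeds:
        # Hypothetical naming: one output prefix per seed so runs do not collide.
        cmd = build_command(seed, f"prscs/townsend_chr22_seed{seed}")
        results[seed] = runner(cmd).returncode
    return results
```

For example, `run_with_seeds([123456789, 987654321, 42])` would launch three runs in sequence and report which of them exited with an error.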

daniel-hui commented 2 years ago

Thanks for getting back to me. It is a recurrent issue: I am not able to run PRScs using these summary statistics. I tried running without the "--seed=" option and with the seed changed to "987654321", and I still have the same issue.

getian107 commented 2 years ago

Hi Daniel- I don't have any issue running PRScs on chromosome 22 with the summary stats you sent. Did you always encounter the error after a few hundred MCMC iterations, and always with the same error message? Could you try a different Python version to see whether it works?

daniel-hui commented 2 years ago

Hm, I tried redownloading PRScs and it is working now. I was using a copy from October, although I don't see any updates to the GitHub repository in the last 7 months. Someone in my group mentioned making some very small changes to the program, so perhaps that was the issue. I think the issue can be closed. Thanks.
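Since the cause here turned out to be a locally modified copy, a quick way to spot such changes is to compare the local installation against a fresh download. The following is a rough sketch using file hashes; the function names are hypothetical, and the two directories are whatever paths hold the old and new copies.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def differing_files(local_dir, fresh_dir, pattern="*.py"):
    """Return sorted names of files matching `pattern` whose contents
    differ between the two directories, or that exist in only one."""
    local = {p.name: file_digest(p) for p in Path(local_dir).glob(pattern)}
    fresh = {p.name: file_digest(p) for p in Path(fresh_dir).glob(pattern)}
    return sorted(
        name
        for name in local.keys() | fresh.keys()
        if local.get(name) != fresh.get(name)
    )
```

Any file name this returns (for instance mcmc_gtb.py, if that was the file edited) is worth diffing by hand before trusting results from the local copy.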