broadinstitute / tensorqtl

Ultrafast GPU-enabled QTL mapper
BSD 3-Clause "New" or "Revised" License
162 stars 52 forks source link

Error in Computing q-values: ValueError: array must not contain infs or NaNs #166

Open Alice9503 opened 1 week ago

Alice9503 commented 1 week ago

Hello TensorQTL team,

I am encountering an error while computing q-values using TensorQTL version 1.0.9 for cis-QTL. I have seen similar issues reported, and the recommendation was to use the latest version, which I am already using. For this test, I am only using data from chromosome 1. Below are the details of my setup and the error:

Setup:

  * 38937 samples
  * 338 phenotypes
  * 13 covariates
  * 508570 variants
  * applying in-sample 0.01 MAF filter
  * cis-window: ±1,000,000
  * checking phenotypes: 338/338

Command Used:

GENOTYPE_PREFIX="chr1_final_qc"
PHENOTYPE_FILE="samples_expression_1.bed.gz"
COVARIATES_FILE="samples_covariates.txt"
OUTPUT_DIR="Pan_1"

mkdir -p ${OUTPUT_DIR}

PREFIX="Pan_1_cis_QTL"
tensorqtl ${GENOTYPE_PREFIX} ${PHENOTYPE_FILE} ${PREFIX} \
    --covariates ${COVARIATES_FILE} \
    --mode cis \
    --permutations 10000 \
    --window 1000000 \
    --maf_threshold 0.01 \
    -o ${OUTPUT_DIR}

Error Encountered: During the computation of permutations, I receive repeated warnings:

WARNING: scipy.optimize.newton failed to converge (running scipy.optimize.minimize)

And ultimately, it results in the following error when calculating q-values:

Computing q-values
  * Number of phenotypes tested: 338
Traceback (most recent call last):
  File "/home/data/miniconda3/envs/tensorqtl/bin/tensorqtl", line 5, in <module>
    from tensorqtl import __main__
  File "/home/data/miniconda3/envs/tensorqtl/lib/python3.8/site-packages/tensorqtl/__main__.py", line 2, in <module>
    tensorqtl.main()
  File "/home/data/miniconda3/envs/tensorqtl/lib/python3.8/site-packages/tensorqtl/tensorqtl.py", line 172, in main
    calculate_qvalues(res_df, fdr=args.fdr, qvalue_lambda=args.qvalue_lambda, logger=logger)
  File "/home/data/miniconda3/envs/tensorqtl/lib/python3.8/site-packages/tensorqtl/post.py", line 38, in calculate_qvalues
    r = stats.pearsonr(res_df['pval_perm'], res_df['pval_beta'])[0]
  File "/home/data/miniconda3/envs/tensorqtl/lib/python3.8/site-packages/scipy/stats/_stats_py.py", line 4452, in pearsonr
    normym = linalg.norm(ym)
  File "/home/data/miniconda3/envs/tensorqtl/lib/python3.8/site-packages/scipy/linalg/_misc.py", line 146, in norm
    a = np.asarray_chkfinite(a)
  File "/home/data/miniconda3/envs/tensorqtl/lib/python3.8/site-packages/numpy/lib/function_base.py", line 628, in asarray_chkfinite
    raise ValueError(
ValueError: array must not contain infs or NaNs

Additional Information:

I have ensured that my phenotype and covariate data do not contain missing values. Genotype data has been quality controlled for missingness at a 10% threshold.

Package                  Version
------------------------ -----------
backports.zoneinfo       0.2.1
bx-python                0.13.0
cffi                     1.17.1
click                    8.1.7
cloudpickle              3.1.0
contourpy                1.1.1
cycler                   0.12.1
dask                     2023.5.0
Deprecated               1.2.14
exceptiongroup           1.2.2
filelock                 3.16.1
fonttools                4.54.1
fsspec                   2024.10.0
importlib_metadata       8.5.0
importlib_resources      6.4.5
iniconfig                2.0.0
Jinja2                   3.1.4
kiwisolver               1.4.7
locket                   1.0.0
MarkupSafe               2.1.3
matplotlib               3.7.5
mpmath                   1.3.0
networkx                 3.1
numpy                    1.24.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.6.77
nvidia-nvtx-cu12         12.1.105
packaging                24.1
pandas                   2.0.3
pandas-plink             2.2.9
partd                    1.4.1
Pgenlib                  0.91.0
pillow                   10.4.0
pip                      24.2
pluggy                   1.5.0
pyarrow                  17.0.0
pyBigWig                 0.3.22
pycparser                2.21
pyparsing                3.1.4
pytest                   8.3.3
python-dateutil          2.9.0.post0
pytz                     2024.1
PyYAML                   6.0.2
qtl                      0.1.8
rpy2                     3.5.11
scipy                    1.10.1
seaborn                  0.13.2
setuptools               75.1.0
simplegeneric            0.8.1
six                      1.16.0
sympy                    1.13.3
tensorqtl                1.0.9
tomli                    2.0.2
toolz                    1.0.0
torch                    2.4.1
tqdm                     4.66.6
triton                   3.0.0
typing_extensions        4.12.2
tzdata                   2024.2
tzlocal                  5.2
wheel                    0.44.0
wrapt                    1.16.0
xarray                   2023.1.0
zipp                     3.20.2
zstandard                0.23.0

If further information or data subsets are needed for troubleshooting, I can provide them. Any guidance or suggestions to resolve this issue would be greatly appreciated.

Thank you for your help!

francois-a commented 1 week ago

Hi, yes can you please provide a test dataset that reproduces this error?

Alice9503 commented 1 week ago

Hi,

I have sent a test dataset for chromosome 22 via email due to file size restrictions. The error occurs consistently with both chromosome 1 and chromosome 22 data, so I believe it is representative of the issue. Please let me know if you need additional information.

Thank you again for your assistance!

francois-a commented 1 week ago

I didn't receive anything — could you please resend to f...@b....org?

francois-a commented 5 days ago

Thanks for sending the test data. The problem isn't package versions (an environment setup script is available here), but rather an error from the R qvalue package that can occur when too few p-values are available to robustly estimate pi0. The solution is to fix the lambda parameter, using, e.g., --qvalue_lambda 0.85 (this is the threshold we used in GTEx, but you may need to adjust this — if most of the p-values are very small/pi1 is high, 0.5 or lower may work better). You can ignore the "WARNING: scipy.optimize.newton failed to converge" — when these occur an alternative solver is used, and these should disappear in v1.0.10.