broadinstitute / tensorqtl

Ultrafast GPU-enabled QTL mapper
BSD 3-Clause "New" or "Revised" License

Warnings and Errors When Running TensorQTL Example in Jupyter Notebook #167

Open Alice9503 opened 1 week ago

Alice9503 commented 1 week ago

Description: While running the official TensorQTL example dataset in Jupyter Notebook, I encountered several warnings and errors that seem to impact the successful execution of the example:

  1. Warning: "Warning: 'rfunc' cannot be imported. R with the 'qvalue' library, and the 'rpy2' Python package are needed."

  2. Frequent Warnings during Permutation Computation:

  return 2*stats.t.cdf(-np.abs(np.sqrt(tstat2)), dof)

    processing phenotype 28/301WARNING: scipy.optimize.newton failed to converge (running scipy.optimize.minimize)
  3. Error During Q-value Calculation:
  * Number of phenotypes tested: 301

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 10
      5 cis_df = cis.map_cis(genotype_df, variant_df, 
      6                      phenotype_df.loc[phenotype_pos_df['chr'] == 'chr18'],
      7                      phenotype_pos_df.loc[phenotype_pos_df['chr'] == 'chr18'],
      8                      covariates_df=covariates_df, seed=123456)
      9 # compute q-values (in practice, this must be run on all genes, not a subset)
---> 10 post.calculate_qvalues(cis_df, fdr=0.05, qvalue_lambda=0.85)

File ~/miniconda3/envs/tensorqtl/lib/python3.8/site-packages/tensorqtl/post.py:38, in calculate_qvalues(res_df, fdr, qvalue_lambda, logger)
     36 if not res_df['pval_beta'].isnull().all():
     37     pval_col = 'pval_beta'
---> 38     r = stats.pearsonr(res_df['pval_perm'], res_df['pval_beta'])[0]
     39     logger.write(f'  * Correlation between Beta-approximated and empirical p-values: {r:.4f}')
     40 else:

File ~/miniconda3/envs/tensorqtl/lib/python3.8/site-packages/scipy/stats/_stats_py.py:4452, in pearsonr(x, y, alternative)
   4448 # Unlike np.linalg.norm or the expression sqrt((xm*xm).sum()),
   4449 # scipy.linalg.norm(xm) does not overflow if xm is, for example,
   4450 # [-5e210, 5e210, 3e200, -3e200]
   4451 normxm = linalg.norm(xm)
-> 4452 normym = linalg.norm(ym)
   4454 threshold = 1e-13
   4455 if normxm < threshold*abs(xmean) or normym < threshold*abs(ymean):
   4456     # If all the values in x (likewise y) are very close to the mean,
   4457     # the loss of precision that occurs in the subtraction xm = x - xmean
   4458     # might result in large errors in r.

File ~/miniconda3/envs/tensorqtl/lib/python3.8/site-packages/scipy/linalg/_misc.py:146, in norm(a, ord, axis, keepdims, check_finite)
    144 # Differs from numpy only in non-finite handling and the use of blas.
    145 if check_finite:
--> 146     a = np.asarray_chkfinite(a)
    147 else:
    148     a = np.asarray(a)

File ~/miniconda3/envs/tensorqtl/lib/python3.8/site-packages/numpy/lib/function_base.py:628, in asarray_chkfinite(a, dtype, order)
    626 a = asarray(a, dtype=dtype, order=order)
    627 if a.dtype.char in typecodes['AllFloat'] and not np.isfinite(a).all():
--> 628     raise ValueError(
    629         "array must not contain infs or NaNs")
    630 return a

ValueError: array must not contain infs or NaNs

Upon further inspection, I found NaN values in the cis_df DataFrame, specifically in the beta_shape1, beta_shape2, and pval_beta columns. The NaN counts per column are:

beta_shape1       25
beta_shape2       25
pval_beta         25
  4. Hypothesis on Cause: The frequent warnings and convergence issues during permutation seem to introduce NaN values, leading to the final ValueError. As I'm using the provided example data, I suspect the issue may be due to a specific software/environment version discrepancy.
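
One way to check this hypothesis is to count the non-finite entries per column and recompute the correlation on the finite rows only. The sketch below is a diagnostic under stated assumptions, not a fix inside tensorqtl: the helper names summarize_nonfinite and finite_rows are hypothetical, the toy DataFrame only stands in for cis_df, and np.corrcoef is swapped in for the stats.pearsonr call shown in the traceback.

```python
import numpy as np
import pandas as pd


def summarize_nonfinite(res_df, cols=("pval_perm", "beta_shape1", "beta_shape2", "pval_beta")):
    """Count NaN/inf entries per column (hypothetical helper; skips absent columns)."""
    present = [c for c in cols if c in res_df.columns]
    values = res_df[present].to_numpy(dtype=float)
    return pd.Series((~np.isfinite(values)).sum(axis=0), index=present)


def finite_rows(res_df, cols=("pval_perm", "pval_beta")):
    """Return only the rows where every listed column is finite (hypothetical helper)."""
    mask = np.isfinite(res_df[list(cols)].to_numpy(dtype=float)).all(axis=1)
    return res_df.loc[mask]


# Toy stand-in for cis_df: made-up values, not real tensorqtl output.
toy_df = pd.DataFrame({
    "pval_perm": [0.01, 0.20, 0.50, 0.90],
    "beta_shape1": [1.0, np.nan, 1.1, 0.9],
    "beta_shape2": [300.0, np.nan, 280.0, 310.0],
    "pval_beta": [0.012, np.nan, 0.49, 0.91],
})

counts = summarize_nonfinite(toy_df)
clean_df = finite_rows(toy_df)

# Pearson correlation restricted to rows without NaN/inf.
r = np.corrcoef(clean_df["pval_perm"], clean_df["pval_beta"])[0, 1]
print(counts)
print(f"correlation on {len(clean_df)} finite rows: {r:.4f}")
```

On the real data, summarize_nonfinite(cis_df) should match the summary above if NaN is the only non-finite value present. Dropping the affected rows only sidesteps the ValueError for inspection purposes; it does not explain why the beta approximation failed for those phenotypes.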

Request: Could you please provide a list of specific versions (Python, TensorQTL, dependencies, etc.) in a fully functional environment? I will then attempt to replicate the environment locally to test the example.
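
To make such a comparison easier, the versions installed in a given environment can be collected with the standard library alone. This is a minimal sketch: the package list is an assumption about which distributions are relevant here, and report_versions is a hypothetical helper name.

```python
import sys
from importlib import metadata


def report_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed list of relevant distributions; adjust to the environment at hand.
packages = ["tensorqtl", "torch", "numpy", "pandas", "scipy", "rpy2"]
versions = report_versions(packages)

print(f"python {sys.version.split()[0]}")
for name, version in versions.items():
    print(f"{name} {version if version is not None else 'not installed'}")
```

Running this in the same Jupyter kernel that executes the notebook reports what that kernel actually imports, which may differ from what a separate shell session reports.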

I've attached the full tensorqtl_examples.ipynb notebook with complete error output for reference.

Thank you for your assistance!

Attachment: tensorqtl_examples_used.zip

Alice9503 commented 1 week ago

I had actually already installed R with the 'qvalue' library and the 'rpy2' Python package, by running mamba install -y rpy2 bioconductor-qvalue before pip install tensorqtl.
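
A quick way to see whether the notebook kernel can actually reach both pieces is sketched below. It assumes rpy2's isinstalled helper from rpy2.robjects.packages; check_r_bridge is a hypothetical name, and the messages are illustrative rather than anything tensorqtl prints.

```python
import importlib.util
import sys


def check_r_bridge():
    """Report whether rpy2 and the R 'qvalue' package are reachable from this interpreter."""
    if importlib.util.find_spec("rpy2") is None:
        return "rpy2 is not importable from this Python environment"
    try:
        # Importing rpy2.robjects starts the embedded R session and can fail
        # if R itself cannot be located.
        from rpy2.robjects.packages import isinstalled
    except Exception as exc:
        return f"rpy2 was found but failed to load: {exc!r}"
    if not isinstalled("qvalue"):
        return "rpy2 loads, but the R package 'qvalue' is not installed"
    return "rpy2 and the R package 'qvalue' are both available"


status = check_r_bridge()
print(sys.executable)  # the interpreter the kernel is really using
print(status)
```

If sys.executable does not point into the environment where mamba installed rpy2 and bioconductor-qvalue, the kernel is running from a different environment than the one that was set up.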

francois-a commented 4 days ago

I can't reproduce these issues with the example notebook. Please try again with v1.0.10.