broadinstitute / tensorqtl

Ultrafast GPU-enabled QTL mapper
BSD 3-Clause "New" or "Revised" License

KeyError -1 writing permutation output #53

Closed: ddpinto closed this issue 1 year ago

ddpinto commented 2 years ago

Hello, with an sQTL dataset of ~260k phenotypes and ~8M variants, we are running into convergence issues when computing cis permutations. The permutations appear to run, and the log reports the number of QTL phenotypes at FDR 0.05, but no final output is written. The logs contain ~5k "WARNING: excluding # monomorphic variants" messages and about 14k "WARNING: scipy.optimize.newton failed to converge (running scipy.optimize.minimize)" warnings. The job then dies with the following error message. Any ideas how to resolve this issue?

/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/tensorqtl/genotypeio.py:145: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ...     array[indexer]

To avoid creating the large chunks, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    ...     array[indexer]
  self.bed = self.bed[:,ix]
/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/tensorqtl/core.py:234: RuntimeWarning: invalid value encountered in sqrt
  return 2*stats.t.cdf(-np.abs(np.sqrt(tstat2)), dof)
  Time elapsed: 457.84 min
done.
  * writing output
Computing q-values
  * Number of phenotypes tested: 269872
  * Correlation between Beta-approximated and empirical p-values: : 0.9999
  * Proportion of significant phenotypes (1-pi0): 0.25
  * QTL phenotypes @ FDR 0.05: 10306
Traceback (most recent call last):
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: -1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/bin/tensorqtl", line 11, in <module>
    sys.exit(runpy.run_module('tensorqtl', {}, "__main__"))
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/runpy.py", line 210, in run_module
    return _run_code(code, {}, init_globals, run_name, mod_spec)
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/tensorqtl/__main__.py", line 2, in <module>
    tensorqtl.main()
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/tensorqtl/tensorqtl.py", line 114, in main
    calculate_qvalues(res_df, fdr=args.fdr, qvalue_lambda=args.qvalue_lambda, logger=logger)
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/tensorqtl/post.py", line 53, in calculate_qvalues
    lb = lb[-1]
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/pandas/core/series.py", line 942, in __getitem__
    return self._get_value(key)
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/pandas/core/series.py", line 1051, in _get_value
    loc = self.index.get_loc(label)
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: -1
francois-a commented 2 years ago

Hi, which version of tensorQTL are you running? Can you try running the version from the master branch? In principle, this should fix the error, which is unrelated to the warnings.

Since you're getting many warnings related to monomorphic variants, you should filter your VCF before running QTL mapping (depending on sample size, inclusion of variants with MAF < 0.01 will likely result in artifacts), or apply in-sample MAF filtering with the maf_threshold option.
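In-sample MAF filtering amounts to computing each variant's minor allele frequency across the samples actually being tested and dropping variants below a cutoff; monomorphic variants (MAF = 0) are removed by any positive threshold. A minimal pandas sketch, assuming genotypes are stored as alt-allele dosages (0/1/2) with variants as rows and samples as columns. `filter_by_maf` is a hypothetical helper, not part of tensorQTL, and the `maf_threshold` option may compute MAF differently (e.g. in how it treats missing genotypes):

```python
import numpy as np
import pandas as pd


def filter_by_maf(genotype_df: pd.DataFrame, maf_threshold: float = 0.01) -> pd.DataFrame:
    """Hypothetical helper: keep variants whose in-sample MAF is >= maf_threshold.

    Assumes alt-allele dosages in [0, 2], variants as rows, samples as columns.
    Missing genotypes (NaN) are ignored when computing the frequency.
    """
    # Alt allele frequency per variant: mean dosage divided by 2
    af = genotype_df.mean(axis=1, skipna=True) / 2
    # Minor allele frequency folds the frequency around 0.5
    maf = np.minimum(af, 1 - af)
    # Monomorphic variants have maf == 0 and fail any positive threshold
    return genotype_df.loc[maf >= maf_threshold]


# Toy example: one monomorphic, one rare, and one common variant
genotypes = pd.DataFrame(
    [[0, 0, 0, 0],   # monomorphic, MAF 0
     [0, 1, 0, 0],   # MAF 0.125
     [1, 1, 2, 0]],  # AF 0.5, so MAF 0.5
    index=["var_mono", "var_rare", "var_common"],
    columns=["s1", "s2", "s3", "s4"],
)
filtered = filter_by_maf(genotypes, maf_threshold=0.2)
print(filtered.index.tolist())  # ['var_common']
```

Filtering on the samples in the analysis (rather than on an external reference panel) matters because a variant that is common in the full VCF can still be monomorphic in the subset of samples with phenotype data.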

ddpinto commented 2 years ago

Hi, after updating to the latest version on the master branch, I'm now seeing the following error in the permutation analysis:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/hpc/packages/minerva-centos7/anaconda3/2021.5/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/hpc/packages/minerva-centos7/anaconda3/2021.5/lib/python3.8/site-packages/tensorqtl/genotypeio.py", line 40, in run
    for item in self.generator:
  File "/hpc/packages/minerva-centos7/anaconda3/2021.5/lib/python3.8/site-packages/tensorqtl/genotypeio.py", line 451, in generate_data
    assert np.all([self.cis_ranges[g.index[0]][0] == self.cis_ranges[i][0] and self.cis_ranges[g.index[0]][1] == self.cis_ranges[i][1] for i in g.index[1:]])
AssertionError

Any ideas what may be causing this error?
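The failing assertion in genotypeio.py requires every phenotype in a group to have exactly the same cis range (start and end) as the group's first phenotype. One way to find the groups that would trip it is to reproduce that condition before running the mapper. The sketch below does this under simplified assumptions: a single chromosome, cis ranges modelled as the first and last index of sorted variant positions within ±`window` bp of each phenotype's position, and hypothetical helper names (`cis_variant_ranges`, `inconsistent_groups`). tensorQTL's own range computation may differ, so treat this as a diagnostic sketch only:

```python
import numpy as np
import pandas as pd


def cis_variant_ranges(phenotype_pos: pd.Series, variant_pos: np.ndarray, window: int) -> dict:
    """Hypothetical helper: map each phenotype to the (first, last) index of
    variants within +/- window bp. Simplified single-chromosome model;
    variant_pos must be sorted ascending. Phenotypes with no variants are omitted."""
    ranges = {}
    for pid, pos in phenotype_pos.items():
        lo = np.searchsorted(variant_pos, pos - window, side="left")
        hi = np.searchsorted(variant_pos, pos + window, side="right") - 1
        if hi >= lo:
            ranges[pid] = (int(lo), int(hi))
    return ranges


def inconsistent_groups(cis_ranges: dict, group_s: pd.Series) -> list:
    """Hypothetical helper: return groups whose phenotypes do not all share the
    first member's cis range (the condition asserted in genotypeio.py)."""
    bad = []
    for group_id, members in group_s.groupby(group_s, sort=False):
        ids = [pid for pid in members.index if pid in cis_ranges]
        if any(cis_ranges[pid] != cis_ranges[ids[0]] for pid in ids[1:]):
            bad.append(group_id)
    return bad


# Toy data: group_s maps phenotype ID -> group ID
variant_pos = np.array([100, 200, 300, 400, 500])
phenotype_pos = pd.Series({"intron_a": 250, "intron_b": 250, "intron_c": 260, "intron_d": 450})
group_s = pd.Series({"intron_a": "geneX", "intron_b": "geneX",
                     "intron_c": "geneY", "intron_d": "geneY"})
ranges = cis_variant_ranges(phenotype_pos, variant_pos, window=100)
print(inconsistent_groups(ranges, group_s))  # ['geneY']
```

Note that positions need not be identical for ranges to match (intron_c at 260 covers the same variants as intron_a at 250); what the assertion checks is the resulting variant range.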

ddpinto commented 2 years ago

Hello, after some more troubleshooting I solved the KeyError on my end. Instead of indexing the pandas Series directly, I had to convert it to a NumPy array first. The following patch worked for me:

index bc1b062..133f640 100644
--- a/tensorqtl/post.py
+++ b/tensorqtl/post.py
@@ -51,9 +51,9 @@ def calculate_qvalues(res_df, fdr=0.05, qvalue_lambda=None, logger=None):
     ub = res_df.loc[res_df['qval']>fdr, 'pval_beta'].sort_values()

     if lb.shape[0] > 0:  # significant phenotypes
-        lb = lb[-1]
+        lb = lb.to_numpy()[-1]
         if ub.shape[0] > 0:
-            ub = ub[0]
+            ub = ub.to_numpy()[0]
             pthreshold = (lb+ub)/2
         else:
             pthreshold = lb
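The KeyError comes from how pandas interprets `[]` on a Series: the traceback shows `Series.__getitem__` falling through to an `Int64HashTable` lookup, so the index held integer labels. With an integer index, `lb[-1]` is a label lookup rather than a positional one, and there is no label `-1`. Converting to a NumPy array, as in the patch, forces positional indexing; `Series.iloc` does the same without the conversion. The sketch below reproduces the error on a toy Series and wraps the threshold logic from the patched lines in a hypothetical `pval_threshold` helper. The toy p-values and the choice to raise when there are no significant phenotypes are assumptions, not tensorQTL's code:

```python
import pandas as pd


def pval_threshold(lb: pd.Series, ub: pd.Series) -> float:
    """Hypothetical helper mirroring the patched lines.

    lb: sorted p-values of significant phenotypes; ub: sorted p-values of
    non-significant phenotypes. Uses positional .iloc, so the result does
    not depend on the index labels.
    """
    if lb.shape[0] == 0:
        # This sketch's own choice; the original handles this case elsewhere
        raise ValueError("no significant phenotypes")
    lb_max = lb.iloc[-1]      # largest significant p-value
    if ub.shape[0] > 0:
        ub_min = ub.iloc[0]   # smallest non-significant p-value
        return (lb_max + ub_min) / 2
    return lb_max


# Toy Series with an integer index, as in the traceback
pvals = pd.Series([0.03, 0.001, 0.2, 0.04], index=[10, 11, 12, 13])
lb = pvals[pvals <= 0.04].sort_values()
ub = pvals[pvals > 0.04].sort_values()

lookup_failed = False
try:
    lb[-1]
except KeyError:
    lookup_failed = True  # -1 is treated as a label, and no such label exists
print("lb[-1] raised KeyError:", lookup_failed)
print("threshold:", pval_threshold(lb, ub))
```

`.iloc` and `.to_numpy()[-1]` return the same value here; `.iloc` keeps the code in pandas and avoids materializing an array just to read one element.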
francois-a commented 1 year ago

I wasn't able to reproduce this. Please specify versions if this is still an issue.