aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

After running GSEA, I got stuck on getting the regulons. #288

Open rogercasalsfr opened 5 months ago

rogercasalsfr commented 5 months ago

Describe the bug

Hi everyone, I am running SCENIC+ on an HPC machine without internet access. I hit a problem during the GSEA calculations, which I believe more RAM would resolve. Once I address this, I plan to proceed to the next step, building the gene regulatory network (GRN). I have tried two different approaches to construct the GRN and hit a different error with each. I suspect the errors come from either the data or the programs. Has anyone run into similar problems, or can anyone suggest possible solutions? Any help would be greatly appreciated. Thank you.

To Reproduce

I use the same code as in the SCENIC+ tutorials.

Error output

I get this error when I use run_scenicplus:

  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3800, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'GEX_celltype'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "model.py", line 29, in <module>
    run_scenicplus(
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/scenicplus/wrappers/run_scenicplus.py", line 253, in run_scenicplus
    generate_pseudobulks(scplus_obj,
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/scenicplus/cistromes.py", line 215, in generate_pseudobulks
    categories = list(set(cell_data.loc[:, variable]))
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/pandas/core/indexing.py", line 1068, in __getitem__
    return self._getitem_tuple(key)
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/pandas/core/indexing.py", line 1248, in _getitem_tuple
    return self._getitem_lowerdim(tup)
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/pandas/core/indexing.py", line 968, in _getitem_lowerdim
    section = self._getitem_axis(key, axis=i)
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/pandas/core/indexing.py", line 1313, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/pandas/core/indexing.py", line 1261, in _get_label
    return self.obj.xs(label, axis=axis)
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/pandas/core/generic.py", line 4042, in xs
    return self[key]
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/pandas/core/frame.py", line 3805, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    raise KeyError(key) from err
KeyError: 'GEX_celltype'

And I get this error if I try to use from scenicplus.grn_builder.gsea_approach import build_grn:

Traceback (most recent call last):
  File "inferandgrn.py", line 63, in <module>
    pickle.dump(scplus_obj, f)
_pickle.PicklingError: Can't pickle <class 'pandas.core.frame.r2g'>: attribute lookup r2g on pandas.core.frame failed
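
An error of this shape usually means that pickle cannot import the object's class by name. A plausible cause here, though this is an assumption rather than something the traceback confirms, is a namedtuple class created at runtime (for example by pandas `DataFrame.itertuples(name=...)`). The sketch below reproduces that failure mode with the standard library only and includes a dill-based save helper. The names `make_row`, `can_pickle` and `save_with_dill` and the example values are illustrative, and dill is a third-party package that has to be installed separately.

```python
import pickle
from collections import namedtuple

def make_row():
    # The class is named "r2g" but only bound to a local variable, so
    # pickle cannot find it again by looking up "r2g" in its module.
    Row = namedtuple("r2g", ["region", "gene"])
    return Row("chr1:100-200", "GENE_A")  # made-up example values

def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False

def save_with_dill(obj, path):
    """Save obj with dill, which serializes many objects that pickle cannot."""
    import dill  # third-party; imported lazily so the rest runs without it
    with open(path, "wb") as f:
        dill.dump(obj, f)

row = make_row()
print(can_pickle(row))         # False: attribute lookup of "r2g" fails
print(can_pickle(("a", "b")))  # True: plain tuples pickle fine
```

A file written with `save_with_dill` would be read back with `dill.load` on a file opened in binary mode.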

Expected behavior

I expected to get the resulting scenicplus object without any problem.

Version (please complete the following information):

Scenicplus version: 1.0.1.dev4+ge4bdd9f
Python version: 3.8.8

SeppeDeWinter commented 5 months ago

Hi @rogercasalsfr

For the error related to KeyError: 'GEX_celltype', what command were you running? I suspect that you don't have this field (GEX_celltype) in your cell metadata.
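
One way to check this is to compare the requested key against the column names that are actually present. The sketch below uses only the standard library. The helper name `suggest_column` and the example column list are made up for illustration; in practice the list would come from the columns of the cell metadata table.

```python
import difflib

def suggest_column(requested, available, cutoff=0.6):
    """Return [requested] if it is present, else the closest-looking names."""
    if requested in available:
        return [requested]
    # get_close_matches ranks candidates by similarity, best match first.
    return difflib.get_close_matches(requested, available, n=3, cutoff=cutoff)

# Hypothetical column names, standing in for the real cell metadata columns.
columns = ["GEX_cell_type", "GEX_n_counts", "ACC_barcode"]

print(suggest_column("GEX_celltype", columns))
```

If the closest match differs from the requested key only by an underscore or similar, the variable name passed to the pipeline is the likely culprit rather than the data.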

Regarding the other error you are facing, and the high memory requirements you mention: I would suggest using the development branch of the code, which is a lot more efficient. See https://github.com/aertslab/scenicplus/discussions/202 for more information.

I hope this helps?

All the best.

Seppe

rogercasalsfr commented 5 months ago

Yes, I found the cause of both errors. For the first one, the key was actually 'GEX_cell_type'. For the second one, I had to use dill to save the scplus_obj. Now I am facing new errors in run_scenicplus. I upgraded the machine, which now has 125 GB of RAM, so memory shouldn't be a problem.

/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/umap/umap_.py:1943: UserWarning: n_jobs value -1 overridden to 1 by setting random_state. Use no seed for parallelism.
  warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.")
Traceback (most recent call last):
  File "model.py", line 28, in <module>
    run_scenicplus(
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/scenicplus/wrappers/run_scenicplus.py", line 290, in run_scenicplus
    run_eRegulons_umap(scplus_obj,
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/scenicplus/dimensionality_reduction.py", line 294, in run_eRegulons_umap
    embedding = reducer.fit_transform(data_mat)
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/umap/umap_.py", line 2887, in fit_transform
    self.fit(X, y, force_all_finite)
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/umap/umap_.py", line 2608, in fit
    ) = nearest_neighbors(
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/umap/umap_.py", line 329, in nearest_neighbors
    knn_search_index = NNDescent(
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/pynndescent/pynndescent_.py", line 931, in __init__
    self._neighbor_graph = nn_descent(
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
    error_rewrite(e, 'typing')
  File "/apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
    raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name 'print': Cannot determine Numba type of <class 'function'>

File "../../../../apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/pynndescent/pynndescent_.py", line 252:
def nn_descent_internal_low_memory_parallel(
    <source elided>
        if verbose:
            print("\t", n + 1, " / ", n_iters)
            ^

During: resolving callee type: type(CPUDispatcher(<function nn_descent_internal_low_memory_parallel at 0x2ad3cf588790>))
During: typing of call at /apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/pynndescent/pynndescent_.py (358)

During: resolving callee type: type(CPUDispatcher(<function nn_descent_internal_low_memory_parallel at 0x2ad3cf588790>))
During: typing of call at /apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/pynndescent/pynndescent_.py (358)

File "../../../../apps/PYTHON/environments/scenicplus_env/lib/python3.8/site-packages/pynndescent/pynndescent_.py", line 358:
def nn_descent(
    <source elided>
    if low_memory:
        nn_descent_internal_low_memory_parallel(
        ^

Separately, I am building the GRN following the human example. When I run this code:

from scenicplus.utils import format_egrns
format_egrns(scplus_obj, eregulons_key = 'eRegulons_importance', TF2G_key = 'TF2G_adj', key_added = 'eRegulon_metadata')
scplus_obj.uns['eRegulon_metadata']
    Region_signature_name   Gene_signature_name     TF  is_extended     Region  Gene    R2G_importance  R2G_rho     R2G_importance_x_rho    R2G_importance_x_abs_rho    TF2G_importance     TF2G_regulation     TF2G_rho    TF2G_importance_x_abs_rho   TF2G_importance_x_rho
0   AHR_+_+_(29r)   AHR_+_+_(27g)   AHR     False   chr13:42274796-42275705     DGKH    0.064142    0.173645    0.011138    0.011138    0.352030    1   0.154350    0.054336    0.054336
1   AHR_+_+_(29r)   AHR_+_+_(27g)   AHR     False   chr17:5031256-5032159   CAMTA2  0.027636    0.059174    0.001635    0.001635    1.160343    1   0.079704    0.092485    0.092485
2   AHR_+_+_(29r)   AHR_+_+_(27g)   AHR     False   chr1:155194395-155195304    ASH1L   0.059755    0.071645    0.004281    0.004281    0.323727    1   0.064882    0.021004    0.021004
3   AHR_+_+_(29r)   AHR_+_+_(27g)   AHR     False   chr13:100088665-100089516   PCCA    0.034391    0.096134    0.003306    0.003306    0.814791    1   0.096744    0.078826    0.078826
4   AHR_+_+_(29r)   AHR_+_+_(27g)   AHR     False   chr9:128218431-128219346    PTGES2  0.027092    0.088072    0.002386    0.002386    0.611197    1   0.069147    0.042263    0.042263
...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
58  ZEB1_extended_-_-_(61r)     ZEB1_extended_-_-_(47g)     ZEB1    True    chr10:69222259-69223078     SRGN    0.026719    -0.072535   -0.001938   0.001938    0.404137    -1  -0.126949   -0.051305   0.051305
59  ZEB1_extended_-_-_(61r)     ZEB1_extended_-_-_(47g)     ZEB1    True    chr10:69232481-69233394     SRGN    0.009769    -0.237486   -0.002320   0.002320    0.404137    -1  -0.126949   -0.051305   0.051305
60  ZEB1_extended_-_-_(61r)     ZEB1_extended_-_-_(47g)     ZEB1    True    chr20:4811514-4812416   SLC23A2     0.005776    -0.327804   -0.001894   0.001894    0.151993    -1  -0.154505   -0.023484   0.023484
61  ZEB1_extended_-_-_(61r)     ZEB1_extended_-_-_(47g)     ZEB1    True    chr19:49500592-49501473     FCGRT   0.075699    -0.455435   -0.034476   0.034476    0.155985    -1  -0.236110   -0.036830   0.036830
62  ZEB1_extended_-_-_(61r)     ZEB1_extended_-_-_(47g)     ZEB1    True    chr10:128095322-128096261   PTPRE   0.004579    -0.243811   -0.001116   0.001116

I get the expected output.

But when I call the following function, I get an error:

#Format eRegulons
from scenicplus.eregulon_enrichment import *
get_eRegulons_as_signatures(scplus_obj, eRegulon_metadata_key ='eRegulon_metadata', key_added='eRegulon_signatures')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[39], line 3
      1 #Format eRegulons
      2 from scenicplus.eregulon_enrichment import *
----> 3 get_eRegulons_as_signatures(scplus_obj, eRegulon_metadata_key ='eRegulon_metadata', key_added='eRegulon_signatures')

TypeError: get_eRegulons_as_signatures() got an unexpected keyword argument 'eRegulon_metadata_key'

But I've already added the key...

What is happening?

Thank you so much.

Roger

rogercasalsfr commented 3 months ago

Hi, I'm still stuck. Has anybody found a solution to the problems I posted? Thank you!

SeppeDeWinter commented 3 months ago

Hi @rogercasalsfr

For your first error, please see: https://github.com/aertslab/scenicplus/issues/203. I have still not been able to replicate this issue (it has something to do with numba and UMAP incompatibilities). Again, this issue is resolved in the development branch of the code, so I would highly recommend using that code. It will also become the default code very soon!
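
Since the failure involves numba, UMAP and pynndescent, listing the installed versions of those packages can help when comparing environments. This is a minimal sketch using the standard library. The helper name `installed_versions` is illustrative, and the distribution names are the ones these packages are published under on PyPI.

```python
from importlib import metadata

def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

report = installed_versions(["scenicplus", "numba", "umap-learn", "pynndescent"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```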

There are now also tutorials on how to use this branch: https://scenicplus.readthedocs.io/en/development/.

Related to your second question, what version of SCENIC+ are you using to run this code?


import scenicplus
scenicplus.__version__
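
A TypeError about an unexpected keyword argument usually means the installed function has a different signature from the one in the tutorial being followed. Alongside the version, the signature can be inspected directly with the standard library. In the sketch below, `example_function` is a made-up stand-in with made-up parameter names; in a real session, the function imported from scenicplus would be passed instead.

```python
import inspect

def accepted_parameters(func):
    """Return the parameter names that func accepts, in order."""
    return list(inspect.signature(func).parameters)

# Made-up stand-in; its parameters are NOT those of any scenicplus function.
def example_function(obj, key_added="signatures"):
    return key_added

params = accepted_parameters(example_function)
print(params)
print("eRegulon_metadata_key" in params)
```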

All the best,

Seppe