aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

None of [Index([dtype='object', length=)] are in the [columns] Scenicplus snakemake cistopic object #355

Closed Baudicm closed 7 months ago

Baudicm commented 7 months ago

Hi, thank you for this very useful tool. I would like to rerun my data with the new optimized version using Snakemake, but I get an error related to the index of my cistopic_obj that I don't really understand (log below).

Do you know why it doesn't find the indexes in the cistopic_obj, even though it finds cells in common between the AnnData and the cisTopic object?

Thank you very much for your help,

Manon

Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job stats:
job                         count
AUCell_direct                   1
AUCell_extended                 1
all                             1
eGRN_direct                     1
eGRN_extended                   1
get_search_space                1
prepare_GEX_ACC_multiome        1
prepare_menr                    1
region_to_gene                  1
scplus_mudata                   1
tf_to_gene                      1
total                          11

Select jobs to execute...
Execute 1 jobs...

[Fri Apr 12 10:39:08 2024]
localrule prepare_GEX_ACC_multiome:
    input: /data/GSRunit/Manon/Multiome/scenicplus/cistopic_obj35model.pkl, /data/GSRunit/Manon/Multiome/scenicplus/adata_all_conditions.h5ad
    output: /data/GSRunit/Manon/Multiome/scenicplus/outs/ACC_GEX.h5mu
    jobid: 2
    reason: Missing output files: /data/GSRunit/Manon/Multiome/scenicplus/outs/ACC_GEX.h5mu
    resources: tmpdir=/tmp

2024-04-12 10:39:54,195 SCENIC+ INFO Reading cisTopic object.
2024-04-12 10:39:57,654 rpy2.situation INFO cffi mode is CFFI_MODE.ANY
2024-04-12 10:39:57,727 rpy2.situation INFO R home found: /usr/local/apps/R/4.3/4.3.2/lib64/R
2024-04-12 10:39:57,907 rpy2.situation INFO R library path: /usr/local/apps/R/4.3/4.3.2/lib64/R/lib:/usr/local/java/jdk-18.0.1.1/lib/server:/usr/local/libtiff/4.3.0-gcc-8.5.0/lib:/usr/local/intel/2022.1.2.146/mkl/2022.0.2/lib/intel64:/usr/local/pcre2/10.40/gcc-11.3.0/lib:/usr/local/apps/PMIx/pmix-3.2.3/lib:/usr/local/OpenMPI/4.1.3/ucx-1.10.1/gcc-11.3.0/lib/openmpi:/usr/local/OpenMPI/4.1.3/ucx-1.10.1/gcc-11.3.0/lib:/usr/local/GCC/11.3.0/lib/gcc/x86_64-redhat-linux/11.3.0/plugin:/usr/local/GCC/11.3.0/lib/gcc/x86_64-redhat-linux/11.3.0:/usr/local/GCC/11.3.0/lib64:/usr/local/GCC/11.3.0/lib:/usr/local/netcdf/4.9.0/gcc-11.3.0/lib:/usr/local/HDF5/1.12.2/lib
2024-04-12 10:39:57,907 rpy2.situation INFO LD_LIBRARY_PATH: /usr/local/apps/R/4.3/4.3.2/lib64/R/lib:/usr/local/java/jdk-18.0.1.1/lib/server:/usr/local/libtiff/4.3.0-gcc-8.5.0/lib:/usr/local/intel/2022.1.2.146/mkl/2022.0.2/lib/intel64:/usr/local/pcre2/10.40/gcc-11.3.0/lib:/usr/local/apps/PMIx/pmix-3.2.3/lib:/usr/local/OpenMPI/4.1.3/ucx-1.10.1/gcc-11.3.0/lib/openmpi:/usr/local/OpenMPI/4.1.3/ucx-1.10.1/gcc-11.3.0/lib:/usr/local/GCC/11.3.0/lib/gcc/x86_64-redhat-linux/11.3.0/plugin:/usr/local/GCC/11.3.0/lib/gcc/x86_64-redhat-linux/11.3.0:/usr/local/GCC/11.3.0/lib64:/usr/local/GCC/11.3.0/lib:/usr/local/netcdf/4.9.0/gcc-11.3.0/lib:/usr/local/HDF5/1.12.2/lib
2024-04-12 10:39:57,957 rpy2.rinterface_lib.embedded INFO Default options to initialize R: rpy2, --quiet, --no-save
2024-04-12 10:39:58,373 rpy2.rinterface_lib.embedded INFO R is already initialized. No need to initialize.
2024-04-12 10:40:05,535 SCENIC+ INFO Reading gene expression AnnData.
2024-04-12 10:40:08,555 Ingesting multiome data INFO Found 23958 multiome cells.
Traceback (most recent call last):
  File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/bin/scenicplus", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 1137, in main
    args.func(args)
  File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 44, in command_prepare_GEX_ACC
    prepare_GEX_ACC(
  File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 61, in prepare_GEX_ACC
    mdata = process_multiome_data(
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/scenicplus/data_wrangling/adata_cistopic_wrangling.py", line 44, in process_multiome_data
    imputed_acc_obj = impute_accessibility(
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/pycisTopic/diff_features.py", line 374, in impute_accessibility
    cell_topic = model.cell_topic.loc[:, cell_names]
  File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/pandas/core/indexing.py", line 1068, in __getitem__
    return self._getitem_tuple(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/pandas/core/indexing.py", line 1257, in _getitem_tuple
    return self._getitem_tuple_same_dim(tup)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/pandas/core/indexing.py", line 925, in _getitem_tuple_same_dim
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/pandas/core/indexing.py", line 1302, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/pandas/core/indexing.py", line 1240, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/pandas/core/indexing.py", line 1433, in _get_listlike_indexer
    keyarr, indexer = ax._get_indexer_strict(key, axis_name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 6108, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/gpfs/gsfs12/users/baudicm2/mambaforge/envs/scenicplusenv/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 6168, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
**KeyError: "None of [Index(['TACCGCAAGTGAGGGT-1_WT_AF', 'CTAAATGTCCCGTTGT-1_WT_AF',\n       'CCGTTTGGTCATCCTG-1_WT_AF', 'GAGTATCTCCTGGTGA-1_WT_AF',\n       'GTATTGATCCCGCCTA-1_WT_AF', 'GTCCTCCCAGGGAGGA-1_WT_AF',\n       'TCTCGCCCAAATACCT-1_WT_AF', 'GAGCTAGCAGGCTTGT-1_WT_AF',\n       'AATAGCTGTGCACGCA-1_WT_AF', 'TGAGCTTAGTAAGTGG-1_WT_AF',\n       ...\n       'ACACAATGTACTTAGG-1_Mut_Midbrain', 'GGATGAATCCGCATGA-1_Mut_Midbrain',\n       'CGCAAATTCCTAGTAA-1_Mut_Midbrain', 'CCTCAGTTCCACCTGT-1_Mut_Midbrain',\n       'GTCCGTAAGGTTACAC-1_Mut_Midbrain', 'ACTTAGGGTTGGTTCT-1_Mut_Midbrain',\n       'CCATAAATCTGGCAAT-1_Mut_Midbrain', 'CTATGAGGTAGCTGGT-1_Mut_Midbrain',\n       'GGCAAGCCATTAGGCC-1_Mut_Midbrain', 'TCCATCATCATGGCCA-1_Mut_Midbrain'],\n      dtype='object', length=27154)] are in the [columns]"**
2024-04-12 10:40:08,805 rpy2.rinterface_lib.embedded INFO     Embedded R ended.
2024-04-12 10:40:08,806 rpy2.rinterface_lib.embedded INFO     Embedded R already ended.
[Fri Apr 12 10:40:09 2024]
Error in rule prepare_GEX_ACC_multiome:
    jobid: 2
    input: /data/GSRunit/Manon/Multiome/scenicplus/cistopic_obj35model.pkl, /data/GSRunit/Manon/Multiome/scenicplus/adata_all_conditions.h5ad
    output: /data/GSRunit/Manon/Multiome/scenicplus/outs/ACC_GEX.h5mu
    shell:

            scenicplus prepare_data prepare_GEX_ACC                 --cisTopic_obj_fname /data/GSRunit/Manon/Multiome/scenicplus/cistopic_obj35model.pkl                 --GEX_anndata_fname /data/GSRunit/Manon/Multiome/scenicplus/adata_all_conditions.h5ad                 --out_file /data/GSRunit/Manon/Multiome/scenicplus/outs/ACC_GEX.h5mu                 --bc_transform_func "lambda x: f'{x}'"

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-04-12T103900.433973.snakemake.log
WorkflowError:
At least one job did not complete successfully.
ktroule commented 7 months ago

Hi.

I think the issue is due to the different naming of the barcodes of GEX and ATAC.

Baudicm commented 7 months ago

Hi, thank you for your reply. Yes, but I don't understand, because SCENIC+ found cells in common: the log says "2024-04-12 10:40:08,555 Ingesting multiome data INFO Found 23958 multiome cells". And when I run len(set(cistopic_obj.cell_names) & set(adata.obs_names)), it also gives me 23958.

I don't really understand what "[columns]" means in the Snakemake error.

Thank you again for your help

SeppeDeWinter commented 7 months ago

Hi @Baudicm

Seems like the error is thrown while imputing accessibility.

Could you show the output of the following?

cistopic_obj.selected_model.cell_topic
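
For anyone hitting the same error, the comparison behind this request can be sketched with plain pandas. The traceback shows `model.cell_topic.loc[:, cell_names]` failing, so the "[columns]" in the message refers to the column labels of the `cell_topic` table, not to the AnnData. The snippet below uses small toy stand-ins (hypothetical values, not the real objects) for `cistopic_obj.cell_names` and `cistopic_obj.selected_model.cell_topic`; the helper name `count_shared` is made up for illustration.

```python
import pandas as pd

# Toy stand-ins (hypothetical values) for cistopic_obj.cell_names and
# cistopic_obj.selected_model.cell_topic.
cell_names = ["AAAC-1_WT_AF", "CCCG-1_Mut_Midbrain"]
cell_topic = pd.DataFrame(
    [[0.1, 0.2], [0.9, 0.8]],
    index=["Topic1", "Topic2"],
    columns=["AAAC-1-WT_AF___WT_AF", "CCCG-1-Mut_Midbrain___Mut_Midbrain"],
)


def count_shared(names, table):
    """Count how many of `names` appear among the column labels of `table`."""
    return len(set(names) & set(table.columns))


n_shared = count_shared(cell_names, cell_topic)

# Selecting columns by labels that are all absent raises the same kind of
# KeyError as in the traceback ("None of [...] are in the [columns]").
try:
    cell_topic.loc[:, cell_names]
    raised = False
    message = ""
except KeyError as err:
    raised = True
    message = str(err)

print(n_shared, raised)
```

On the real object, the same check would presumably be `count_shared(cistopic_obj.cell_names, cistopic_obj.selected_model.cell_topic)`; a count of zero means the model's columns use a different barcode format than `.cell_names`, even if `.cell_names` itself overlaps with `adata.obs_names`.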

Best,

Seppe

Baudicm commented 7 months ago
TACCGCAAGTGAGGGT-1-WT_AF___WT_AF    CTAAATGTCCCGTTGT-1-WT_AF___WT_AF    CCGTTTGGTCATCCTG-1-WT_AF___WT_AF    GAGTATCTCCTGGTGA-1-WT_AF___WT_AF    GTATTGATCCCGCCTA-1-WT_AF___WT_AF    GTCCTCCCAGGGAGGA-1-WT_AF___WT_AF    TCTCGCCCAAATACCT-1-WT_AF___WT_AF    GAGCTAGCAGGCTTGT-1-WT_AF___WT_AF    AATAGCTGTGCACGCA-1-WT_AF___WT_AF    TGAGCTTAGTAAGTGG-1-WT_AF___WT_AF    ... ACACAATGTACTTAGG-1-Mut_Midbrain___Mut_Midbrain  GGATGAATCCGCATGA-1-Mut_Midbrain___Mut_Midbrain  CGCAAATTCCTAGTAA-1-Mut_Midbrain___Mut_Midbrain  CCTCAGTTCCACCTGT-1-Mut_Midbrain___Mut_Midbrain  GTCCGTAAGGTTACAC-1-Mut_Midbrain___Mut_Midbrain  ACTTAGGGTTGGTTCT-1-Mut_Midbrain___Mut_Midbrain  CCATAAATCTGGCAAT-1-Mut_Midbrain___Mut_Midbrain  CTATGAGGTAGCTGGT-1-Mut_Midbrain___Mut_Midbrain  GGCAAGCCATTAGGCC-1-Mut_Midbrain___Mut_Midbrain  TCCATCATCATGGCCA-1-Mut_Midbrain___Mut_Midbrain
Topic1  0.003502    0.022933    0.005403    0.009163    0.013929    0.011268    0.011032    0.025990    0.003867    0.010575    ... 0.010375    0.010649    0.009345    0.007629    0.001756    0.003436    0.014850    0.006405    0.017696    0.040800
Topic2  0.019162    0.001339    0.026155    0.002188    0.011959    0.010905    0.042464    0.009409    0.020540    0.014839    ... 0.003031    0.007922    0.008207    0.041007    0.004828    0.028875    0.019685    0.011724    0.010599    0.003583
Topic3  0.008268    0.004039    0.000606    0.000672    0.003095    0.003196    0.009868    0.008211    0.005353    0.002845    ... 0.003031    0.007013    0.033805    0.004291    0.012508    0.060790    0.024519    0.005645    0.002212    0.020573
Topic4  0.018027    0.007164    0.005849    0.002643    0.008020    0.047466    0.140640    0.029275    0.004693    0.020478    ... 0.013231    0.018831    0.008776    0.013637    0.006876    0.006211    0.007598    0.017803    0.004147    0.048081
Topic5  0.004863    0.083311    0.095328    0.109239    0.098302    0.058527    0.004823    0.076284    0.058836    0.099646    ... 0.005071    0.016558    0.006501    0.016307    0.001756    0.017775    0.010015    0.002605    0.002212    0.004392
Topic6  0.056838    0.021087    0.036643    0.001885    0.032642    0.041629    0.016852    0.057462    0.071051    0.063583    ... 0.025879    0.021558    0.013327    0.024986    0.016604    0.003436    0.077702    0.015523    0.017051    0.034327
Topic7  0.009403    0.014836    0.036420    0.016896 ... 

Yes, correct: the first row (the column names) doesn't show the same names as cistopic_obj.cell_names. Thank you very much

SeppeDeWinter commented 7 months ago

Ok, then something went wrong with your cistopic object. Not sure how this could have occurred, but simply renaming the columns so they are in the same format as the .cell_names should fix your issue.
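
A minimal sketch of such a renaming, under one loud assumption: the columns of `cell_topic` are in the same order as `.cell_names` (the first and last barcodes shown in this thread suggest so, but it should be verified). The helper `rename_columns_to_cell_names` is hypothetical, not part of pycisTopic or SCENIC+, and the two-cell table is toy data built from barcodes quoted above.

```python
import pandas as pd


def rename_columns_to_cell_names(cell_topic, cell_names):
    """Return a copy of cell_topic whose columns are cell_names.

    Assumes both are in the same order; as a sanity check, the leading
    barcode (text before the first '-') must match position by position.
    """
    if len(cell_names) != cell_topic.shape[1]:
        raise ValueError("number of cell names and columns differ")
    for old, new in zip(cell_topic.columns, cell_names):
        if old.split("-")[0] != new.split("-")[0]:
            raise ValueError(f"barcode mismatch: {old!r} vs {new!r}")
    renamed = cell_topic.copy()
    renamed.columns = list(cell_names)
    return renamed


# Toy data in the two formats seen in this thread.
cell_names = ["TACCGCAAGTGAGGGT-1_WT_AF", "TCCATCATCATGGCCA-1_Mut_Midbrain"]
cell_topic = pd.DataFrame(
    [[0.003502, 0.040800], [0.019162, 0.003583]],
    index=["Topic1", "Topic2"],
    columns=[
        "TACCGCAAGTGAGGGT-1-WT_AF___WT_AF",
        "TCCATCATCATGGCCA-1-Mut_Midbrain___Mut_Midbrain",
    ],
)

fixed = rename_columns_to_cell_names(cell_topic, cell_names)
# The lookup that failed in impute_accessibility now succeeds on the toy table.
subset = fixed.loc[:, cell_names]
```

With the real object, the result would presumably be assigned back to `cistopic_obj.selected_model.cell_topic` and the object saved again before rerunning the Snakemake pipeline.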

All the best,

Seppe

Baudicm commented 7 months ago

Yes, it fixed the issue. Thank you very much. Best, Manon

SeppeDeWinter commented 7 months ago

No worries!

Good luck with the analysis!

Best,

Seppe