aertslab / create_cisTarget_databases

Create cisTarget databases

Problem running pySCENIC on custom made motifs #11

Closed terooatt closed 1 year ago

terooatt commented 2 years ago

Hi,

After creating a new motif list and the corresponding feather files, I ran the script below. It ran fine until the "pyscenic ctx" step. From the error, I believe there is a feature incompatibility between the .tbl file and the reg.csv and loom files, but after modifying the features in the .tbl file to match, it still does not work, so now I am not sure. Can you point out where the error is?

Thanks

script.sh

import os
import numpy as np
import pandas as pd
import loompy as lp
import json
import base64
import zlib
from pyscenic.plotting import plot_binarization
from pyscenic.export import add_scenic_metadata
from pyscenic.cli.utils import load_signatures

wdir = "/home/XXX/pySENIC_projects/Aging_all_data_tCRE/"
os.chdir( wdir )
f_loom_path_unfilt = "/home/XXX/pySENIC_projects/Aging_all_data_tCRE" # test dataset, n=500 cells
f_loom_path_scenic = "/home/XXX/pySENIC_projects/Aging_all_data_tCRE/integrated_data.loom"
f_pyscenic_output = "/home/XXX/pySENIC_projects/Aging_all_data_tCRE/adata_pyscenic_output.loom"
f_final_loom = '/home/XXX/pySENIC_projects/Aging_all_data_tCRE/adata_integrated-output.loom'

import glob
f_db_glob = "/home/XXX/pySENIC_projects/1_pysencic_data/feather_files/mouse_tCRE/*.feather"
f_db_names = ' '.join( glob.glob(f_db_glob) )

f_motif_path = "/home/XXX/pySENIC_projects/1_pysencic_data/motif/mouse_tCRE/*.tbl"

%run /home/XXX/pySENIC_projects/1_pysencic_data/scripts/arboreto_with_multiprocessing.py \
    /home/XXX/pySENIC_projects/Aging_all_data_tCRE/integrated_data.loom \
    /home/XXX/pySENIC_projects/1_pysencic_data/transcription_factor_list/tCRE_mm_tfs.txt \
    --method grnboost2 \
    --output /home/XXX/pySENIC_projects/Aging_all_data_tCRE/adata_filtered_scenic.tsv \
    --num_workers 4 \
    --seed 777

!/home/XXX/miniconda3/envs/pyscenic/bin/pyscenic ctx adata_filtered_scenic.tsv \
    {f_db_names} \
    --annotations_fname {f_motif_path} \
    --expression_mtx_fname {f_loom_path_scenic} \
    --output reg.csv \
    --mask_dropouts \
    --num_workers 20

!/home/XXX/miniconda3/envs/pyscenic/bin/pyscenic aucell \
    integrated_data.loom \
    reg.csv \
    --output pyscenic_output.loom \
    --num_workers 20

Running the script

XXX@dgt-gpu2:~/pySENIC_projects/Aging_all_data_tCRE$ ./script.sh
Python 3.7.3 (default, Mar 27 2019, 22:11:17)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.7.0 -- An enhanced Interactive Python. Type '?' for help.
/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/dask/config.py:161: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  data = yaml.load(f.read()) or {}
Loaded expression matrix of 1000 cells and 7425 genes in 0.12579822540283203 seconds...
Loaded 660 TFs...
/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/arboreto/algo.py:214: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  expression_matrix = expression_data.as_matrix()
starting grnboost2 using 4 processes...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7425/7425 [01:38<00:00, 75.69it/s]
Done in 99.694340467453 seconds.
/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/dask/config.py:161: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  data = yaml.load(f.read()) or {}

2021-10-20 11:31:17,971 - pyscenic.cli.pyscenic - INFO - Creating modules.

2021-10-20 11:31:18,148 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.

2021-10-20 11:31:18,358 - pyscenic.utils - INFO - Calculating Pearson correlations.

2021-10-20 11:31:18,359 - pyscenic.utils - WARNING - Note on correlation calculation: the default behaviour for calculating the correlations has changed after pySCENIC verion 0.9.16. Previously, the default was to calculate the correlation between a TF and target gene using only cells with non-zero expression values (mask_dropouts=True). The current default is now to use all cells to match the behavior of the R verision of SCENIC. The original settings can be retained by setting 'rho_mask_dropouts=True' in the modules_from_adjacencies function, or '--mask_dropouts' from the CLI.
Dropout masking is currently set to [True].
/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py:138: RuntimeWarning: invalid value encountered in greater
  regulations = (rhos > rho_threshold).astype(int) - (rhos < -rho_threshold).astype(int)
/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pyscenic/utils.py:138: RuntimeWarning: invalid value encountered in less
  regulations = (rhos > rho_threshold).astype(int) - (rhos < -rho_threshold).astype(int)

2021-10-20 11:31:19,071 - pyscenic.utils - INFO - Creating modules.

2021-10-20 11:31:52,428 - pyscenic.cli.pyscenic - INFO - Loading databases.

2021-10-20 11:31:52,428 - pyscenic.cli.pyscenic - INFO - Calculating regulons.
[ ] | 0% Completed | 8.5s
Traceback (most recent call last):
  File "/home/XXX/miniconda3/envs/pyscenic/bin/pyscenic", line 10, in <module>
    sys.exit(main())
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 420, in main
    args.func(args)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 165, in prune_targets_command
    num_workers=args.num_workers)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pyscenic/prune.py", line 351, in prune2df
    num_workers, module_chunksize)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pyscenic/prune.py", line 300, in _distributed_calc
    return create_graph().compute(scheduler='processes', num_workers=num_workers if num_workers else cpu_count())
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/dask/base.py", line 397, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/dask/multiprocessing.py", line 192, in get
    raise_exception=reraise, **kwargs)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/dask/local.py", line 501, in get_async
    raise_exception(exc, tb)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/dask/compatibility.py", line 111, in reraise
    raise exc.with_traceback(tb)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/dask/local.py", line 272, in execute_task
    result = _execute_task(task, data)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/dask/local.py", line 252, in _execute_task
    args2 = [_execute_task(a, cache) for a in args]
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/dask/local.py", line 252, in <listcomp>
    args2 = [_execute_task(a, cache) for a in args]
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/dask/local.py", line 253, in _execute_task
    return func(*args2)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in modules2df
    for module in modules])
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in <listcomp>
    for module in modules])
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pyscenic/transform.py", line 185, in module2df
    weighted_recovery=weighted_recovery)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pyscenic/transform.py", line 123, in module2features_auc1st_impl
    df = db.load(module)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pyscenic/rnkdb.py", line 259, in load
    return FeatherReader(self._fname).read_pandas(columns=(INDEX_NAME,) + gs.genes).set_index(INDEX_NAME)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/frame.py", line 3909, in set_index
    level = frame[col]._values
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'features'
/home/XXX/miniconda3/envs/pyscenic/lib/python3.7/site-packages/dask/config.py:161: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  data = yaml.load(f.read()) or {}

2021-10-20 11:32:06,965 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.

2021-10-20 11:32:07,087 - pyscenic.cli.pyscenic - INFO - Loading gene signatures.

2021-10-20 11:32:07,090 - pyscenic.cli.pyscenic - ERROR - No columns to parse from file

ghuls commented 2 years ago

What is the output of this command?

ls -l /home/XXX/pySENIC_projects/1_pysencic_data/feather_files/mouse_tCRE/*.feather

The list passed to pyscenic ctx should contain only genes_vs_motifs.rankings.feather files.

Your glob pattern probably should be this.

f_db_glob = "/home/XXX/pySENIC_projects/1_pysencic_data/feather_files/mouse_tCRE/*.genes_vs_motifs.rankings.feather"
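To make the effect of the narrower pattern visible, here is a minimal sketch that filters a directory listing down to the ranking databases and reports what was left out. The helper names (select_ranking_dbs, build_db_argument) are hypothetical and not part of pySCENIC; the only thing taken from the thread is the genes_vs_motifs.rankings.feather suffix.

```python
import glob
import os

# Suffix of the ranking databases that should be passed to "pyscenic ctx".
RANKINGS_SUFFIX = ".genes_vs_motifs.rankings.feather"


def select_ranking_dbs(paths):
    """Split paths into (ranking databases, everything else), both sorted."""
    kept = sorted(p for p in paths if p.endswith(RANKINGS_SUFFIX))
    skipped = sorted(p for p in paths if not p.endswith(RANKINGS_SUFFIX))
    return kept, skipped


def build_db_argument(db_dir):
    """Return a space-separated database list plus the skipped files.

    Hypothetical helper: it mirrors the ' '.join(glob.glob(...)) idiom
    from the script, but fails early if no ranking database is found.
    """
    all_feather = glob.glob(os.path.join(db_dir, "*.feather"))
    kept, skipped = select_ranking_dbs(all_feather)
    if not kept:
        raise FileNotFoundError(
            "no *" + RANKINGS_SUFFIX + " files found in " + db_dir
        )
    return " ".join(kept), skipped
```

In the notebook this could be used as f_db_names, skipped = build_db_argument(db_dir), printing skipped to see which feather files the broad *.feather pattern would otherwise have handed to pyscenic ctx.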

If it still gives the same error, you might not have a recent version of pySCENIC: the new databases use motifs or tracks instead of features as the name of the index column, which is why an older version fails with KeyError: 'features'. Recent versions of pySCENIC handle this properly.
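One way to check which case applies is to look at the column names stored in the feather file. The sketch below assumes pyarrow is installed in the environment and that exactly one of the three names mentioned above appears among the columns; the function names are hypothetical, not pySCENIC API.

```python
# Index column names mentioned in this thread: older databases use
# "features", newer ones use "motifs" or "tracks".
KNOWN_INDEX_COLUMNS = ("features", "motifs", "tracks")


def find_index_column(column_names):
    """Return the single known index column name, or None if ambiguous."""
    found = [c for c in column_names if c in KNOWN_INDEX_COLUMNS]
    return found[0] if len(found) == 1 else None


def feather_column_names(path):
    """Read the column names of a feather file (assumes pyarrow is available)."""
    import pyarrow.feather as feather  # imported lazily; optional dependency

    return feather.read_table(path).schema.names


def describe_database(path):
    """Summarise which index column a database uses."""
    index_column = find_index_column(feather_column_names(path))
    if index_column is None:
        return path + ": no unambiguous index column found"
    if index_column == "features":
        return path + ": index column 'features' (older layout)"
    return path + ": index column '" + index_column + "' (needs a recent pySCENIC)"
```

If describe_database reports motifs or tracks while pyscenic ctx still raises KeyError: 'features', that points to the installed pySCENIC version rather than to the database itself.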

ghuls commented 1 year ago

Stale.