Open tuanpham96 opened 1 month ago
Update: I also tested with Singularity and had the same error.
# build image & bind path
singularity build pyscenic.sif docker://aertslab/pyscenic_scanpy:0.12.1_1.9.1
export SINGULARITY_BINDPATH="/oscar/home/$USER,/oscar/scratch/$USER,/oscar/data" # this is from our HPC's guide for binding path
# create a shell inside
singularity shell utils/pyscenic.sif
Then, inside the shell, I started an IPython kernel and copied and pasted the same code. The same issues occurred.
Am I defining the right resources? Some pages under the resources URL are marked as deprecated, but I'm not entirely sure which ones to use instead.
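One way to rule out a corrupted or incomplete download is to compare each ranking database against the `.sha1sum.txt` file stored next to it in the resource directory. The following is a minimal sketch, not part of pySCENIC: the helper names `sha1_of_file` and `verify_databases` are hypothetical, and it assumes each checksum file follows the usual `sha1sum` output format (hex digest first, then the file name).

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_databases(directory, pattern="*.feather"):
    """Map each matching file name to True, False, or None.

    True/False: the computed digest does/does not match the recorded one.
    None: no '<name>.sha1sum.txt' file was found next to the database.
    Assumption: the checksum file starts with the hex digest.
    """
    results = {}
    for db_path in sorted(Path(directory).glob(pattern)):
        checksum_path = db_path.with_name(db_path.name + ".sha1sum.txt")
        if not checksum_path.exists():
            results[db_path.name] = None
            continue
        fields = checksum_path.read_text().split()
        expected = fields[0].lower() if fields else ""
        results[db_path.name] = sha1_of_file(db_path) == expected
    return results
```

Calling `verify_databases` on the `gene_based` directory that holds the `mm9-*.feather` files returns a dictionary keyed by file name. Any `False` entry would suggest re-downloading that file. A clean result only confirms file integrity; it does not show that the database version matches the motif annotation table.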
Run the command line version and not the notebook version: https://pyscenic.readthedocs.io/en/latest/installation.html#docker-podman-and-singularity-apptainer-images
I'm using the Singularity image with the CLI and it seems to be stuck at the `ctx` step for > 2 hrs without finishing. I'm using `--mode "custom_multiprocessing" --num_workers 40`. Is that typical?
Never mind. Based on other issues, it seems I need more RAM and fewer cores. I used 20 cores + 200 GB and it seems to finish within 20 to 25 minutes using the Singularity image with `"dask_multiprocessing"`.
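The trade-off described above can be written down as a small calculation. This is a rough sketch under stated assumptions, not a pySCENIC recommendation: `suggest_num_workers` is a hypothetical helper, and the 10 GB per worker used in the example is simply the ratio from the run that finished (200 GB across 20 cores). Actual memory needs will depend on the number of genes, cells, and databases.

```python
def suggest_num_workers(total_ram_gb, ram_per_worker_gb, max_cores):
    """Largest worker count that fits both the RAM budget and the core count.

    ram_per_worker_gb is an assumed per-worker budget; measure it on your
    own data rather than treating any fixed value as authoritative.
    """
    if ram_per_worker_gb <= 0:
        raise ValueError("ram_per_worker_gb must be positive")
    by_memory = int(total_ram_gb // ram_per_worker_gb)
    # Never return fewer than one worker, never more than the cores available.
    return max(1, min(max_cores, by_memory))


# With 200 GB, an assumed 10 GB per worker, and 40 cores available,
# memory rather than core count is the limiting factor.
print(suggest_num_workers(200, 10, 40))  # prints 20
```

The result could then be passed as `--num_workers` instead of requesting every available core.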
Is there a guide about suggested minimum RAM + # cores for each step, given some number of genes / cells / databases?
Description
I'm running the tutorial and I keep getting errors at the `prune2df` step like this:
Steps to reproduce the behavior
Import & Define resources
```python
# import
import os
import sys
import glob
import re

import numpy as np
import pandas as pd

from dask.diagnostics import ProgressBar
from dask.distributed import Client, LocalCluster

from arboreto.utils import load_tf_names
from arboreto.algo import grnboost2, genie3

from ctxcore.rnkdb import FeatherRankingDatabase as RankingDatabase

from pyscenic.utils import modules_from_adjacencies, load_motifs
from pyscenic.prune import prune2df, df2regulons
from pyscenic.aucell import aucell

# define paths
OUTPUT_DATA_FOLDER = "data/grn"
INPUT_EXPR_FILE = 'data/external/geo/GSE60361_C1-3005-Expression.txt'
RESOURCES_DIRECTORY = "data/external/aertslab/resources.aertslab.org/cistarget"
DATABASES_GLOB = os.path.join(
    RESOURCES_DIRECTORY,
    "databases/mus_musculus/mm9/refseq_r45/mc9nr/gene_based/",
    "mm9-*.mc9nr.genes_vs_motifs.rankings.feather"
)
MOTIF_ANNOTATIONS_FNAME = os.path.join(
    RESOURCES_DIRECTORY,
    "motif2tf/motifs-v9-nr.mgi-m0.001-o0.0.tbl"
)
MM_TFS_FNAME = os.path.join(
    RESOURCES_DIRECTORY,
    "tf_lists/allTFs_mm.txt"
)
REGULONS_FNAME = os.path.join(OUTPUT_DATA_FOLDER, "regulons.p")
MOTIFS_FNAME = os.path.join(OUTPUT_DATA_FOLDER, "motifs.csv")
```
Here's what the resource directory looks like:
```
data/external/aertslab/resources.aertslab.org/cistarget
├── databases
│   └── mus_musculus
│       ├── mm10
│       │   ├── refseq_r80
│       │   │   ├── mc9nr
│       │   │   │   └── gene_based
│       │   │   └── mc_v10_clust
│       │   │       └── gene_based
│       │   └── screen
│       │       └── mc_v10_clust
│       │           └── region_based
│       └── mm9
│           ├── refseq_r45
│           │   └── mc9nr
│           │       └── gene_based
│           │           ├── mm9-500bp-upstream-10species.mc9nr.genes_vs_motifs.rankings.feather
│           │           ├── mm9-500bp-upstream-10species.mc9nr.genes_vs_motifs.rankings.feather.sha1sum.txt
│           │           ├── mm9-500bp-upstream-7species.mc9nr.genes_vs_motifs.rankings.feather
│           │           ├── mm9-500bp-upstream-7species.mc9nr.genes_vs_motifs.rankings.feather.sha1sum.txt
│           │           ├── mm9-tss-centered-10kb-10species.mc9nr.genes_vs_motifs.rankings.feather
│           │           ├── mm9-tss-centered-10kb-10species.mc9nr.genes_vs_motifs.rankings.feather.sha1sum.txt
│           │           ├── mm9-tss-centered-10kb-7species.mc9nr.genes_vs_motifs.rankings.feather
│           │           ├── mm9-tss-centered-10kb-7species.mc9nr.genes_vs_motifs.rankings.feather.sha1sum.txt
│           │           ├── mm9-tss-centered-5kb-10species.mc9nr.genes_vs_motifs.rankings.feather
│           │           ├── mm9-tss-centered-5kb-10species.mc9nr.genes_vs_motifs.rankings.feather.sha1sum.txt
│           │           ├── mm9-tss-centered-5kb-7species.mc9nr.genes_vs_motifs.rankings.feather
│           │           └── mm9-tss-centered-5kb-7species.mc9nr.genes_vs_motifs.rankings.feather.sha1sum.txt
│           └── refseq_r70
│               └── mc9nr
│                   └── region_based
├── motif2tf
│   ├── motifs-v10nr_clust-nr.chicken-m0.001-o0.0.tbl
│   ├── motifs-v10nr_clust-nr.flybase-m0.001-o0.0.tbl
│   ├── motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
│   ├── motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl
│   ├── motifs-v8-nr.flybase-m0.001-o0.0.tbl
│   ├── motifs-v9-nr.flybase-m0.001-o0.0.tbl
│   ├── motifs-v9-nr.hgnc-m0.001-o0.0.tbl
│   └── motifs-v9-nr.mgi-m0.001-o0.0.tbl
└── tf_lists
    ├── allTFs_dmel.txt
    ├── allTFs_hg38.txt
    └── allTFs_mm.txt
```
Load data
```python
ex_matrix = pd.read_csv(INPUT_EXPR_FILE, sep='\t', header=0, index_col=0).T
ex_matrix.shape
---
(3005, 19972)
```
```python
tf_names = load_tf_names(MM_TFS_FNAME)
len(tf_names)
---
1860
```
```python
db_fnames = glob.glob(DATABASES_GLOB)
def name(fname):
    return os.path.splitext(os.path.basename(fname))[0]
dbs = [RankingDatabase(fname=fname, name=name(fname)) for fname in db_fnames]
dbs
---
[FeatherRankingDatabase(name="mm9-500bp-upstream-10species.mc9nr.genes_vs_motifs.rankings"),
 FeatherRankingDatabase(name="mm9-500bp-upstream-7species.mc9nr.genes_vs_motifs.rankings"),
 FeatherRankingDatabase(name="mm9-tss-centered-10kb-10species.mc9nr.genes_vs_motifs.rankings"),
 FeatherRankingDatabase(name="mm9-tss-centered-10kb-7species.mc9nr.genes_vs_motifs.rankings"),
 FeatherRankingDatabase(name="mm9-tss-centered-5kb-10species.mc9nr.genes_vs_motifs.rankings"),
 FeatherRankingDatabase(name="mm9-tss-centered-5kb-7species.mc9nr.genes_vs_motifs.rankings")]
```
Then the steps as in the tutorials:
The above steps worked fine. Then on to `prune2df`, which didn't work:
Since I'm running on a university HPC, I followed this comment:
Here's a snippet of the traceback:
Full traceback
```pytb
/users/
```
Please complete the following information:
pip git+...
pip git+...