aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Stuck at the AUCell_direct. (insufficient memory, with 1Tb of RAM) #498

Open JinKyu-Cheong opened 2 weeks ago

JinKyu-Cheong commented 2 weeks ago

Hi

I keep running into resource issues. I was stuck at the region-to-gene step with 250k cells, so I subset the data to 160k cells and was able to proceed up to the eGRN analysis. However, I'm now stuck at the AUCell step. I don't have any error logs to share since the kernel shuts down once the memory dump happens. Our facility allows a maximum of 1 TB and 70 cores per request. I kept using 1 TB of RAM and tried different numbers of cores, but changing the core count didn't help.

What else can I do to make it work?

Thanks!

Log message below:

Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Job stats:
job                count
---------------  -------
AUCell_direct          1
AUCell_extended        1
all                    1
eGRN_extended          1
scplus_mudata          1
tf_to_gene             1
total                  6

Select jobs to execute...
Execute 1 jobs...

[Fri Nov  1 09:33:29 2024]
localrule AUCell_direct:
    input: /data/niecr/cheongj/ibd/scenicplus/outs/eRegulon_direct.tsv, /data/niecr/cheongj/ibd/scenicplus/outs/ACC_GEX.h5mu
    output: /data/niecr/cheongj/ibd/scenicplus/outs/AUCell_direct.h5mu
    jobid: 2
    reason: Missing output files: /data/niecr/cheongj/ibd/scenicplus/outs/AUCell_direct.h5mu
    threads: 2
    resources: tmpdir=/scratch/lsftmp/10293873.tmpdir

2024-11-01 09:34:13,492 SCENIC+      INFO     Reading data.
/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
/data/niecr/cheongj/miniconda3/envs/scenicplus/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
  warnings.warn(
2024-11-01 09:38:06,884 SCENIC+      INFO     Calculating enrichment scores.

samuelheczko commented 5 days ago

Hi! I just wanted to point out that I have the same problem: running out of memory at the "Calculating enrichment scores" part of AUCell (AUCell_extended in my case). I ran it with 1.9 TB of RAM, but that wasn't enough. The report I got from the cluster says that it tried to take 2.5 TB of RAM. I don't think that can be right, and there must be some bug in my files or something. Did you manage to figure this one out, @JinKyu-Cheong?

Thanks !

SeppeDeWinter commented 4 days ago

Hi @JinKyu-Cheong and @samuelheczko

This step can take some memory, but >2 TB seems excessive. How many genes and regions do you have?

All the best,

Seppe
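For a rough sense of scale when answering the question above, here is a minimal back-of-the-envelope sketch. It assumes that a dense cells x features matrix is held in memory at some point during scoring, which is an assumption and not a confirmed description of how SCENIC+ implements AUCell. The 160k cell count comes from the report above; the gene and region counts are hypothetical placeholders to be replaced with your own numbers.

```python
def dense_matrix_bytes(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> int:
    """Bytes needed for a dense n_rows x n_cols matrix (float64 by default)."""
    return n_rows * n_cols * bytes_per_value


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    n_cells = 160_000    # from the report above
    n_genes = 30_000     # hypothetical: replace with your gene count
    n_regions = 500_000  # hypothetical: replace with your region count

    for label, n_features in (("genes", n_genes), ("regions", n_regions)):
        size = dense_matrix_bytes(n_cells, n_features)
        print(f"{n_cells} cells x {n_features} {label}: ~{to_gib(size):.0f} GiB dense")
```

If the region-based estimate alone approaches the available RAM, a few temporary copies would be enough to exceed it, so the region count is worth reporting alongside the gene count.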

samuelheczko commented 3 days ago

Hi, thanks, @SeppeDeWinter, for the answer! I managed to get through this phase by running the Snakemake steps individually in an interactive HPC session (as opposed to submitting a job) with:

snakemake -R --until --cores 10

I also ran the steps in a slightly different order.

When I ran Snakemake without specifying the step, it executed eGRN_extended and immediately attempted AUCell_extended, where my workflow was interrupted. Instead, I first ran eGRN_direct using the command above and then executed both AUCell commands, followed by scplus_mudata. I allocated 10 cores with 32G each.

I’m not entirely sure why this worked, but perhaps someone else will find it useful as well!

Best, Sam
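The step-by-step approach described above could be scripted roughly as follows. This is only a sketch under assumptions: the rule names are taken from the log and from this comment, the exact arguments of the original command are not shown in the thread, and pairing `--forcerun` with `--until` for each rule is a guess at what was run. By default the script only prints the commands; pass `execute=True` from the directory that holds the Snakefile to actually run them.

```python
import shlex
import subprocess

# Order described in the comment above: eGRN_direct first, then both AUCell
# rules, then scplus_mudata. Rule names are taken from the thread.
RULE_ORDER = ["eGRN_direct", "AUCell_direct", "AUCell_extended", "scplus_mudata"]


def build_commands(rules, cores=10):
    """Build one snakemake invocation per rule (assumed flag combination)."""
    return [
        ["snakemake", "--cores", str(cores), "--forcerun", rule, "--until", rule]
        for rule in rules
    ]


def run_rules(rules, cores=10, execute=False):
    """Print each command; run it only when execute=True, stopping on failure."""
    for cmd in build_commands(rules, cores):
        print(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_rules(RULE_ORDER)  # dry listing only; nothing is executed
```

Running one rule per invocation means each step starts in a fresh process, which may be part of why this ordering stayed within memory, although the thread does not establish the cause.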