aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

[issue] running pySCENIC on large datasets #580

Open li-xuyang28 opened 2 days ago

li-xuyang28 commented 2 days ago

I am running pySCENIC from the Singularity container with scanpy (aertslab-pyscenic-scanpy-0.12.1-1.9.1.sif) on a fairly large dataset on an HPC cluster, with an allocation of 150G of memory and 40 cores (salloc -J interact -N 1-1 -n 40 --mem=150G --time=2:00:00 -p parallel srun --pty bash). I was able to create metacells and run the pipeline on those, but I would still like to examine the results with the original single-cell data if possible. With the command shown below, I ran into the following error:

arboreto_with_multiprocessing.py \
    /home/xli324/data-kkinzle1/xli324/scRNAseq/Chetan/filtered.loom \
    /home/xli324/data-kkinzle1/xli324/resources/allTFs_hg38.txt  \
    --method grnboost2 \
    --output /home/xli324/data-kkinzle1/xli324/scRNAseq/Chetan/adj.tsv \
    --num_workers 40 \
    --seed 777
Loaded expression matrix of 230586 cells and 15431 genes in 117.41096949577332 seconds...
Loaded 1892 TFs...
starting grnboost2 using 40 processes...
  0%|          | 0/15431 [00:00<?, ?it/s]
Process ForkPoolWorker-2:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/local/lib/python3.10/multiprocessing/queues.py", line 367, in get
    return _ForkingPickler.loads(res)
MemoryError
Killed
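
For scale, here is a rough sketch of how much memory the expression matrix would need if held as a dense array. The dimensions come from the log above; the dtype is an assumption (the loom file's storage type is not stated here), and `dense_matrix_gib` is a hypothetical helper name, not part of pySCENIC or arboreto. Whether each of the 40 workers ends up with its own copy is also not something the traceback confirms.

```python
def dense_matrix_gib(n_cells, n_genes, bytes_per_value):
    """Size in GiB of a dense n_cells x n_genes matrix (hypothetical helper)."""
    return n_cells * n_genes * bytes_per_value / 2**30


# Dimensions reported by arboreto_with_multiprocessing.py in the log above.
n_cells, n_genes = 230586, 15431

# Assumed dtypes; the actual dtype of the loaded matrix is not known here.
for label, bytes_per_value in (("float32", 4), ("float64", 8)):
    size = dense_matrix_gib(n_cells, n_genes, bytes_per_value)
    print(f"{label}: {size:.1f} GiB for one dense copy")
```

If the matrix is stored as float64, a handful of copies across worker processes would be enough to exceed the 150G allocation, which may be why downsampling only partway did not help.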

Do you have any suggestions? I have also tried downsampling to some extent, without much luck. Has GPU support been considered? Thanks!