pySCENIC is a lightning-fast Python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering), which enables biologists to infer transcription factors, gene regulatory networks, and cell types from single-cell RNA-seq data.
I am running pySCENIC using the Singularity container with Scanpy (aertslab-pyscenic-scanpy-0.12.1-1.9.1.sif) on a fairly large dataset on an HPC cluster, with 150G of memory and 40 cores allocated (salloc -J interact -N 1-1 -n 40 --mem=150G --time=2:00:00 -p parallel srun --pty bash). I have been able to create metacells and run the pipeline, but I would still like to examine the results with the original single-cell data if possible. I ran into the following issue with the command shown:
arboreto_with_multiprocessing.py \
    /home/xli324/data-kkinzle1/xli324/scRNAseq/Chetan/filtered.loom \
    /home/xli324/data-kkinzle1/xli324/resources/allTFs_hg38.txt \
    --method grnboost2 \
    --output /home/xli324/data-kkinzle1/xli324/scRNAseq/Chetan/adj.tsv \
    --num_workers 40 \
    --seed 777
Loaded expression matrix of 230586 cells and 15431 genes in 117.41096949577332 seconds...
Loaded 1892 TFs...
starting grnboost2 using 40 processes...
0%| | 0/15431 [00:00<?, ?it/s]Process ForkPoolWorker-2:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/local/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/multiprocessing/pool.py", line 114, in worker
task = get()
File "/usr/local/lib/python3.10/multiprocessing/queues.py", line 367, in get
return _ForkingPickler.loads(res)
MemoryError
Killed
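For reference, this is roughly how I sanity-check the size of the expression matrix before running GRNBoost2 (a sketch, assuming the Scanpy bundled in the container; the loom path is the one from the command above, and the size estimate is just the dense float32 footprint of the matrix):

import numpy as np
import scanpy as sc

# Load the same loom that was passed to arboreto_with_multiprocessing.py.
adata = sc.read_loom("/home/xli324/data-kkinzle1/xli324/scRNAseq/Chetan/filtered.loom")

# Expect (230586, 15431), matching the numbers in the log above.
print(adata.shape)

# Rough memory footprint (GB) of the matrix if held densely as float32.
print(np.prod(adata.shape) * 4 / 1e9)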
I was wondering if you have any suggestions? I have also tried downsampling to some extent (roughly as sketched below), without much luck. Is there any chance that GPU support is something that has been considered? Thanks!
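For context, my downsampling attempt was roughly along these lines (a sketch; the fraction, seed, and output path are placeholders rather than the exact values I used):

import scanpy as sc

# Start from the same loom used in the command above.
adata = sc.read_loom("/home/xli324/data-kkinzle1/xli324/scRNAseq/Chetan/filtered.loom")

# Keep a random subset of cells to shrink the matrix before GRNBoost2.
sc.pp.subsample(adata, fraction=0.5, random_state=777)

# Write the smaller matrix to a new loom for arboreto_with_multiprocessing.py.
adata.write_loom("/home/xli324/data-kkinzle1/xli324/scRNAseq/Chetan/filtered_subsampled.loom")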