aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

pySCENIC grnboost/distributed issue when running on large matrix #73

Open · sunnyzwu opened 5 years ago

sunnyzwu commented 5 years ago

Hi guys, I'm really liking the pySCENIC adaptation (it's saving a lot of time in our hands!). However, I'm having the following issue with pySCENIC grnboost (through the command line) when running on a large integrated matrix (132,847 cells by 12,194 genes). The error still occurs even after reducing `--num_workers` to 1.

I've tested that pySCENIC does work fine, without errors, when subsampling this integrated matrix (it runs on 5-10k cells by 12k genes), so I suspect an issue related to matrix size? I'm not familiar with the errors in the output, so any help would be appreciated.
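In case it's useful for reproducing the test, a subsample like the ones I used can be drawn with pandas. The snippet below is only a sketch: it assumes a tab-separated cells-by-genes matrix, and the file names and 10k sample size are placeholders.

```python
# Sketch only: draw a random subsample of cells before running `pyscenic grnboost`.
# Assumes a tab-separated matrix with cells as rows and genes as columns;
# file names and the 10,000-cell sample size are placeholders.
import pandas as pd

ex_matrix = pd.read_csv("integrated_matrix.tsv", sep="\t", index_col=0)  # cells x genes
subsample = ex_matrix.sample(n=10000, random_state=0)                    # keep 10k random cells
subsample.to_csv("integrated_matrix_10k.tsv", sep="\t")
```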

```
pyscenic grnboost ${MATRIXNAME} ${TFFILEPATH} -t -o output/grnboost_output.tsv --num_workers 16

2019-04-29 16:54:08,590 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.

2019-04-29 23:53:46,694 - pyscenic.cli.pyscenic - INFO - Inferring regulatory networks.
Traceback (most recent call last):
  File "/share/ClusterShare/software/contrib/CTP_single_cell/tools/anaconda2/envs/py36/bin/pyscenic", line 11, in <module>
    sys.exit(main())
  File "/share/ClusterShare/software/contrib/CTP_single_cell/tools/anaconda2/envs/py36/lib/python3.6/site-packages/pyscenic/cli/pyscenic.py", line 397, in main
    args.func(args)
  File "/share/ClusterShare/software/contrib/CTP_single_cell/tools/anaconda2/envs/py36/lib/python3.6/site-packages/pyscenic/cli/pyscenic.py", line 63, in find_adjacencies_command
    client, shutdown_callback = _prepare_client(args.client_or_address, num_workers=args.num_workers)
  File "/share/ClusterShare/software/contrib/CTP_single_cell/tools/anaconda2/envs/py36/lib/python3.6/site-packages/pyscenic/prune.py", line 63, in _prepare_client
    threads_per_worker=1)
  File "/share/ClusterShare/software/contrib/CTP_single_cell/tools/anaconda2/envs/py36/lib/python3.6/site-packages/distributed/deploy/local.py", line 169, in __init__
    blocked_handlers=blocked_handlers)
  File "/share/ClusterShare/software/contrib/CTP_single_cell/tools/anaconda2/envs/py36/lib/python3.6/site-packages/distributed/scheduler.py", line 993, in __init__
    **kwargs)
  File "/share/ClusterShare/software/contrib/CTP_single_cell/tools/anaconda2/envs/py36/lib/python3.6/site-packages/distributed/node.py", line 48, in __init__
    deserialize=deserialize, io_loop=self.io_loop)
  File "/share/ClusterShare/software/contrib/CTP_single_cell/tools/anaconda2/envs/py36/lib/python3.6/site-packages/distributed/core.py", line 144, in __init__
    stop=stop,
  File "/share/ClusterShare/software/contrib/CTP_single_cell/tools/anaconda2/envs/py36/lib/python3.6/site-packages/distributed/profile.py", line 270, in watch
    thread.start()
  File "/share/ClusterShare/software/contrib/CTP_single_cell/tools/anaconda2/envs/py36/lib/python3.6/threading.py", line 846, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
Exception ignored in: <object repr() failed>
```

Cheers,
Sunny

cflerin commented 5 years ago

Hi Sunny,

We've previously run pySCENIC here with some pretty large files (157k cells, 20k genes), similar to yours, and it has worked (although memory usage is high). This looks like a dask error, but my first thought is for you to check your ulimit (`ulimit -u`). Dask creates a lot of processes, threads, and file connections, and I wonder if the larger matrix is pushing you past your limit. Mine is set to 8192, but 4096 would possibly also be sufficient. Can you let me know your pySCENIC version as well (`pyscenic -h`)?
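If it's easier, those limits can also be read from inside the Python environment the job actually runs in, which avoids any ambiguity about which shell the limits apply to. A minimal sketch using only the standard library (the resource constants are Linux-specific):

```python
# Sketch only: report the limits that `ulimit -u` / `ulimit -n` correspond to,
# from inside the job's Python process (Linux-specific constants).
import resource

for label, limit in [("max user processes (ulimit -u)", resource.RLIMIT_NPROC),
                     ("max open files (ulimit -n)", resource.RLIMIT_NOFILE)]:
    soft, hard = resource.getrlimit(limit)
    print("%s: soft=%s, hard=%s" % (label, soft, hard))
```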

Chris

sunnyzwu commented 5 years ago

Hi Chris,

My pySCENIC version is 0.9.1:

```
pyscenic -h
usage: pyscenic [-h] {grnboost,ctx,aucell} ...

Single-CEll regulatory Network Inference and Clustering (0.9.1)
```

I don't believe the limits are an issue, as we submit these jobs to our cluster requesting high-memory resources. Here is an example of the ulimit for one of these jobs:

```
ulimit -u
3096564
```
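If it would help narrow this down, I can also try driving the GRN step directly from Python with a small, explicitly sized Dask cluster instead of the CLI's default LocalCluster. A rough sketch of what I mean (assuming arboreto and dask.distributed are installed; the file names are placeholders):

```python
# Sketch only: run GRN inference with an explicitly sized Dask cluster.
# Assumes arboreto and dask.distributed are installed; file names are placeholders.
import pandas as pd
from arboreto.algo import grnboost2
from arboreto.utils import load_tf_names
from distributed import Client, LocalCluster

if __name__ == "__main__":
    ex_matrix = pd.read_csv("integrated_matrix.tsv", sep="\t", index_col=0)  # cells x genes
    tf_names = load_tf_names("tf_list.txt")

    # Keep the cluster small so we stay well under any process/thread limits.
    cluster = LocalCluster(n_workers=4, threads_per_worker=1)
    client = Client(cluster)

    adjacencies = grnboost2(ex_matrix, tf_names=tf_names, client_or_address=client)
    adjacencies.to_csv("grnboost_output.tsv", sep="\t", index=False)

    client.close()
    cluster.close()
```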

Sunny

ybaeus commented 1 year ago

Any updates on this?