Running example notebook is taking a long time

liboxun commented 4 years ago

I'm trying to run the PBMC tutorial Jupyter notebook (PBMC10k_SCENIC-protocol-CLI.ipynb).

It's taking some time to run pyscenic ctx. Right now it's been two days and it's still running. I'm running it with an on-campus HPC service. I'm starting to think maybe there's something that I overlooked.

How long should it typically take to run pyscenic ctx for the PBMC example?

Thanks in advance!

Boxun

liboxun commented 4 years ago

Here's the output I got so far:

2020-05-06 10:42:51,713 - pyscenic.cli.pyscenic - INFO - Creating modules.

2020-05-06 10:42:54,498 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.

2020-05-06 10:43:00,178 - pyscenic.utils - INFO - Calculating Pearson correlations.

2020-05-06 10:43:00,178 - pyscenic.utils - WARNING - Note on correlation calculation: the default behaviour for calculating the correlations has changed after pySCENIC verion 0.9.16. Previously, the default was to calculate the correlation between a TF and target gene using only cells with non-zero expression values (mask_dropouts=True). The current default is now to use all cells to match the behavior of the R verision of SCENIC. The original settings can be retained by setting 'rho_mask_dropouts=True' in the modules_from_adjacencies function, or '--mask_dropouts' from the CLI. Dropout masking is currently set to [True].

/home2/s418610/.conda/envs/py37_res_GRN/lib/python3.7/site-packages/pyscenic/utils.py:138: RuntimeWarning: invalid value encountered in greater
    regulations = (rhos > rho_threshold).astype(int) - (rhos < -rho_threshold).astype(int)

/home2/s418610/.conda/envs/py37_res_GRN/lib/python3.7/site-packages/pyscenic/utils.py:138: RuntimeWarning: invalid value encountered in less
    regulations = (rhos > rho_threshold).astype(int) - (rhos < -rho_threshold).astype(int)

2020-05-06 10:43:29,853 - pyscenic.utils - INFO - Creating modules.

2020-05-06 10:45:26,430 - pyscenic.cli.pyscenic - INFO - Loading databases.

2020-05-06 10:45:26,434 - pyscenic.cli.pyscenic - INFO - Calculating regulons.

And it's been running 'Calculating regulons' since then.
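For anyone comparing their own run against this log, the timestamps can be turned into per-step durations. The following is only a minimal sketch: the helper names `parse_log_line` and `step_durations` are hypothetical, and it assumes every line follows the `timestamp - logger - level - message` layout shown above (the long WARNING line is left out for brevity).

```python
from datetime import datetime

# INFO lines copied from the log output above.
LOG_LINES = [
    "2020-05-06 10:42:51,713 - pyscenic.cli.pyscenic - INFO - Creating modules.",
    "2020-05-06 10:42:54,498 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.",
    "2020-05-06 10:43:00,178 - pyscenic.utils - INFO - Calculating Pearson correlations.",
    "2020-05-06 10:43:29,853 - pyscenic.utils - INFO - Creating modules.",
    "2020-05-06 10:45:26,430 - pyscenic.cli.pyscenic - INFO - Loading databases.",
    "2020-05-06 10:45:26,434 - pyscenic.cli.pyscenic - INFO - Calculating regulons.",
]


def parse_log_line(line):
    """Split one log line into (timestamp, message). Hypothetical helper."""
    stamp, _logger, _level, message = line.split(" - ", 3)
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"), message


def step_durations(lines):
    """Return (message, seconds until the next log line) for each step.

    The last line has no successor, so its duration is unknown and omitted.
    """
    parsed = [parse_log_line(line) for line in lines]
    return [
        (message, (next_stamp - stamp).total_seconds())
        for (stamp, message), (next_stamp, _) in zip(parsed, parsed[1:])
    ]


for message, seconds in step_durations(LOG_LINES):
    print(f"{seconds:9.3f} s  {message}")
```

Read this way, every step before 'Calculating regulons' finished within about two and a half minutes in total, so the delay is confined to the final step.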

cflerin commented 4 years ago

Hi @liboxun ,

It should not take 2+ days to run this step. Depending on the number of processes used, I'd expect it to complete in under an hour at worst. I would suggest maybe stopping the process, and re-starting it. Also, are you using the same database files as in the tutorial?

liboxun commented 4 years ago

Hi @cflerin ,

Thanks for the quick reply! Good to know.

I've submitted multiple jobs (with the same script), and none of them finished within a day. I use 32 processes, since that's the number of cores on the HPC node I use. So restarting does not seem to solve the problem.
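One thing that may be worth checking on a shared HPC node is whether the job is actually allowed to use all 32 cores, since a scheduler can restrict a job to fewer CPUs than the machine has. This is an assumption about a possible cause, not something confirmed in this thread. The sketch below uses only the Python standard library; `usable_cpu_count` and `suggested_workers` are hypothetical helper names, and `os.sched_getaffinity` is only available on some platforms (Linux among them), so the code falls back to `os.cpu_count()` elsewhere.

```python
import os


def usable_cpu_count():
    """Return the number of CPUs this process may actually run on."""
    if hasattr(os, "sched_getaffinity"):
        # Respects CPU affinity restrictions, e.g. those set by a job scheduler.
        return len(os.sched_getaffinity(0))
    # Fallback: total CPUs on the machine (may overestimate what the job gets).
    return os.cpu_count() or 1


def suggested_workers(requested):
    """Cap the requested worker count at the number of usable CPUs."""
    return max(1, min(requested, usable_cpu_count()))


print("CPUs on the machine:      ", os.cpu_count())
print("CPUs usable by this job:  ", usable_cpu_count())
print("Workers to request (<=32):", suggested_workers(32))
```

If the usable count printed inside the job is lower than 32, passing the smaller number as the worker count to pyscenic ctx would avoid oversubscribing the allocated cores.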

I believe I'm using the same databases as in the tutorial. Quoting the PBMC10k_SCENIC-protocol-CLI.ipynb:

    # ranking databases
    f_db_glob = "/ddn1/vol1/staging/leuven/res_00001/databases/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/*feather"
    f_db_names = ' '.join( glob.glob(f_db_glob) )

    # motif databases
    f_motif_path = "/ddn1/vol1/staging/leuven/res_00001/databases/cistarget/motif2tf/motifs-v9-nr.hgnc-m0.001-o0.0.tbl"

In comparison, the databases I'm using are downloaded from:

https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather

https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather

https://resources.aertslab.org/cistarget/motif2tf/motifs-v9-nr.hgnc-m0.001-o0.0.tbl

To me it seems they match up.
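The comparison above can also be done mechanically. The following is a minimal sketch that checks only file names, not file contents: it takes the three download URLs listed above, extracts the base names, and tests them against the `*feather` pattern and the motif annotation file name from the notebook. The function name `classify` is hypothetical.

```python
from fnmatch import fnmatch
from posixpath import basename
from urllib.parse import urlparse

# File-name pattern and annotation file name taken from the notebook snippet.
TUTORIAL_DB_PATTERN = "*feather"
TUTORIAL_MOTIF_FILE = "motifs-v9-nr.hgnc-m0.001-o0.0.tbl"

# URLs of the files that were downloaded, as listed in this comment.
DOWNLOADED_URLS = [
    "https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__10kb_up_and_down_tss.mc9nr.feather",
    "https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based/hg38__refseq-r80__500bp_up_and_100bp_down_tss.mc9nr.feather",
    "https://resources.aertslab.org/cistarget/motif2tf/motifs-v9-nr.hgnc-m0.001-o0.0.tbl",
]


def classify(urls):
    """Split URLs into ranking databases and motif annotation files by name."""
    names = [basename(urlparse(url).path) for url in urls]
    rankings = [name for name in names if fnmatch(name, TUTORIAL_DB_PATTERN)]
    annotations = [name for name in names if name == TUTORIAL_MOTIF_FILE]
    return rankings, annotations


rankings, annotations = classify(DOWNLOADED_URLS)
print("ranking databases:", rankings)
print("motif annotations:", annotations)
```

Note that the notebook's glob picks up every feather file in the tutorial's `gene_based` directory, so a name match here shows the downloaded files are of the expected kind but cannot show that the set of databases is identical or that the downloads are complete.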

cflerin commented 4 years ago

Hi @liboxun , I think we solved your issue in the pySCENIC issue tracker, but for anyone else having the same issue, I'll leave this link to a list of recommendations that could potentially solve this: https://github.com/aertslab/pySCENIC/issues/142#issuecomment-625982886

liboxun commented 4 years ago

Hi @cflerin ,

Yes, and thank you!

For reference, in case anybody else runs into the same issue: in my case, running a Singularity image of pySCENIC instead of the CLI solved the problem.

aertslab / SCENICprotocol

Running example notebook is taking a long time #11
