aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0
439 stars 181 forks source link

Empty regulons after cisTarget pruning step[results] #177

Closed pikapika505 closed 4 years ago

pikapika505 commented 4 years ago

I am repeating this tutorial on my own human single-cell gene expression data. I am using 'hg19-500bp-upstream-10species.mc9nr.feather', 'hg19-tss-centered-5kb-10species.mc9nr.feather', and 'hg19-tss-centered-10kb-10species.mc9nr.feather' genome ranking databases fom here. My motif annotation file is 'motifs-v9-nr.hgnc-m0.001-o0.0.tbl' from here.

After execution this CL code: !pyscenic ctx {ADJACENCIES_FNAME} {DBS_PARAM}\ --annotations_fname {MOTIF_ANNOTATIONS_FNAME}\ --expression_mtx_fname {EXP_MTX_QC_FNAME} \ --output {MOTIFS_FNAME} \ --num_workers 26

I get bunch of warnings like this: 2020-05-29 13:39:20,623 - pyscenic.transform - WARNING - Less than 80% of the genes in ERG could be mapped to hg19-tss-centered-10kb-10species.mc9nr. Skipping this module. [########################################] | 100% Completed | 2min 11.1s

However, the output file is empty: ,,Enrichment,Enrichment,Enrichment,Enrichment,Enrichment,Enrichment,Enrichment,Enrichment ,,AUC,NES,MotifSimilarityQvalue,OrthologousIdentity,Annotation,Context,TargetGenes,RankAtMax TF,MotifID,,,,,,,,

I saw in issue #134 that it might be because of using mismatching data (human vs mouse). But I am using human data only. My data is from mouse xenograft model but I do not think it matters.

What else can it be?

cflerin commented 4 years ago

Hi @YuliaInn , it seems like you're running everything properly but it's a bit surprising that this completes in only two minutes. It's possible that all of your modules are getting pruned out (which triggers the warnings you saw), leaving nothing at the end of the analysis. Is your data on the small side maybe (number of genes)? You could try running with the --no_pruning flag to see if this produces any regulons. If so, you could then try changing some of the motif enrichment or module generation thresholds (see pyscenic ctx -h).

pikapika505 commented 4 years ago

I tried --no_pruning and got this: ....... 2020-06-01 20:59:13,218 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for ARX could be mapped to hg19-500bp-upstream-10species.mc9nr. Skipping this module. [#################### ] | 50% Completed | 1min 40.1s Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.7/bin/pyscenic", line 8, in sys.exit(main()) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 421, in main args.func(args) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 166, in prune_targets_command num_workers=args.num_workers) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyscenic/prune.py", line 380, in find_features filter_for_annotation=False, kwargs), base_url=motif_base_url) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyscenic/prune.py", line 351, in prune2df num_workers, module_chunksize) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyscenic/prune.py", line 300, in _distributed_calc return create_graph().compute(scheduler='processes', num_workers=num_workers if num_workers else cpu_count()) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dask/base.py", line 156, in compute (result,) = compute(self, traverse=False, kwargs) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dask/base.py", line 397, in compute results = schedule(dsk, keys, kwargs) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dask/multiprocessing.py", line 192, in get raise_exception=reraise, kwargs) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dask/local.py", line 501, in get_async raise_exception(exc, tb) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dask/compatibility.py", line 112, in reraise raise exc File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dask/local.py", line 272, in execute_task result = _execute_task(task, data) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dask/local.py", line 252, in _execute_task args2 = [_execute_task(a, cache) for a in args] File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dask/local.py", line 252, in args2 = [_execute_task(a, cache) for a in args] File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/dask/local.py", line 253, in _execute_task return func(*args2) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in modules2df for module in modules]) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in for module in modules]) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyscenic/transform.py", line 185, in module2df weighted_recovery=weighted_recovery) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyscenic/transform.py", line 129, in module2features_auc1st_impl aucs = calc_aucs(df, db.total_genes, weights, auc_threshold) File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyscenic/recovery.py", line 283, in aucs assert maxauc > 0 AssertionError

cflerin commented 4 years ago

It looks like none of your genes are mapping to the genome ranking databases. Can you take a look at comments in #158 and see if your genes are named correctly? In particular:

davisidarta commented 4 years ago

I'm having this exact same problem as of September 2020. Could we have the issue reopened?

Additional info: I'm running pySCENIC on my local machine, following the collab notebooks, and using the arboreto_with_multiprocessing.py script to generate the adjacencies file. I end up with an empty regulon matrix, or at best with a single TF (tried several several times).

cflerin commented 4 years ago

Hi @davisidarta , Are you using the latest pySCENIC version (0.10.3)? This has a fix that might solve this. Also check that you're using the same expression matrix for the GRN and cisTarget steps. If you're still getting the empty file, it could be another issue.