annotation databases used for the run[results]

aertslab / pySCENIC

pySCENIC is a lightning-fast Python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering), which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.

http://scenic.aertslab.org

GNU General Public License v3.0


annotation databases used for the run[results] #266

Closed deevdevil88 closed 3 years ago

deevdevil88 commented 3 years ago

Hi! I was wondering: if I include two different ranking databases in the pySCENIC ctx step, one covering 10 kb up- and downstream of the TSS and the other 500 bp upstream and 100 bp downstream, does pySCENIC incorporate both sets of annotations when it does the motif enrichment and target pruning, or does it choose the best option?

If it chooses the best option, how exactly is that done? If both sets of annotations are used, how exactly is the decision made as to whether a target gene is enriched for a particular TF motif, and is this information written out as part of the analysis?

Thanks, Devika

rojinsafavi commented 3 years ago

@cflerin I was wondering if you could elaborate on the question above?

cflerin commented 3 years ago

Regulons are created for each enriched, annotated feature in each database and then aggregated. The most enriched annotated motif (by NES) is taken for the overall score. The target genes are merged if there are multiple modules with the same TF. You can find this information in the 'Context' column of the regulons file (if written to disk) or table. For each TF, it lists the database used, how the module was generated from the GRN (top50, etc.), and whether it is activating or repressing (only activating modules are kept by default). Hope that helps...
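To make the aggregation described above concrete, here is a minimal sketch in plain Python. It is not pySCENIC's actual code: the `Module` class, the `aggregate_by_tf` function, the database labels, and the TF and gene names are all hypothetical and exist only for illustration. It assumes each enriched module carries a TF, the database it came from, an NES, a set of target genes, and a set of context tags.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical stand-in for one enriched, annotated module."""
    tf: str
    database: str          # which ranking database produced this module
    nes: float             # normalized enrichment score of the motif
    targets: frozenset     # target genes kept after pruning
    context: frozenset     # e.g. database label, "top50", "activating"


def aggregate_by_tf(modules):
    """Combine modules that share a TF: keep the highest NES, merge targets."""
    grouped = defaultdict(list)
    for module in modules:
        grouped[module.tf].append(module)

    regulons = {}
    for tf, group in grouped.items():
        best = max(group, key=lambda m: m.nes)  # most enriched motif wins the score
        regulons[tf] = {
            "nes": best.nes,
            "best_database": best.database,
            # Targets and context tags from every module of this TF are unioned.
            "targets": frozenset().union(*(m.targets for m in group)),
            "context": frozenset().union(*(m.context for m in group)),
        }
    return regulons


# Two made-up modules for the same TF, one per ranking database.
modules = [
    Module("TF1", "10kb_up_and_down_tss", 4.2,
           frozenset({"GENE_A", "GENE_B"}),
           frozenset({"10kb_up_and_down_tss", "top50", "activating"})),
    Module("TF1", "500bp_up_100bp_down_tss", 5.1,
           frozenset({"GENE_B", "GENE_C"}),
           frozenset({"500bp_up_100bp_down_tss", "top50", "activating"})),
]

regulons = aggregate_by_tf(modules)
print(regulons["TF1"]["nes"], sorted(regulons["TF1"]["targets"]))
```

In this toy case the combined entry for `TF1` takes its score from the module with the higher NES, while its target list is the union of both modules' targets and its context records both database labels. That mirrors the behaviour described in the reply, under the assumptions stated above.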