aertslab / SCENIC

SCENIC is an R package to infer Gene Regulatory Networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

Can I skip the step of filtering by TFinDB #14

Closed liuyifang closed 5 years ago

liuyifang commented 6 years ago

Hi,

Thank you for the useful package. In Step 2:

1.3 Keep only the motifs annotated to the initial TF

motifEnrichment_selfMotifs <- motifEnrichment[which(motifEnrichment$TFinDB != ""),, drop=FALSE]

Several TFs that are important for cell development (already verified by my experiments) are filtered out by this step because their TFinDB entry is empty. My research subject is a fairly novel cell system, and the database may lack the relevant information. My question is: can I skip this step? Will that affect the downstream analysis? Or do you have any other suggestions for modifying parameters or databases?

s-aibar commented 6 years ago

Hello,

SCENIC reconstructs the gene-regulatory network by finding sets of genes co-expressed with each of the given TFs (GENIE3) and then selecting, within these potential targets, those with enrichment of the TF motif (RcisTarget). This selection step is based on the "motif2tf" database we built for a previous tool (iRegulon). This database, which links motifs to transcription factors, was built from the annotations provided by the original motif database and extended using gene homology (e.g. family members, across species, ...) and motif similarity (e.g. similar motifs with different annotations across databases). Of course, it might still be missing motifs, or annotations for less characterised TFs.

In the SCENIC pipeline, the filtering of the targets is performed in the second step, by comparing the "motif2tf" information (stored in the columns motifEnrichment$TF_direct and motifEnrichment$TF_inferred) with the TF from the co-expression module (the "input TF" in GENIE3, stored in the column motifEnrichment$highlightedTFs). Only the rows that match are kept (they have a non-empty motifEnrichment$TFinDB, which is what the condition motifEnrichment$TFinDB != "" selects), and these rows are used to build the "regulons".
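To see what this filter does, and which TFs it drops, here is a minimal sketch on a toy table. The motif names, TF names and NES values are made up for illustration. The "**" marker is an assumed placeholder, since the filter only checks that TFinDB is non-empty. The real table has more columns, and its annotation columns are more complex than shown here.

```r
# Toy stand-in for the motif enrichment table (illustrative values only).
motifEnrichment <- data.frame(
  motif          = c("motifA", "motifB", "motifC"),
  highlightedTFs = c("Tf1", "Tf1", "Tf2"),   # input TF of the co-expression module
  TF_direct      = c("Tf1", "", ""),         # simplified motif2tf annotation
  TF_inferred    = c("", "Tf3", ""),
  TFinDB         = c("**", "", ""),          # non-empty only when the input TF is annotated to the motif
  NES            = c(5.2, 4.1, 6.3),
  stringsAsFactors = FALSE
)

# Same filter as in the tutorial: keep rows where the motif is annotated to the input TF.
motifEnrichment_selfMotifs <- motifEnrichment[which(motifEnrichment$TFinDB != ""), , drop = FALSE]

# TFs that had enriched motifs but lost all of their rows in the filter.
droppedTFs <- setdiff(unique(motifEnrichment$highlightedTFs),
                      unique(motifEnrichment_selfMotifs$highlightedTFs))
print(droppedTFs)

# Enriched motifs of the dropped TFs, highest NES first, for manual inspection.
dropped <- motifEnrichment[motifEnrichment$highlightedTFs %in% droppedTFs, , drop = FALSE]
print(dropped[order(-dropped$NES), c("highlightedTFs", "motif", "NES")])
```

Listing the dropped TFs before discarding them makes it easy to check whether a TF of interest was removed only because no annotation exists for it.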

I would discourage removing that filter, since it would lead to many false positives (it would keep the co-expression modules that present motif enrichment for any TF). However, you can certainly explore the results of the co-expression modules of the TFs that you are interested in. If you know certain motifs for those TFs, you can manually annotate them by adding some content to the motifEnrichment$TFinDB column, so that they are not filtered out. Otherwise, I would recommend following the standard SCENIC pipeline, but exploring those co-expression modules further. For example:

1) Explore their motif enrichment, as you are already doing, to see which motifs are enriched in them. This might point to a binding motif of your TF (e.g. a motif with a very high NES), or help to identify co-factors...
2) Evaluate the co-expression modules with AUCell (as gene sets, without including them in the AUC matrix as "regulons") to see whether they are especially active in a given cluster of cells, or correlated with any of the regulons...
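The manual annotation mentioned above could look roughly like the following sketch. It again uses a toy table, and the list knownMotifs, the motif and TF names, and the "manual" marker are all hypothetical. Any non-empty string passes the filter as written, so it is worth checking which markers your own table uses before choosing one.

```r
# Toy stand-in for the motif enrichment table (illustrative values only).
motifEnrichment <- data.frame(
  motif          = c("motifA", "motifB", "motifC"),
  highlightedTFs = c("Tf1", "Tf2", "Tf2"),
  TFinDB         = c("**", "", ""),
  NES            = c(5.2, 4.1, 6.3),
  stringsAsFactors = FALSE
)

# Hypothetical curated knowledge: motifs known (e.g. from experiments) to be bound by a TF.
knownMotifs <- list(Tf2 = c("motifC"))

# Mark only the rows where the input TF matches one of its known motifs
# and no annotation is present yet.
for (tf in names(knownMotifs)) {
  hit <- motifEnrichment$highlightedTFs == tf &
         motifEnrichment$motif %in% knownMotifs[[tf]] &
         motifEnrichment$TFinDB == ""
  motifEnrichment$TFinDB[hit] <- "manual"
}

# The standard filter now keeps the manually annotated row as well.
motifEnrichment_selfMotifs <- motifEnrichment[which(motifEnrichment$TFinDB != ""), , drop = FALSE]
print(motifEnrichment_selfMotifs)
```

Using a distinct marker such as "manual" keeps the hand-curated rows distinguishable from the database annotations in later steps.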

Of course, you should also keep in mind that, although the TF is important for your system (and might therefore have co-expressed genes), it might be a co-factor of other TFs, or it might not bind its targets directly, etc.

I hope this helps, Sara

liuyifang commented 6 years ago

Hi Sara,

I evaluated the co-expression modules of the important TF with AUCell and found that they are especially active in an important cluster of cells. I also found some "regulons" with a similar pattern. I believe the important TF has the potential to be a key "regulon". I am asking other lab members whether they have done relevant experiments. Also, can you tell me where I can find binding motif data in public databases?

Thanks, Yifang