aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Mis-labelled motifs ? #295

Open JaanaB1 opened 4 months ago

JaanaB1 commented 4 months ago

Hi Team,

Thank you for generating this package! I have a slight issue with regard to the motif nomenclature. I understand that the motif library has been generated from multiple databases (with redundancies removed to make it more concise). However, the motif on which one of my regulons is based does not seem to belong to the regulon's transcription factor.

Here I have a regulon named the Irf8 regulon, which is based on the following motif: metacluster_167.7 (image)

The software calls this an Irf8 motif; however, in the JASPAR database the Irf8 motif is as follows: MA0652.1 - Irf8 motif (JASPAR) (image)

The above motif is not the same as the metacluster_167.7 motif. In fact, the metacluster motif looks like the motif for the Spi1 protein: (image)

I understand that, as these two proteins interact, there might be merged motifs. However, how does the software assign the motif (and the regulon itself) to one protein or the other? Is this a case of the Irf8 motif being mis-labelled in the curated dataset?

Additionally, instead of using this curated database, is there an option to feed SCENIC+ with only one database (for example JASPAR)?

Many thanks in advance!

SeppeDeWinter commented 4 months ago

Hi @JaanaB1

The metacluster you are referring to consists of these two motifs:

elemento__CACTTCC (indeed looks like an SPI1 motif)

image

and

homer__GGAAGTGAAAST_PU.1_IRF8

image

This one looks more similar to the IRF8 motif, although it's not completely the same. The annotation of the metacluster is based only on this second motif (the first one does not have any annotation). The annotation is based on this study https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE66899, where they performed ChIP-seq for IRF8 in dendritic cells. For this reason it's not labeled as SPI1.
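To make the labelling logic concrete, here is a minimal toy sketch of how a metacluster can end up with the annotation of only one of its members. It is an illustration of the idea described above, not the actual code used to build the motif collection; the dictionaries and the function name `annotate_metacluster` are hypothetical.

```python
# Toy illustration only: the data structures and function name are
# hypothetical and do not come from the SCENIC+ code base.

# Members of the metacluster, as listed in this thread.
metacluster_members = {
    "metacluster_167.7": [
        "elemento__CACTTCC",
        "homer__GGAAGTGAAAST_PU.1_IRF8",
    ],
}

# Direct motif-to-TF annotations. The elemento motif has no annotation,
# so it is simply absent from this mapping.
direct_annotations = {
    "homer__GGAAGTGAAAST_PU.1_IRF8": {"IRF8"},
}


def annotate_metacluster(cluster_id, members, annotations):
    """Collect the TFs annotated to any member motif of a metacluster."""
    tfs = set()
    for motif in members[cluster_id]:
        # Unannotated members contribute nothing to the cluster label.
        tfs |= annotations.get(motif, set())
    return tfs


cluster_tfs = annotate_metacluster(
    "metacluster_167.7", metacluster_members, direct_annotations
)
print(cluster_tfs)
```

In this toy setup the cluster is labelled only with the TF of the annotated homer motif, even though the unannotated member resembles an SPI1 motif.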

It's possible to use only a single database for SCENIC+. In that case you have to generate a cistarget database with only the motif database you are interested in, and you should also restrict the motif-to-TF annotation to that database.
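As a rough sketch of the second step (restricting the motif-to-TF annotation), something like the following pandas filter could work. The column names (`motif_id`, `gene_name`), the `jaspar__` prefix convention, and the helper `restrict_to_source` are assumptions for illustration; check them against the annotation table you actually use. Building the cistarget database itself is not shown here.

```python
import pandas as pd

# Small in-memory stand-in for a motif-to-TF annotation table.
# Column names and the "<source>__<motif>" ID convention are assumptions.
annotations = pd.DataFrame(
    {
        "motif_id": [
            "jaspar__MA0652.1",
            "homer__GGAAGTGAAAST_PU.1_IRF8",
        ],
        "gene_name": ["IRF8", "IRF8"],
    }
)


def restrict_to_source(table, source, motif_col="motif_id"):
    """Keep only rows whose motif ID starts with '<source>__' (hypothetical helper)."""
    mask = table[motif_col].str.startswith(source + "__")
    return table[mask].reset_index(drop=True)


jaspar_only = restrict_to_source(annotations, "jaspar")
print(jaspar_only)
```

With a real annotation file you would load the table with `pd.read_csv(..., sep="\t")` instead of building it by hand, then write the filtered table back out and point SCENIC+ at it together with the single-source cistarget database.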

I hope this helps?

All the best,

Seppe