aertslab / pySCENIC

pySCENIC is a lightning-fast Python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

How to display the specific motifs we want? #360

Closed: hyjforesight closed this issue 2 years ago

hyjforesight commented 2 years ago

Hello SCENIC, the tutorial shows how to display the motifs for df_regulons.head(). We're wondering whether it is possible to display a specific motif, for example Atoh1. Thanks! Best, YJ

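import operator as op
import pandas as pd
from IPython.display import HTML, display

# regulons and fetch_logo come from earlier steps of the SCENIC tutorial
df_regulons = pd.DataFrame(data=[list(map(op.attrgetter('name'), regulons)),
                                 list(map(len, regulons)),
                                 list(map(fetch_logo, regulons))], index=['name', 'count', 'logo']).T

MAX_COL_WIDTH = pd.get_option('display.max_colwidth')
pd.set_option('display.max_colwidth', None)  # None = no truncation (-1 is deprecated)
display(HTML(df_regulons.head().to_html(escape=False)))
pd.set_option('display.max_colwidth', MAX_COL_WIDTH)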

We tried this, but it raises an error:

MAX_COL_WIDTH = pd.get_option('display.max_colwidth')
pd.set_option('display.max_colwidth', -1)
display(HTML(df_regulons['name'].isin(['Atoh1']).to_html(escape=False)))
pd.set_option('display.max_colwidth', MAX_COL_WIDTH)

C:\Users\Park_Lab\AppData\Local\Temp/ipykernel_23484/2501430891.py:2: FutureWarning: Passing a negative integer is deprecated in version 1.0 and will not be supported in future version. Instead, use None to not limit the column width.
  pd.set_option('display.max_colwidth', -1)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_23484/2501430891.py in <module>
      1 MAX_COL_WIDTH = pd.get_option('display.max_colwidth')
      2 pd.set_option('display.max_colwidth', -1)
----> 3 display(HTML(df_regulons['name'].insi(['Atoh1']).to_html(escape=False)))
      4 pd.set_option('display.max_colwidth', MAX_COL_WIDTH)

~\anaconda3\envs\HYJ_py38\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488 
   5489     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'insi'
SeppeDeWinter commented 2 years ago

Hi @hyjforesight

This is certainly possible.

The code you typed and the error that's thrown do not correspond.

You accidentally typed insi instead of isin; see:

----> 3 display(HTML(df_regulons['name'].insi(['Atoh1']).to_html(escape=False)))
hyjforesight commented 2 years ago

Hello @SeppeDeWinter, thanks for the response. Sorry, my fault: I pasted the wrong error output. Here is the correct one: AttributeError: 'Series' object has no attribute 'to_html'.

MAX_COL_WIDTH = pd.get_option('display.max_colwidth')
pd.set_option('display.max_colwidth', -1)
display(HTML(df_regulons['name'].isin(['Atoh1']).to_html(escape=False)))
pd.set_option('display.max_colwidth', MAX_COL_WIDTH)

AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_6276/4195165455.py in <module>
      1 MAX_COL_WIDTH = pd.get_option('display.max_colwidth')
      2 pd.set_option('display.max_colwidth', -1)
----> 3 display(HTML(df_regulons['name'].isin(['Atoh1']).to_html(escape=False)))
      4 pd.set_option('display.max_colwidth', MAX_COL_WIDTH)

~\anaconda3\envs\HYJ_py38\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488 
   5489     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'to_html'

Sorry for troubling you so much. Last night I exported the full list of motif logo URLs (df_regulons.to_csv("C:/Users/Park_Lab/Documents/WI_sub_motifs.csv")) and downloaded all of them manually. I found some confusing things:

  1. Motif is not consistent with the species of my data. I use mouse data, but the motif shown is a fly motif (Rorc).
  2. Motif logo name is not consistent with the regulon name. The regulon is E2F6, but it uses the E2F4 logo.
  3. The same regulon, E2F6, uses different motifs in different samples. In another sample, it uses cisbp__M6195 instead of E2F4.
  4. Different regulons in the same sample use the same motif.
  5. For certain regulons, one regulon may have several motif logos. How does SCENIC determine which one to display?
  6. What does count mean? Does it represent the number of cells that are regulated by this regulon?

Thanks! Best, YJ

SeppeDeWinter commented 2 years ago

For the error

AttributeError: 'Series' object has no attribute 'to_html'

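to_html only exists on pandas DataFrames, not on Series. Also, isin returns a boolean Series (one True/False per row) rather than the matching rows, so use it as a mask to select rows from the DataFrame: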

df_atoh1 = df_regulons.loc[df_regulons['name'].isin(['Atoh1'])]
display(HTML(df_atoh1.to_html(escape=False)))
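A minimal sketch of the whole select-and-display step, using a small hypothetical stand-in for df_regulons. Two assumptions to check against your own data: that regulon names may carry a suffix such as Atoh1(+) (in which case isin(['Atoh1']) matches nothing), and that the logo column holds HTML img tags like those returned by the tutorial's fetch_logo. pd.option_context replaces the manual get/set/restore of display.max_colwidth and uses None, which the FutureWarning above recommends instead of -1.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's df_regulons; the names carry a "(+)"
# suffix on purpose, and the logo column holds placeholder HTML, not real logos.
df_regulons = pd.DataFrame({
    "name": ["Atoh1(+)", "E2f6(+)", "Rorc(+)"],
    "count": [120, 85, 40],
    "logo": ["<img src='a.png'>", "<img src='b.png'>", "<img src='c.png'>"],
})

# isin returns a boolean Series (one True/False per row), not the matching rows.
mask = df_regulons["name"].isin(["Atoh1"])
print(type(mask).__name__, mask.tolist())  # Series [False, False, False]

# Strip a trailing "(+)" or "(-)" so plain TF names match, then select rows.
tf_names = df_regulons["name"].str.replace(r"\([+-]\)$", "", regex=True)
selected = df_regulons.loc[tf_names.isin(["Atoh1"])]

# selected is a DataFrame, so to_html exists. option_context restores
# display.max_colwidth on exit; None (not -1) means "no truncation".
with pd.option_context("display.max_colwidth", None):
    html = selected.to_html(escape=False)

# In Jupyter: from IPython.display import HTML, display; display(HTML(html))
```

Against the real table, only the stand-in DataFrame needs to be dropped; if the real names have no suffix, the str.replace step simply leaves them unchanged.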

For the other questions:

  1. Motif is not consistent with the species of my data. I use mouse data, but the motif shows fly's.
     That's expected. The motif database contains motifs from a variety of species. Although this motif was derived from fly, transcription factors and their DNA-binding domains are conserved across species, so the same motif can also be enriched in mouse. Nothing to worry about here.

  2. Motif logo name is not consistent with the regulon name. The regulon is E2F6, but it uses the E2F4 logo.
     One motif can be linked to multiple transcription factors because those factors have the same (or very similar) DNA-binding domains. In this case, the DNA-binding domains of E2F6 and E2F4 are probably nearly identical.

  3. Same regulon E2F6 in different samples uses different motifs. In another sample, it uses cisbp__M6195 instead of E2F4.
     There is redundancy in the motif database: for one TF there are multiple motifs, each with slight variations, so cisbp__M6195 probably looks very similar to the E2F4 motif. This redundancy is necessary, because the sites a factor binds in the genome also vary slightly, and modelling that variation requires multiple motifs per factor.

  4. Different regulons in the same sample use the same motif.
     This is because one motif can be linked to multiple factors (see 2).

  5. For certain regulons, one regulon may have several motif logos. How does SCENIC determine which one to display?
     SCENIC displays the motif with the highest normalized enrichment score (NES).

  6. What does count mean? Does it represent the number of cells that are regulated by this regulon?
     No. Count is the number of target genes of the regulon.
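The answers to questions 2 and 4 to 6 can be illustrated with a toy table. This is a sketch only: the flat layout with columns TF, MotifID and NES, the motif IDs motif_A to motif_C, the scores, and the target-gene sets are all hypothetical, and pySCENIC's own enrichment tables are laid out differently. It picks the highest-NES motif per TF, lists the TFs sharing each motif, and counts target genes per regulon.

```python
import pandas as pd

# Hypothetical, flattened motif-enrichment table: one row per (TF, motif) pair
# with its normalized enrichment score (NES). Not pySCENIC's actual layout.
enrichment = pd.DataFrame({
    "TF": ["E2f6", "E2f6", "E2f4", "Rorc"],
    "MotifID": ["motif_A", "motif_B", "motif_A", "motif_C"],
    "NES": [5.1, 4.3, 6.0, 3.2],
})

# Question 5: keep the motif with the highest NES for each TF.
best = enrichment.loc[enrichment.groupby("TF")["NES"].idxmax()]
best_motif = dict(zip(best["TF"], best["MotifID"]))
print(best_motif)  # {'E2f4': 'motif_A', 'E2f6': 'motif_A', 'Rorc': 'motif_C'}

# Questions 2 and 4: one motif can be linked to several TFs.
tfs_per_motif = (
    enrichment.sort_values("TF").groupby("MotifID")["TF"].apply(list).to_dict()
)
print(tfs_per_motif)  # {'motif_A': ['E2f4', 'E2f6'], 'motif_B': ['E2f6'], 'motif_C': ['Rorc']}

# Question 6: "count" is the number of target genes in each regulon
# (hypothetical gene sets).
targets = {"E2f6": {"Ccne1", "Mcm3", "Pcna"}, "Rorc": {"Il17a", "Il23r"}}
counts = {tf: len(genes) for tf, genes in targets.items()}
print(counts)  # {'E2f6': 3, 'Rorc': 2}
```

Here E2f6 and E2f4 both end up displaying motif_A, which is the situation described in questions 2 and 4.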

hyjforesight commented 2 years ago

Hello @SeppeDeWinter, thanks for the detailed answers. You're so patient and kind! I appreciate it!