aertslab / create_cisTarget_databases

Create cisTarget databases

motif2TF #37

Open Lidw2020 opened 1 year ago

ghuls commented 1 year ago

Our SCENIC+ public motif collection is now available: https://resources.aertslab.org/cistarget/motif_collections/

motif2TF snapshots can be found at: https://resources.aertslab.org/cistarget/motif_collections/v10nr_clust_public/snapshots/

To create annotations for another species, find the ortholog in your species of each gene listed in the gene_name column (column 6) and replace the gene name with the name of that ortholog. If a gene has no ortholog in your species, delete the line.
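A minimal Python sketch of the ortholog replacement described above. It assumes the motif2TF snapshot is a tab-separated file whose header line starts with `#` and whose sixth column holds the gene name; the function name `translate_motif2tf`, the ortholog dictionary, and the toy rows are all hypothetical, and the ortholog table itself has to come from a source of your choice.

```python
import csv
import io

GENE_NAME_COL = 5  # zero-based index of column 6 (gene_name)


def translate_motif2tf(lines, orthologs):
    """Yield motif2TF rows with gene_name swapped for its ortholog.

    Rows whose gene has no entry in `orthologs` are dropped.
    Header lines (assumed to start with '#') are passed through unchanged.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if row[0].startswith("#"):
            yield row  # keep the header as-is
            continue
        if len(row) <= GENE_NAME_COL:
            continue  # malformed line without a gene_name column
        target = orthologs.get(row[GENE_NAME_COL])
        if target is None:
            continue  # no ortholog known: delete the line
        new_row = list(row)
        new_row[GENE_NAME_COL] = target
        yield new_row


# Toy input: placeholder columns, only column 6 matters here.
example = (
    "#motif_id\tc2\tc3\tc4\tc5\tgene_name\n"
    "motif_a\t.\t.\t.\t.\tGfi1\n"
    "motif_b\t.\t.\t.\t.\tGeneWithoutOrtholog\n"
)
orthologs = {"Gfi1": "gfi1_ortholog"}  # hypothetical mapping
result = list(translate_motif2tf(io.StringIO(example), orthologs))
for row in result:
    print("\t".join(row))
```

For a real file, pass an open file handle instead of the `io.StringIO` object and write the yielded rows back out with `csv.writer(..., delimiter="\t")`.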

KyleFerchen commented 1 year ago

Can you please explain how the motif PWM files can be linked to TFs? I had thought that the PWM files were named by the motif id field in the motif2tf database, but only about 40% of the PWM file names match an entry in the mgi database table. Can you please document how the PWM files should be annotated with TFs? Perhaps with an example on the repository pages for the motif2tf databases.

SeppeDeWinter commented 1 year ago

Hi @KyleFerchen

You are in fact linking the motifs correctly to the TF names. It's correct that only 40% of the files match with the motif2tf file.

This number might seem low. However, you have to keep in mind that some of the files contain multiple motifs (the files named metacluster_*.cb).

Best,

Seppe

ghuls commented 1 year ago

@KyleFerchen Also, not all PWMs have a known or specific TF attached to them. Associating a PWM with the correct TF is a big challenge.
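The two explanations above (multi-motif metacluster_*.cb files, and PWMs without a known TF) can be separated when checking file names against the motif2tf table. The following is only a sketch: it assumes each single-motif file is named `<motif_id>.cb`, and the function name and the example file names and motif IDs are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePath


def summarize_pwm_files(file_names, annotated_motif_ids):
    """Split PWM file names into metacluster files, single-motif files
    whose ID appears in the motif2tf table, and single-motif files
    without an annotation."""
    summary = {"metacluster": [], "matched": [], "unmatched": []}
    for name in file_names:
        if fnmatchcase(name, "metacluster_*.cb"):
            # These files hold several motifs, so the file name is not a motif ID.
            summary["metacluster"].append(name)
            continue
        motif_id = PurePath(name).stem  # strip the trailing ".cb"
        key = "matched" if motif_id in annotated_motif_ids else "unmatched"
        summary[key].append(name)
    return summary


# Hypothetical file listing and motif IDs, for illustration only.
files = ["example__motif_1.cb", "example__motif_2.cb", "metacluster_48.3.cb"]
annotated = {"example__motif_1"}
summary = summarize_pwm_files(files, annotated)
print(summary)
```

Counting the `matched` list against only the single-motif files, rather than against all files, gives a fairer picture of how many PWMs carry a TF annotation.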

KyleFerchen commented 1 year ago

Thank you for clarifying!

However, I'm wondering why I can see some examples like cisbp__M08984 in the motif2tf file, which doesn't seem to have a pwm file in the "singletons" directory. Yet, this motif is annotated in the cisbp2 database with a pwm matrix here.

Is this entry in the motif2tf file collapsed to another PWM in some way?

ghuls commented 1 year ago

Not all PWMs listed in the motif to TF file are in the singletons directory as some motifs from different collections are exactly the same:

swissregulon__mm__Gfi1 = swissregulon__rn__Gfi1 = cisbp__M08905 = cisbp__M08984

And swissregulon__mm__Gfi1 itself is in metacluster_48.3.