Closed by shangguandong1996 1 year ago
If you have motifs directly annotated to a TF, you can use lines like this:
#motif_id motif_name motif_description source_name source_version gene_name motif_similarity_qvalue similar_motif_id similar_motif_description orthologous_identity orthologous_gene_name orthologous_species description
jaspar__MA0002.2 MA0002.2 RUNX1 jaspar 2016 RUNX1 0.000000 None None 1.000000 None None gene is directly annotated
If you have motifs that are annotated to a TF in another species, they can be annotated via orthology.
#motif_id motif_name motif_description source_name source_version gene_name motif_similarity_qvalue similar_motif_id similar_motif_description orthologous_identity orthologous_gene_name orthologous_species description
bergman__Su_H_ Su_H_ Su(H) bergman 1.1 RBPJ 0.000000 None None 0.722000 FBgn0004837 D. melanogaster gene is orthologous to FBgn0004837 in D. melanogaster (identity = 72%) which is directly annotated for motif
If you have unannotated motifs, you could run TomTom to see how similar they are to known motifs and annotate them that way:
#motif_id motif_name motif_description source_name source_version gene_name motif_similarity_qvalue similar_motif_id similar_motif_description orthologous_identity orthologous_gene_name orthologous_species description
jaspar__MA0001.2 MA0001.2 AGL3 jaspar 2016 MEF2D 0.000211 taipale_cyt_meth__MEF2D_CCWWATWWRG_eDBD_meth MEF2D [MADS, CpG-meth] 1.000000 None None motif similar to taipale_cyt_meth__MEF2D_CCWWATWWRG_eDBD_meth ('MEF2D [MADS, CpG-meth]'; q-value = 0.000211) which is directly annotated
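To make the row layout above concrete, here is a minimal Python sketch that writes a one-row table for a directly annotated motif. It assumes the .tbl file is tab-separated (the examples above show the columns separated by whitespace only), and the helper names `direct_row` and `write_tbl` are made up for illustration. The fixed values (q-value 0.000000, identity 1.000000, `None` placeholders) are copied from the RUNX1 example row.

```python
import csv
import io

# Column order copied from the header line of the examples above.
COLUMNS = [
    "#motif_id", "motif_name", "motif_description", "source_name",
    "source_version", "gene_name", "motif_similarity_qvalue",
    "similar_motif_id", "similar_motif_description",
    "orthologous_identity", "orthologous_gene_name",
    "orthologous_species", "description",
]


def direct_row(source, version, motif_name, motif_description, gene_name):
    """Build one row for a motif that is directly annotated to a TF.

    Hypothetical helper: the constant fields mirror the RUNX1 example row.
    """
    return {
        "#motif_id": f"{source}__{motif_name}",
        "motif_name": motif_name,
        "motif_description": motif_description,
        "source_name": source,
        "source_version": version,
        "gene_name": gene_name,
        "motif_similarity_qvalue": "0.000000",
        "similar_motif_id": "None",
        "similar_motif_description": "None",
        "orthologous_identity": "1.000000",
        "orthologous_gene_name": "None",
        "orthologous_species": "None",
        "description": "gene is directly annotated",
    }


def write_tbl(rows, handle):
    """Write the header and rows as tab-separated text (assumed format)."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer; pass an open file handle to write to disk.
buffer = io.StringIO()
write_tbl([direct_row("jaspar", "2016", "MA0002.2", "RUNX1", "RUNX1")], buffer)
tbl_text = buffer.getvalue()
print(tbl_text)
```

Rows for orthology-based or similarity-based annotations would fill in the `orthologous_*` or `similar_motif_*` columns instead, as in the Su(H) and AGL3 examples.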
Thanks for your reply. I also have some questions:
1. motif_name or gene_name. MA0002.2.cb or RUNX1.cb?
motif_id = motif_source__motif_name. So our Cluster-Buster motif files are called motif_source__motif_name.cb (jaspar__MA0002.2.cb with a header >jaspar__MA0002.2).
2. "it is better to have more motifs per TF". So I can make a tbl like below?
#motif_id motif_name motif_description source_name source_version gene_name motif_similarity_qvalue similar_motif_id similar_motif_description orthologous_identity orthologous_gene_name orthologous_species description
jaspar__MA0002.2 MA0002.2 RUNX1 jaspar 2016 RUNX1 0.000000 None None 1.000000 None None gene is directly annotated
cisbp__MA0002.2 MA0002.2 RUNX1 jaspar 2016 RUNX1 0.000000 None None 1.000000 None None gene is directly annotated
If these two .cb files (jaspar__MA0002.2.cb and cisbp__MA0002.2.cb) have the same **gene_name**, will the rank values of both motifs be linked to the same TF when pySCENIC runs?
3. I am just curious what the advantages and disadvantages are of using only the 500 TF list. After all, it seems that using 500 TFs will save computing resources during co-expression.
Thanks again for your detailed reply :)
The Feature_rank motif name is the motif_id (the Cluster-Buster filename without .cb).
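As a small illustration of that naming rule, the following Python sketch derives the motif_id from a Cluster-Buster filename and checks it against the `>` header line. The function names are hypothetical, and the matrix row in the example file content is made up purely so there is something after the header.

```python
from pathlib import Path


def motif_id_from_path(cb_path):
    """Return the motif_id implied by a .cb filename (the name without .cb)."""
    path = Path(cb_path)
    if path.suffix != ".cb":
        raise ValueError(f"not a .cb file: {cb_path}")
    # stem drops only the final suffix, so 'MA0002.2' keeps its version part.
    return path.stem


def header_matches(cb_text, motif_id):
    """Check that the first non-empty line of the file is '>' + motif_id."""
    for line in cb_text.splitlines():
        if line.strip():
            return line.strip() == f">{motif_id}"
    return False


motif_id = motif_id_from_path("motifs/jaspar__MA0002.2.cb")

# Made-up file content: only the header line matters for this check.
example_cb = ">jaspar__MA0002.2\n10\t2\t3\t5\n"

print(motif_id, header_matches(example_cb, motif_id))
```

The same motif_id string is what goes in the first column of the tbl file, so a check like this can catch mismatches before building the ranking database.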
"better to have more motifs per TF" only makes sense when the motifs available for a certain TF are different: slightly different PWMs (e.g. different binding specificity in different cell types, or a monomer PWM and a dimer PWM). If you don't have different motifs for the same TF, adding exactly the same motif again does not make sense.
P.S.: JASPAR is a well-curated collection and is generally included in many other collections (CIS-BP, TRANSFAC, ...), so in that case those PWMs will be the same.
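One way to spot such exact duplicates before merging collections is sketched below in Python. It assumes the Cluster-Buster layout of a `>motif_id` header followed by one row of four numbers per position, and it only catches matrices that are numerically identical. It does not detect near-identical motifs, reverse complements, or the same PWM stored at a different scale (counts versus frequencies); a motif comparison tool such as TomTom is the better fit for those. The function names and the matrices in the example are made up.

```python
def parse_cb(cb_text):
    """Parse Cluster-Buster-style text into {motif_id: matrix as tuple of rows}."""
    motifs = {}
    current = None
    for line in cb_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            motifs[current] = []
        elif current is not None:
            motifs[current].append(tuple(float(x) for x in line.split()))
    # Tuples are hashable, so whole matrices can be used as dictionary keys.
    return {motif_id: tuple(rows) for motif_id, rows in motifs.items()}


def group_identical(motifs):
    """Return lists of motif_ids whose matrices are exactly equal."""
    groups = {}
    for motif_id, matrix in motifs.items():
        groups.setdefault(matrix, []).append(motif_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Made-up matrices: the first two are identical, the third differs.
example_text = """\
>jaspar__MA0002.2
10 2 3 5
1 1 17 1
>cisbp__MA0002.2
10 2 3 5
1 1 17 1
>other__M1
5 5 5 5
0 20 0 0
"""

duplicates = group_identical(parse_cb(example_text))
print(duplicates)
```

Motifs reported in the same group add no new information for that TF, so keeping one of them is enough.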
An advantage of using 500 TFs is that gene inference will likely be faster. A disadvantage is that it might report that a certain TF regulates e.g. 200 genes, whereas using 1500 TFs might reduce that list to maybe 150 genes, because some of those genes get assigned to other TFs for which you don't have motifs. So the larger list might give you cleaner results. It also depends on the importance of your missing TFs and on your dataset. Try both and see what works best.
Thanks, I get it :)
Hi, dear developer.
My species is a plant, so it makes no sense to use orthologous genes to replace the genes in the human or mouse tbl file. I also have motif files from different sources and want to merge them. So I want to create a motif tbl file like those at https://resources.aertslab.org/cistarget/motif2tf/ using the motif2TF procedure. But I could not find a tool or script in the original motif2TF paper, "iRegulon: From a Gene List to a Gene Regulatory Network Using Large Motif and Track Collections".
I am wondering whether you can give me some advice about this.
Thanks for your reply
Guandong Shang