bjmt / universalmotif

Motif manipulation functions for R.
GNU General Public License v3.0

How to merge motifs with variable length of gap #26

Closed xiao00su closed 8 months ago

xiao00su commented 8 months ago

My motifs have a 2-5 base-pair gap in the middle. Merging them directly with merge_motifs() did not give a good result. How can I merge these motifs properly?

[Image: Lhc2_long]

[Image: Rplot01]

snystrom commented 8 months ago

What is your desired output exactly? Or your intended use-case for the merged motif? A variable-length gap isn't well encoded by a single PWM, so you'll need to make some sort of tradeoff here.

bjmt commented 8 months ago

Thanks for checking out the package!

As @snystrom has mentioned, a PWM isn't the best format here. As far as the universalmotif package is concerned, motifs are assumed to be of fixed length. I do implement a certain kind of variable gapped motif in the universalmotif package (see add_gap()), but its use is currently limited only to scanning for occurrences of the motifs in sequences. Those gaps are totally ignored by compare_motifs(), view_motifs(), merge_motifs(), etc.
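As a rough sketch of the add_gap() route mentioned above: the consensus string and gap location are made up for illustration, and the 2-5 bp range simply mirrors the gap described in the question. This only marks the motif as gapped for scanning purposes; it does not help with merging.

```r
library(universalmotif)

# Hypothetical motif: two half-sites written back to back, with no spacer.
m <- create_motif("TAATTACAATTA", name = "halfsites_example")

# Allow a gap of 2 to 5 bp after position 6 (between the two half-sites).
# The gaploc, mingap and maxgap values are illustrative assumptions.
gapped <- add_gap(m, gaploc = 6, mingap = 2, maxgap = 5)

# Inspect the result; the motif matrix itself keeps its 12 columns.
print(gapped)
```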

If you absolutely need to merge the two segments, you could always try doing it manually. For example, you could first identify which positions are of interest (i.e., high information content positions) with colSums(convert_type(my_motif, "ICM")), then create individual segments from the positions you want using, for example, subset(my_motif, 3:8), before trying merge_motifs() on those segments.
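A minimal sketch of that manual route, assuming a made-up gapped consensus and an arbitrary 0.5-bit cutoff for "high information content" (neither comes from the motifs in this issue):

```r
library(universalmotif)

# Hypothetical example motif: two conserved half-sites separated by a
# low-information spacer (the consensus string is invented for illustration).
my_motif <- create_motif("TAATTANNNNCAATTA", name = "gapped_example")

# Per-position information content in bits; spacer columns should be near 0.
ic <- colSums(convert_type(my_motif, "ICM"))
print(round(ic, 2))

# Keep positions above the cutoff (0.5 bits is an assumption, tune as needed).
keep <- unname(which(ic > 0.5))

# Group the kept positions into runs of consecutive columns.
runs <- split(keep, cumsum(c(1, diff(keep) != 1)))

# Extract each run as its own fixed-length motif.
segments <- lapply(runs, function(idx) subset(my_motif, idx))
```

Each element of `segments` is an ordinary fixed-length motif, so matching segments taken from different source motifs could then be passed to merge_motifs() separately.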

Other than that I cannot think of a possible solution using available universalmotif functionality sadly, so I will close this issue. Feel free to reopen if you have additional questions.

xiao00su commented 8 months ago

Thank you very much for your quick replies. I have recently been working on snATAC data, and I collected lots of motifs from different databases. Some genes have hundreds of similar motifs. I think it would be useful to merge them into a single motif for motif scanning. The motifs shown in the picture are motifs of the same gene collected from different databases.
[Image: Lhx2]

bjmt commented 8 months ago

Interesting. I agree, merging them into a consensus motif before scanning is a good idea. However in my opinion you shouldn't try and merge the variable gap motifs with the rest, since they are too different from everything else.

xiao00su commented 8 months ago

I tried two strategies for the merge.

A:

  1. Calculate the similarity score of each motif and get the topological overlap matrix (TOM) (HOMER compareMotifs.pl).
  2. Cluster the motifs based on the TOM (Seurat).
  3. Merge the motifs of each gene by cluster (stackMotif/universalmotif, mergeMotifs).

B: Merge the motifs of each gene with universalmotif::merge_similar() (easy, but the parameters may need adjusting for each gene).
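For reference, strategy A can be roughly approximated inside universalmotif alone. This is a sketch under stated assumptions, not the HOMER/Seurat pipeline described above: it swaps in compare_motifs() with Pearson correlation for the similarity scores and base R hclust()/cutree() for the clustering. The four consensus strings and k = 2 are made up for illustration.

```r
library(universalmotif)

# Hypothetical motif collection for one gene (consensus strings are invented).
motifs <- list(
  create_motif("TAATTAGC", name = "m1"),
  create_motif("TAATTAGG", name = "m2"),
  create_motif("CACGTGAC", name = "m3"),
  create_motif("CACGTGAT", name = "m4")
)

# Pairwise similarity (Pearson correlation), converted to a distance.
sim <- compare_motifs(motifs, method = "PCC")
tree <- hclust(as.dist(1 - sim))

# Cut the tree into k clusters; k = 2 is an arbitrary choice for this toy set.
clusters <- cutree(tree, k = 2)

# Merge the members of each cluster; singleton clusters are kept unchanged.
consensus <- lapply(split(seq_along(motifs), clusters), function(idx) {
  if (length(idx) > 1) merge_motifs(motifs[idx]) else motifs[[idx]]
})
```

The number of clusters (or the cut height) plays the same role as the per-gene parameter tuning mentioned for strategy B.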

My concern is how likely the consensus motif is to be the right one. Both methods give a reasonable-looking consensus motif to some degree. I collected ~6000 motifs from ~700 genes. How can I evaluate the consensus motifs in batch (not by eye)?

[Image: motif_cluster2]

merge_similar: [Image: Lhx2-merge]

bjmt commented 7 months ago

Oh wow, neat to see the two approaches give such similar results.

Unfortunately I don't think there's an easy answer to your question. What I've done myself recently is to optimize for clusterings that yield consensus motifs with the strongest enrichment in the target sequences versus the background. In other words, if the enrichment of the merged motif is less significant than that of the original motifs, I would not use the merged motif and would change the clustering parameters instead.
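One way to make that rule batch-friendly is sketched below in base R. It assumes the number of target and background sequences with at least one hit has already been counted for each motif (universalmotif's enrich_motifs() reports comparable statistics, though that is not necessarily what was used here). The helper names, the choice of a one-sided Fisher's exact test, the comparison against the best original motif, and all counts are hypothetical.

```r
# One-sided Fisher's exact test: are sequences with a hit over-represented
# in the target set relative to the background set?
enrichment_p <- function(target_hits, target_n, bkg_hits, bkg_n) {
  counts <- matrix(
    c(target_hits, target_n - target_hits,
      bkg_hits,    bkg_n - bkg_hits),
    nrow = 2,
    dimnames = list(c("hit", "no_hit"), c("target", "background"))
  )
  fisher.test(counts, alternative = "greater")$p.value
}

# Keep the merged motif only if it is at least as significant as the best
# original motif in its cluster (this strictness is an assumption).
keep_merged <- function(merged_p, original_p) {
  merged_p <= min(original_p)
}

# Made-up counts, purely to show the call pattern.
p_merged    <- enrichment_p(target_hits = 80, target_n = 100,
                            bkg_hits = 20, bkg_n = 100)
p_originals <- c(
  enrichment_p(60, 100, 25, 100),
  enrichment_p(70, 100, 30, 100)
)
use_merged <- keep_merged(p_merged, p_originals)
```

Running this per gene and per clustering setting gives a yes/no flag for each consensus motif instead of a visual check.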

xiao00su commented 7 months ago

Enrichment scoring is a good idea for testing the consensus motif. I will try it on some TFs. I hope the clustering optimization goes well and becomes available soon.