xiao00su closed this issue 8 months ago.
What is your desired output exactly? Or your intended use-case for the merged motif? A variable-length gap isn't well encoded by a single PWM, so you'll need to make some sort of tradeoff here.
Thanks for checking out the package!
As @snystrom has mentioned, a PWM isn't the best format here. As far as the universalmotif package is concerned, motifs are assumed to be of fixed length. I do implement a certain kind of variable gapped motif in the universalmotif package (see add_gap()), but its use is currently limited to scanning for occurrences of the motifs in sequences. Those gaps are totally ignored by compare_motifs(), view_motifs(), merge_motifs(), etc.
If you absolutely need to merge the two segments, you could always try doing it manually. For example, you could first identify which positions are of interest (i.e., high-information-content positions) with colSums(convert_type(my_motif, "ICM")), then create individual segments from the positions you want using subset(my_motif, 3:8), before trying with merge_motifs() (for example).
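The manual route above can be sketched in base R. This is only a stand-in for colSums(convert_type(my_motif, "ICM")): it assumes a plain 4 x N probability matrix (rows A, C, G, T), a uniform background and no pseudocounts, so its values will not exactly match universalmotif's. The helpers ppm_ic() and high_ic_segments(), the toy motif, and the min_ic/min_len thresholds are all hypothetical.

```r
# Hypothetical helper: per-position information content (bits) of a PPM.
# Assumes a 4 x N matrix (rows A, C, G, T), uniform background, no pseudocounts.
ppm_ic <- function(ppm) {
  ic <- ppm * log2(ppm / 0.25)
  ic[ppm == 0] <- 0  # define 0 * log2(0) as 0 instead of NaN
  colSums(ic)
}

# Hypothetical helper: contiguous runs of at least `min_len` positions whose
# IC exceeds `min_ic`, returned as start/end column indices.
high_ic_segments <- function(ic, min_ic = 0.5, min_len = 3) {
  runs <- rle(ic > min_ic)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  segs <- data.frame(start = starts, end = ends)[runs$values, , drop = FALSE]
  segs[segs$end - segs$start + 1 >= min_len, , drop = FALSE]
}

# Toy gapped motif: TGA, three uninformative positions, then TCA.
bases <- c("A", "C", "G", "T")
strong <- function(base) setNames(ifelse(bases == base, 0.97, 0.01), bases)
uniform <- setNames(rep(0.25, 4), bases)
ppm <- cbind(strong("T"), strong("G"), strong("A"),
             uniform, uniform, uniform,
             strong("T"), strong("C"), strong("A"))

high_ic_segments(ppm_ic(ppm))
```

Each returned start/end pair is the kind of range you would then pass to subset(my_motif, start:end), so that the left and right segments can be merged separately.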
Other than that, I sadly can't think of a solution using the available universalmotif functionality, so I will close this issue. Feel free to reopen it if you have additional questions.
Thank you very much for your quick replies.
I am currently working on snATAC data and have collected lots of motifs from different databases. Some genes have hundreds of similar motifs, so I think it would be useful to merge them into a single motif before doing the motif scan.
The motifs shown in the picture are all for the same gene, collected from different databases.
Interesting. I agree that merging them into a consensus motif before scanning is a good idea. However, in my opinion you shouldn't try to merge the variable-gap motifs with the rest, since they are too different from everything else.
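A toy illustration of why the gapped motifs fit poorly into a merge: if a consensus is pictured as the column-wise mean of motifs that are already aligned and of equal length, a position that is shifted in one motif (as a variable gap would do) gets averaged against an uninformative column, and its information content collapses. This base-R sketch is a simplification, since merge_motifs() aligns the motifs itself before combining them. The helpers average_ppms() and column_ic() are hypothetical, and the IC calculation assumes a uniform background with no pseudocounts.

```r
# Hypothetical helper: simplified consensus as the column-wise mean of PPMs
# that are ALREADY aligned and of equal length (no alignment is done here).
average_ppms <- function(ppms) {
  widths <- vapply(ppms, ncol, integer(1))
  stopifnot(length(unique(widths)) == 1)
  Reduce(`+`, ppms) / length(ppms)
}

# Hypothetical helper: per-position information content (bits),
# uniform background, no pseudocounts.
column_ic <- function(ppm) {
  colSums(ifelse(ppm > 0, ppm * log2(ppm / 0.25), 0))
}

bases <- c("A", "C", "G", "T")
strong <- function(base) setNames(ifelse(bases == base, 0.97, 0.01), bases)
uniform <- setNames(rep(0.25, 4), bases)

# Motif 1 has a strong "A" at position 2; motif 2 has that "A" shifted to
# position 3, leaving its position 2 uninformative.
m1 <- cbind(strong("T"), strong("A"), uniform)
m2 <- cbind(strong("T"), uniform, strong("A"))
consensus <- average_ppms(list(m1, m2))
round(column_ic(consensus), 2)
```

Position 1 keeps its information content, while positions 2 and 3 end up far weaker than the strong "A" column in either input.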
I tried two strategies for the merge:
A: Calculate the pairwise similarity scores between the motifs and derive a topological overlap matrix (TOM) (HOMER compareMotifs.pl).
B: Merge the motifs of each gene with universalmotif::merge_similar() (easy, but the parameters may need adjusting for each gene).
My concern is how likely it is that the consensus motif is the right one. Both methods produce reasonable-looking consensus motifs. I collected ~6000 motifs from ~700 genes. How can I evaluate the consensus motifs in batch (i.e., not by eye)?
(Image: merge_similar result)
Oh wow, neat to see the two approaches give such similar results.
Unfortunately, I don't think there's an easy answer to your question. What I've done myself recently is to optimize the clustering for consensus motifs with the strongest enrichment in the target sequences versus the background. In other words, if the merged motif's enrichment is less significant than that of the original motifs, I would not use the merged motif and would change the clustering parameters instead.
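The enrichment check described above can be reduced to a simple decision rule. universalmotif's enrich_motifs() is the natural tool for computing the enrichment itself; this base-R sketch instead uses a one-sided Fisher's exact test on counts of target and background sequences with at least one hit, purely to illustrate the comparison. The helper names and input format are hypothetical, and "weaker than the original motifs" is interpreted here as "less significant than the best original motif".

```r
# Hypothetical helper: one-sided Fisher's exact test asking whether sequences
# with >= 1 motif hit are over-represented in the target set vs. background.
enrichment_p <- function(target_hits, target_n, bkg_hits, bkg_n) {
  tab <- matrix(c(target_hits, target_n - target_hits,
                  bkg_hits,    bkg_n - bkg_hits),
                nrow = 2, byrow = TRUE)
  fisher.test(tab, alternative = "greater")$p.value
}

# Hypothetical decision rule: keep the merged motif only if it is at least as
# significant as the best original motif it replaces.
keep_merged <- function(merged, originals) {
  p_merged <- do.call(enrichment_p, merged)
  p_orig <- vapply(originals, function(x) do.call(enrichment_p, x), numeric(1))
  p_merged <= min(p_orig)
}
```

Each motif is described by a named list such as list(target_hits = ..., target_n = ..., bkg_hits = ..., bkg_n = ...), filled in from your own scans; a FALSE result is the signal to revisit the clustering parameters.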
Enrichment scoring is a good way to test the consensus motifs. I will try it on some of the TFs. I hope the clustering optimization goes well and becomes available to use soon.
My motifs have a 2-5 bp gap in the middle. Merging them directly with merge_motifs() gave poor results. How can I merge these motifs properly?