rejo27 closed 1 year ago
Hi. This is an excellent question. And, to tell you the truth, I wasn't really aware of maf_stream!
From what I can tell, its dup_merge consensus is going to aggressively squish together dupes, resulting in higher coverage than what would currently happen with --dupeMode single.
On the downside, the resulting MAF will be technically invalid. But on the plus side, the MAF columns will give, perhaps, a better reflection of total coverage for PhyloP.
For example, if I have a block like
ref:1-10 ACGTCATTT
alt:1-5 AC----TTT
alt:30-33 A-GC-----
--dupeMode single would simply choose the alt block with the best coverage:
ref:1-10 ACGTCATTT
alt:1-5 AC----TTT
but I think maf_stream would actually merge them together:
ref:1-10 ACGTCATTT
alt:1-7 ACGC--TTT
even though the second row does not correspond to any actual sequence.
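To make the difference concrete, here is a minimal Python sketch of the two behaviours on the toy block above. It is only an illustration under assumptions: `pick_single` and `merge_consensus` are hypothetical helper names, not functions from cactus-hal2maf or maf_stream, and the real tools may break ties, handle coordinates, and resolve conflicting bases differently.

```python
from collections import Counter

GAP = "-"


def coverage(row):
    """Count the non-gap columns in one alignment row."""
    return sum(base != GAP for base in row)


def pick_single(rows):
    """Hypothetical stand-in for --dupeMode single:
    keep only the duplicate row that covers the most columns."""
    return max(rows, key=coverage)


def merge_consensus(rows):
    """Hypothetical stand-in for a consensus merge of duplicate rows:
    in each column take the most common non-gap base, or a gap if none."""
    merged = []
    for column in zip(*rows):
        bases = [base for base in column if base != GAP]
        merged.append(Counter(bases).most_common(1)[0][0] if bases else GAP)
    return "".join(merged)


# The two alt rows from the example block (aligned text only, no coordinates).
dupes = ["AC----TTT", "A-GC-----"]

print(pick_single(dupes))      # AC----TTT
print(merge_consensus(dupes))  # ACGC--TTT
```

The merged row covers 7 columns instead of 5, which is the extra coverage a column-based tool would see, at the cost of a row that is not a real contiguous sequence.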
This seems pretty reasonable for any downstream applications that care only about columns (Genome Browser, PhyloP).
Anyway, I will run some tests and see about incorporating maf_stream as an option within cactus-hal2maf in the next release. Thanks again for bringing this to my attention.
This is great and cleared up my confusion. Thank you so much.
Hello, I noticed the --dupeMode parameter. The single option keeps only one copy; the ancestral option keeps all copies. I will now use the alignments in the MAF files with phastCons to identify the degree of conservation. The ancestral option seems to be similar to the --onlyOrthologs parameter in older versions. This paper uses the --onlyOrthologs parameter and maf_stream dup_merge consensus for processing MAF files. Is it OK to just use the single option? Or should I use the ancestral option together with maf_stream dup_merge consensus, as in the above-mentioned paper? What is the difference between the two? Which option do you think I should use (single, ancestral or all)? Looking forward to your reply.