ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

--dupeMode parameter for Phast #1063

rejo27 closed this issue 1 year ago

rejo27 commented 1 year ago

Hello, I noticed the --dupeMode parameter. As I understand it, single keeps only one copy, while ancestral keeps all copies. I plan to use the MAF alignment output with phastCons to estimate the degree of conservation. The ancestral mode seems similar to the --onlyOrthologs option in older versions. This paper uses --onlyOrthologs and then processes the MAF files with maf_stream dup_merge consensus. Is it enough to just use single? Or should I use ancestral followed by maf_stream dup_merge consensus, as in the above-mentioned paper? What is the difference between the two approaches, and which mode (single, ancestral or all) do you think is better to use? Looking forward to your reply.

glennhickey commented 1 year ago

Hi. This is an excellent question. And, to tell you the truth, I wasn't really aware of maf_stream!

From what I can tell, its dup_merge consensus mode is going to aggressively squish together dupes, resulting in higher coverage than what would currently happen with --dupeMode single.

On the downside, the resulting MAF will be technically invalid. But on the plus side, the MAF columns will give, perhaps, a better reflection of total coverage for PhyloP.

For example, if I have a block like

ref:1-10      ACGTCATTT
alt:1-5       AC----TTT
alt:30-33     A-GC-----

--dupeMode single would simply choose the alt row with the best coverage:

ref:1-10      ACGTCATTT
alt:1-5       AC----TTT
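To make the single behaviour concrete, here is a minimal Python sketch that picks, among duplicate rows from the same species, the one with the most aligned (non-gap) bases. It assumes that "best coverage" means the largest count of non-gap characters, and the helper name pick_single is hypothetical; this is an illustration, not cactus's actual implementation.

```python
from typing import List, Tuple

# A row is (source name, aligned text), e.g. ("alt:1-5", "AC----TTT").
Row = Tuple[str, str]


def aligned_bases(text: str) -> int:
    """Count non-gap characters in an aligned row."""
    return sum(1 for ch in text if ch != "-")


def pick_single(dupes: List[Row]) -> Row:
    """Hypothetical helper: keep the one duplicate row with the most
    aligned bases (assumed meaning of "best coverage"). Ties go to the
    first row given, because max() returns the first maximal element."""
    return max(dupes, key=lambda row: aligned_bases(row[1]))


if __name__ == "__main__":
    alt_rows = [("alt:1-5", "AC----TTT"), ("alt:30-33", "A-GC-----")]
    print(pick_single(alt_rows))  # ('alt:1-5', 'AC----TTT')
```

With the rows from the example, alt:1-5 has 5 aligned bases and alt:30-33 has 3, so only alt:1-5 survives and the G and C in alt:30-33 are discarded.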

but I think maf_stream would actually merge them together

ref:1-10      ACGTCATTT
alt:1-7       ACGC--TTT

even though the second row does not correspond to any actual sequence.
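A column-wise merge like the one above can be sketched in a few lines of Python. This only illustrates an assumed rule: in each column, take the first non-gap base found among the duplicate rows, in row order. The real maf_stream consensus logic may differ (for example, it may vote when rows disagree), and merge_dupes is a hypothetical name.

```python
from typing import List


def merge_dupes(rows: List[str]) -> str:
    """Hypothetical column-wise merge of duplicate aligned rows.

    Assumed rule: in each column, keep the first non-gap character
    among the rows (in the order given); emit '-' if every row has a gap.
    """
    if not rows:
        raise ValueError("need at least one row")
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same aligned length")
    merged = []
    for column in zip(*rows):
        merged.append(next((ch for ch in column if ch != "-"), "-"))
    return "".join(merged)


if __name__ == "__main__":
    merged = merge_dupes(["AC----TTT", "A-GC-----"])
    print(merged)  # ACGC--TTT
    # 7 aligned bases, matching the alt:1-7 row in the example above
    print(sum(ch != "-" for ch in merged))  # 7
```

Because each column is filled independently, the merged row stitches together bases from different source intervals, which is why it need not match any real stretch of the alt genome.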

This seems pretty reasonable for any downstream applications that care only about columns (Genome Browser, PhyloP).
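Since column-based tools read the alignment one column at a time, one rough way to see the coverage difference is to count the reference columns that have an aligned alt base under each approach. The sketch below hard-codes the rows from the example (the row kept by single and the merged row) and assumes a non-gap character means the column is covered; covered_columns is a hypothetical helper.

```python
def covered_columns(ref: str, alt: str) -> int:
    """Count columns where both the reference and the alt row have a base.

    Assumes both rows are aligned to the same length; '-' marks a gap.
    """
    if len(ref) != len(alt):
        raise ValueError("rows must have the same aligned length")
    return sum(1 for r, a in zip(ref, alt) if r != "-" and a != "-")


if __name__ == "__main__":
    ref = "ACGTCATTT"
    single_row = "AC----TTT"  # row kept by --dupeMode single
    merged_row = "ACGC--TTT"  # row produced by a consensus-style merge
    print(covered_columns(ref, single_row))  # 5
    print(covered_columns(ref, merged_row))  # 7
```

In this toy block the merge raises alt coverage from 5 to 7 of the 9 reference columns, which is the kind of gain described for dup_merge consensus.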

Anyway, I will run some tests and see about incorporating maf_stream as an option within cactus-hal2maf in the next release. Thanks again for bringing this to my attention.

rejo27 commented 1 year ago

This is great and cleared up my confusion. Thank you so much.