question about cactus-hal2maf

ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

question about cactus-hal2maf #1385

Open Marh32 opened 1 month ago

Marh32 commented 1 month ago

Hi,

I'm curious about the output I get when I run cactus-hal2maf with the following parameters: --refGenome --refSequence --start --length --targetGenomes --noAncestors --chunkSize 500000 --dupeMode single. I get a MAF file with four blocks like this:

ref: contig1-contig1-contig1-contig1
target: contigX-contigY-contigX-contigZ

Shouldn't the MAF output be a linear alignment, i.e. aligned to a single contig (such as contigX)? Is there something wrong with my parameter settings? Alternatively, I just want the true best alignment between the target genome and the ref genome in this region (similar to mafft), not a result spliced together from multiple contigs. How should I set the parameters to achieve this?

Thanks in advance for your help.

glennhickey commented 1 month ago

Yeah, this really needs to be improved: --dupeMode single does not do enough synteny checking, which leads to cases where you get a fragmented alignment because small regions of another contig have a higher identity. Hopefully we'll be able to get a better method implemented in the coming months.

In the meantime, if you're interested in a pairwise alignment, you might consider cactus-hal2chains --useHalSynteny which will in theory do a better job. Hopefully we'll have some changes to taffy

Marh32 commented 1 month ago

Thank you so much for your help. I'm looking forward to the new features of cactus.

Marh32 commented 1 month ago

I'm sorry to bother you again. I find that cactus-hal2chains --useHalSynteny has the same problem. For example, as shown in the figure, contigs JAKFHY010000570.1 and JAKFHY010002099.1 are output as a nonlinear result. Do you have a recommended way to handle this situation? I also found that some regions are missing from the hal2chains output even though hal2maf reports them. What might cause this difference? Thank you so much for your help.

glennhickey commented 1 month ago

I guess there are two possible reasons why you are not seeing your expected alignment: 1) Cactus is finding the alignment to the contig you want, plus one or more alignments to other contigs due to paralogy. It reports all the alignments, then cactus-hal2maf and cactus-hal2chains both filter out the expected contig and keep the other one. 2) Cactus is only ever aligning to the "wrong" contig, so no amount of filtering or synteny checking will get you the result you expect.

You can distinguish between the two cases by making a MAF of your region without any duplicate filtering. That said, I'm not sure what the next step would be. If you're sure it's a bug at that point and can share some data for the region, I can try pinpointing it over here...
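One way to run the check described above is to export the region again with duplicate filtering turned off and then list which target contigs appear in each block. The sketch below only covers the second half, in plain Python. It assumes the export was made with a non-filtering duplicate mode (for example `--dupeMode all`, if your version of cactus-hal2maf offers it; check `cactus-hal2maf --help`), and that sequence names in the MAF follow the `genome.contig` convention. The function names, the genome name `tgt`, and the sample blocks are hypothetical and purely illustrative.

```python
from collections import Counter

# Tiny made-up MAF used for illustration only (not real data).
SAMPLE_MAF = """##maf version=1
a
s ref.contig1 0 4 + 100 ACGT
s tgt.contigX 10 4 + 50 ACGT
s tgt.contigY 3 4 + 60 ACGT

a
s ref.contig1 4 4 + 100 TTGA
s tgt.contigY 7 4 + 60 TTGA
"""


def target_contigs_per_block(maf_text, target_genome):
    """Return one list per MAF block with the target-genome contigs in it.

    Assumes 's' lines name sequences as '<genome>.<contig>'. The contig
    part may itself contain dots (e.g. 'JAKFHY010000570.1').
    """
    prefix = target_genome + "."
    blocks = []
    current = None
    for line in maf_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            # An 'a' line starts a new alignment block.
            current = []
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            # Layout: s <src> <start> <size> <strand> <srcSize> <text>
            src = fields[1]
            if src.startswith(prefix):
                current.append(src[len(prefix):])
    return blocks


def expected_contig_present(blocks, expected_contig):
    """True if the expected contig is aligned in at least one block."""
    return any(expected_contig in block for block in blocks)


if __name__ == "__main__":
    blocks = target_contigs_per_block(SAMPLE_MAF, "tgt")
    # How often each target contig shows up across all blocks.
    print(Counter(contig for block in blocks for contig in block))
    print(expected_contig_present(blocks, "contigX"))
```

If the expected contig shows up in the unfiltered MAF but not in the filtered one, that points to case 1 (the filter is discarding it). If it never shows up, that points to case 2 (Cactus never aligned to it).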