Open Marh32 opened 1 month ago
Yeah, this really needs to be improved: --dupeMode single does not do enough synteny checking, so you can get a fragmented alignment wherever small regions of another contig have a higher identity. Hopefully we'll be able to get a better method implemented in the coming months.
In the meantime, if you're interested in a pairwise alignment, you might consider cactus-hal2chains --useHalSynteny, which will in theory do a better job. Hopefully we'll have some changes to taffy
Thank you so much for your help. I'm looking forward to the new features of cactus.
I'm sorry to bother you again. I find that cactus-hal2chains --useHalSynteny has the same problem. For example, as shown in the figure, contigs JAKFHY010000570.1 and JAKFHY010002099.1 are output as a nonlinear result. Do you have a recommended way to handle this situation? I also found that some regions are missing from the hal2chains output even though hal2maf reports them. What might cause this difference? Thank you so much for your help.
I guess there are two possible reasons why you are not seeing your expected alignment:
1) Cactus is finding the alignment to the contig you want, plus one or more alignments to other contigs due to paralogy. It reports all the alignments, then cactus-hal2maf and cactus-hal2chains both filter out the expected contig and keep the other one.
2) Cactus is only ever aligning to the "wrong" contig -- so no amount of filtering or synteny checking will get you the result you expect.
You can distinguish between the two cases by making a MAF of your region without any duplicate filtering. That said, I'm not sure what the next step would be. If you're sure it's a bug at that point and can share some data for the region, I can try pinpointing it over here...
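Once you have that unfiltered MAF, a quick way to tell the two cases apart is to list every target contig that shows up in it. Here is a minimal sketch in Python, not part of cactus itself. It assumes sequence names in the MAF are written as genome.contig and that the genome name contains no dot; the function names are made up for illustration.

```python
def parse_maf_blocks(maf_text):
    """Return a list of blocks; each block is a list of (src, start, size, strand)."""
    blocks = []
    current = None
    for line in maf_text.splitlines():
        fields = line.split()
        if not fields or line.startswith("#"):
            continue
        if fields[0] == "a":
            # An "a" line opens a new alignment block.
            current = []
            blocks.append(current)
        elif fields[0] == "s" and len(fields) == 7 and current is not None:
            # s <src> <start> <size> <strand> <srcSize> <text>
            _, src, start, size, strand, _src_size, _text = fields
            current.append((src, int(start), int(size), strand))
    return blocks


def target_contigs(blocks, target_genome):
    """Collect every contig of target_genome that appears in any block."""
    seen = set()
    for block in blocks:
        for src, _start, _size, _strand in block:
            # Assumption: names look like "genome.contig"; split on the first dot only,
            # so contig names such as JAKFHY010000570.1 keep their version suffix.
            genome, _, contig = src.partition(".")
            if genome == target_genome:
                seen.add(contig)
    return seen
```

If the contig you expect is in the returned set, you are in case 1 (it was aligned, then filtered out). If it is absent even without duplicate filtering, you are in case 2.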
Hi,
I'm just curious about what happens when I run cactus-hal2maf with the following parameters: --refGenome --refSequence --start --length --targetGenomes --noAncestors --chunkSize 500000 --dupeMode single. I get a MAF file with four blocks like this: ref: contig1-contig1-contig1-contig1, target: contigX-contigY-contigX-contigZ. Shouldn't the MAF output be linearly aligned (that is, aligned to the same contig, such as contigX)? Is there something wrong with my parameter settings? Alternatively, I just want the real best alignment between the target genome and the ref genome in this region (similar to mafft), not a result spliced together from multiple contigs. How should I set the parameters to achieve this? Thanks in advance for your help.
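For anyone wanting to quantify a pattern like this, the following is a rough Python sketch that lists the order of target contigs along the reference and the aligned bases contributed by each. It is only an illustration, not a cactus feature. It assumes names are written as genome.contig, that the reference row comes first in each block, and the function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def target_rows(maf_text, ref_genome, target_genome):
    """Return (ref_start, target_contig, aligned_size) for each target row."""
    rows = []
    ref_start = None
    for line in maf_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            ref_start = None  # new block: wait for its reference row
        elif fields[0] == "s" and len(fields) == 7:
            genome, _, contig = fields[1].partition(".")
            start, size = int(fields[2]), int(fields[3])
            if genome == ref_genome:
                ref_start = start
            elif genome == target_genome and ref_start is not None:
                rows.append((ref_start, contig, size))
    return rows


def contig_order(rows):
    """Target contigs in reference order, with consecutive repeats collapsed."""
    ordered = sorted(rows, key=lambda row: row[0])
    return [contig for contig, _ in groupby(row[1] for row in ordered)]


def bases_per_contig(rows):
    """Total aligned (non-gap) bases contributed by each target contig."""
    totals = Counter()
    for _ref_start, contig, size in rows:
        totals[contig] += size
    return totals
```

A contig_order result with more than one entry means the output switches contigs along the reference, and bases_per_contig shows which contig carries most of the alignment in the region.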