Open Marh32 opened 6 months ago
There's no easy way to check this from the cactus output. You could use a --maximumGapLength big enough to span your gap (note: this won't work for very big gaps). If you see a big insertion and deletion in the MAF, that'd be a sign of an under-alignment.
OK. Thank you so much for your reply. Are there any tools that can extract a specific region of a HAL file into FASTA format (retaining the alignment information)?
In addition, why can a single line in a BED file correspond to multiple alignment results? Does this indicate that a single contig in the BED file aligns to multiple regions?
My understanding is that HAL files are indeed derived from constructing a homology map based on anchors produced by tools like LASTZ during whole-genome alignments, eventually leading to the formation of full-genome comparisons. If an element in a BED file does not reside within a block, it should return an empty result, whereas if it's within a block, it should return a unique mapping result. Why would there be a situation where multiple results are returned?
If there's one copy of gene A in species 1 and two copies in species 2, then all three copies will (probably) be aligned together in Cactus. Due to such paralogous relationships, you can expect a given query region to map to multiple reference regions. There's a tool, halSynteny, that tries to filter this somewhat. You can run it yourself or within cactus-hal2chains.
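To see which query elements are affected by this kind of multiple mapping, the liftover output can be grouped by element name. The following is only a rough sketch, not part of HAL or Cactus: it assumes the output is tab-separated BED whose fourth column carries the name of the source element, and the helper names, the example records and the `max_gap` threshold are all hypothetical.

```python
from collections import defaultdict


def group_hits_by_name(bed_lines):
    """Group BED records by their name column (assumed to be column 4)."""
    hits = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # no name column, cannot be grouped
        hits[fields[3]].append((fields[0], int(fields[1]), int(fields[2])))
    return dict(hits)


def count_loci(intervals, max_gap=10_000):
    """Count distinct loci: hits on the same sequence that lie within
    max_gap bases of each other are treated as fragments of one locus."""
    loci = 0
    prev_chrom, prev_end = None, None
    for chrom, start, end in sorted(intervals):
        if chrom != prev_chrom or start - prev_end > max_gap:
            loci += 1
            prev_chrom, prev_end = chrom, end
        else:
            prev_end = max(prev_end, end)
    return loci


# Hypothetical liftover output: geneA lands on two target sequences.
example = [
    "chr1\t100\t200\tgeneA",
    "chr1\t210\t300\tgeneA",
    "chr7\t5000\t5100\tgeneA",
    "chr2\t50\t80\telem2",
]
hits = group_hits_by_name(example)
multi = sorted(name for name, ivs in hits.items() if count_loci(ivs) > 1)
print(multi)
```

Elements reported here are only candidates for paralogous mapping; an element split into several nearby blocks on one sequence is counted as a single locus.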
Thank you so much for your reply. I have tried using halSynteny to filter it, and I got the following results:
In this situation, should the alignment result on the third line be considered an error, or attributed to such paralogous relationships? Consequently, when searching for orthologous genes or conserved elements, should I filter out alignment results like the one on the third line? Also, I find that some alignment information is missing between blocks in the returned result (such as from 30317824 (in the first row) to 30323901 (in the second row)). Is there any way I can get this missing alignment information? Thank you for your help.
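The uncovered stretches between consecutive blocks can at least be listed explicitly so they can be inspected on their own. A minimal sketch, assuming the blocks are half-open (start, end) intervals on one sequence; only the coordinates 30317824 and 30323901 come from the thread, the other block boundaries are made up for illustration.

```python
def gaps_between_blocks(blocks):
    """Return the (gap_start, gap_end) intervals not covered by any block.

    blocks: iterable of half-open (start, end) intervals on one sequence.
    """
    gaps = []
    covered_end = None
    for start, end in sorted(blocks):
        if covered_end is not None and start > covered_end:
            gaps.append((covered_end, start))
        covered_end = end if covered_end is None else max(covered_end, end)
    return gaps


# First block ends at 30317824 and the second starts at 30323901 (from the
# thread); the outer boundaries below are hypothetical.
blocks = [(30310000, 30317824), (30323901, 30330000)]
for gap_start, gap_end in gaps_between_blocks(blocks):
    print(gap_start, gap_end, gap_end - gap_start)  # 30317824 30323901 6077
```

The resulting intervals could be written out as BED lines and examined separately, for example to check whether the gap is an unaligned insertion or a region that was filtered out.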
Hi,
I'm sorry to bother you. I was a little confused when using halLiftover. When using halLiftover to locate the corresponding conserved element (e.g., Conserved Element 1) in the target genome based on annotations from the reference genome, I occasionally receive empty results. How can I ascertain whether this outcome is due to the genuine absence of this element in the corresponding region of the target genome (Case 1) or because of issues such as poor assembly quality of the target genome, leading to the entire region not aligning properly (Case 2)? Thank you so much for your help.
Best regards, Hao
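One possible way to approach the Case 1 versus Case 2 question is to lift over the flanking regions of the element as well and compare the outcomes. This is only a heuristic sketched under assumptions, not a method confirmed in this thread: the flank size, the function names and the labels are hypothetical, and the hit counts are assumed to come from separate liftover runs on the element and on each flank.

```python
def flank_bed_lines(chrom, start, end, name, flank=5000):
    """Build BED lines for the left and right flanks of an element."""
    left_start = max(0, start - flank)
    lines = []
    if left_start < start:  # skip an empty left flank at the sequence start
        lines.append(f"{chrom}\t{left_start}\t{start}\t{name}_left")
    lines.append(f"{chrom}\t{end}\t{end + flank}\t{name}_right")
    return lines


def classify(element_hits, left_hits, right_hits):
    """Classify an element from the number of liftover hits obtained for
    the element itself and for its two flanks (heuristic only)."""
    if element_hits > 0:
        return "aligned"
    if left_hits > 0 and right_hits > 0:
        # Neighbourhood aligns but the element does not: closer to Case 1.
        return "flanks aligned, element missing"
    if left_hits == 0 and right_hits == 0:
        # Nothing in the neighbourhood aligns: closer to Case 2.
        return "whole region unaligned"
    return "ambiguous"


print(flank_bed_lines("chr1", 12000, 12300, "CE1"))
print(classify(element_hits=0, left_hits=2, right_hits=1))
print(classify(element_hits=0, left_hits=0, right_hits=0))
```

A "whole region unaligned" result does not by itself prove an assembly problem; it only indicates that the absence of the element cannot be distinguished from a missing alignment.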