ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Retrieving Regions from Alignment #1510

Closed Tinydxy closed 1 week ago

Tinydxy commented 1 week ago

Hello,

I am currently working with whole-genome alignments generated using Cactus. I used phastCons to predict conserved elements and then ran phyloP on those conserved elements to detect lineage-specific accelerated elements. However, when I tried to retrieve the original sequences from the whole-genome alignments (in MAF format) using the coordinates of these accelerated elements (relative to the human reference), I could not find the corresponding FASTA sequences for all of them.

These accelerated elements were identified from MAF files generated by cactus-hal2maf. When I manually checked the missing regions, I found that some elements have no corresponding sequence in the MAF files, even though phyloP marked them as accelerated.
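The manual check described above could also be automated. The following is only a minimal sketch, assuming a standard MAF layout (`s` lines with source name, 0-based start, size, strand, source size, and text) and a reference row named something like `hg38.chr1`. The function names `ref_intervals` and `uncovered` and the example data are hypothetical and not part of Cactus or any other tool. It reports which parts of an element's reference interval are not covered by any MAF block.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open, 0-based: [start, end)


def ref_intervals(maf_lines: Iterable[str], ref_src: str) -> List[Interval]:
    """Collect forward-strand intervals of `ref_src` from MAF 's' lines."""
    intervals = []
    for line in maf_lines:
        fields = line.split()
        # An 's' line has 7 fields: s, src, start, size, strand, srcSize, text
        if len(fields) < 7 or fields[0] != "s" or fields[1] != ref_src:
            continue
        start, size = int(fields[2]), int(fields[3])
        strand, src_size = fields[4], int(fields[5])
        if strand == "-":
            # Minus-strand starts are relative to the reverse complement
            start = src_size - start - size
        intervals.append((start, start + size))
    return sorted(intervals)


def uncovered(start: int, end: int, intervals: List[Interval]) -> List[Interval]:
    """Return the parts of [start, end) not covered by any sorted interval."""
    gaps = []
    pos = start
    for s, e in intervals:
        if e <= pos:
            continue  # block ends before the region still to be checked
        if s >= end:
            break  # sorted input: no later block can overlap
        if s > pos:
            gaps.append((pos, s))
        pos = max(pos, e)
        if pos >= end:
            break
    if pos < end:
        gaps.append((pos, end))
    return gaps


# Made-up example data (not from the issue)
EXAMPLE_MAF = """##maf version=1
a score=0
s hg38.chr1 100 10 + 1000 ACGTACGTAC
s mm10.chr4 500 10 + 2000 ACGTACGTAC

a score=0
s hg38.chr1 120 5 + 1000 ACGTA
"""

blocks = ref_intervals(EXAMPLE_MAF.splitlines(), "hg38.chr1")
print(uncovered(105, 123, blocks))
```

An element whose interval comes back with gaps has reference bases that no block in that MAF file contains. Note that the sketch only looks at the reference row; it does not check whether other species are present in the covering blocks.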

I intend to use these original sequences for downstream analyses, such as the phyloacc workflow, but I am facing difficulties in finding the sequences for some detected accelerated elements. I would like to understand at which step I might be going wrong or missing crucial settings.

Thank you very much for your help! [Screenshot 2024-10-28 222817]

Tinydxy commented 6 days ago

This problem can be solved using taffy.