ComparativeGenomicsToolkit / cactus

Official home of the genome aligner based upon the notion of Cactus graphs

Identifying orthologous sites #1496

Open Marh32 opened 1 month ago

Marh32 commented 1 month ago

Hi,

I'm sorry to bother you. I have successfully produced a HAL file for three mammals using Progressive Cactus. Now I want to identify orthologous sites among these species, similar to SNPs: for example, that the 5th C on contigA of species1 and the 10th C on contigB of species2 are orthologous. Is there a tool that can do this automatically? If not, could you suggest a method that gives me an initial file, so that I can carry out similar analyses myself afterward?

Thanks a lot!

glennhickey commented 1 month ago

You can try halLiftover, or convert to MAF. For indexing and querying MAF, you can use taffy (which is included in cactus).
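For the MAF route, the basic step is walking each alignment block column by column and converting every aligned base back to a genome coordinate. Below is a minimal Python sketch, assuming the HAL file has already been exported to plain-text MAF (for example with hal2maf) and that row names look like `genome.sequence`. The helpers `parse_maf_blocks` and `column_sites` and the example block are hypothetical illustrations, not part of cactus or taffy. Positions are 0-based; for `-` strand rows MAF coordinates refer to the reverse complement, so they are converted back to forward-strand positions.

```python
from typing import Iterable, Iterator, List, Tuple

# (source name, 0-based start, strand, source sequence length, aligned text)
Row = Tuple[str, int, str, int, str]


def parse_maf_blocks(lines: Iterable[str]) -> Iterator[List[Row]]:
    """Yield each MAF alignment block as a list of its 's' rows."""
    block: List[Row] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("a") or not line.strip():
            # A new 'a' line or a blank line ends the current block.
            if block:
                yield block
            block = []
        elif line.startswith("s"):
            _, src, start, _size, strand, src_size, text = line.split()
            block.append((src, int(start), strand, int(src_size), text))
        # '#', 'i', 'e' and 'q' lines are ignored in this sketch.
    if block:
        yield block


def column_sites(block: List[Row]) -> Iterator[List[Tuple[str, int, str]]]:
    """For every column, yield (src, forward-strand position, base) per non-gap row.

    A genome that appears in several rows (e.g. aligned paralogs) is reported
    once per row, so duplicated sites stay visible.
    """
    consumed = [0] * len(block)  # bases consumed so far in each row
    for col in range(len(block[0][4])):
        sites = []
        for i, (src, start, strand, src_size, text) in enumerate(block):
            base = text[col]
            if base == "-":
                continue
            pos = start + consumed[i]
            if strand == "-":
                # Minus-strand coordinates count from the end of the sequence;
                # the base itself is still the minus-strand base.
                pos = src_size - 1 - pos
            sites.append((src, pos, base))
            consumed[i] += 1
        yield sites


EXAMPLE_MAF = """##maf version=1
a score=0
s species1.contigA 4 5 + 100 ACGTC
s species2.contigB 9 4 + 50 AC-TC
s species3.chr1 0 5 - 20 ACGTC
"""

if __name__ == "__main__":
    for block in parse_maf_blocks(EXAMPLE_MAF.splitlines()):
        for sites in column_sites(block):
            if len(sites) == len(block):  # keep gap-free columns only
                print("\t".join(f"{s}:{p}:{b}" for s, p, b in sites))
```

Keeping only gap-free columns gives one line per site that is aligned in every row of the block, which is close to the "site in species1 = site in species2" table asked for above.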

Marh32 commented 1 month ago

Thank you very much for your reply. Do I need to account for paralogous sites, or does halLiftover filter them out automatically? Thank you again.

glennhickey commented 1 month ago

If the paralogs were aligned together by cactus (which they often are), then liftover will lift to all copies. You can try halSynteny instead, which does some filtering, or explore the different MAF options.
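Because liftover reports every aligned copy, a simple post-filter is to drop any input site that landed in more than one place. A minimal Python sketch, assuming each input site was a single-base BED interval with a unique name in the 4th column and that this name is carried through to the liftover output; `group_hits_by_name`, `single_copy_sites` and the example BED lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, int, int]  # (target sequence, 0-based start, end)


def group_hits_by_name(bed_lines: Iterable[str]) -> Dict[str, List[Hit]]:
    """Collect all target intervals reported for each input site name."""
    hits: Dict[str, List[Hit]] = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        hits[fields[3]].append((fields[0], int(fields[1]), int(fields[2])))
    return dict(hits)


def single_copy_sites(bed_lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep only sites that lifted over to exactly one single-base interval."""
    return {
        name: hs[0]
        for name, hs in group_hits_by_name(bed_lines).items()
        if len(hs) == 1 and hs[0][2] - hs[0][1] == 1
    }


EXAMPLE_BED = """contigB\t9\t10\tsite1
contigX\t100\t101\tsite2
contigY\t250\t251\tsite2
contigB\t40\t41\tsite3
"""

if __name__ == "__main__":
    for name, (chrom, start, end) in single_copy_sites(EXAMPLE_BED.splitlines()).items():
        print(name, chrom, start, end)
```

This only removes sites with multiple hits. A site with a single hit is not guaranteed to be the ortholog (for example, if only a paralog was aligned), which is where the extra filtering in halSynteny may help.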

Marh32 commented 1 month ago

OK, I see. Thank you for your help.