ComparativeGenomicsToolkit / taffy

This is a C/Python/CLI library for working with TAF (.taf, .taf.gz) and MAF (.maf) alignment files
MIT License

Add support for on-the-fly genome renaming via input tsv to taffy view #27

Closed glennhickey closed 1 year ago

glennhickey commented 1 year ago

This is the equivalent of halRenameGenomes but for MAF/TAF.

You pass the 2-column TSV to taffy view via the new -n option, and it will rename the contig names in the alignment blocks accordingly.

"." characters get special treatment, consistent with an <assembly>.<contig> naming scheme.

So if the mapping has hg38 -> Homo_sapiens then hg38.chr1 would get renamed to Homo_sapiens.chr1 etc. The naming also applies to input regions via the taffy view -r option. So if the MAF/TAF is indexed along hg38, then you can search for Homo_sapiens.chr1:10-100 via the name map.
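To make the region lookup concrete, here is a minimal Python sketch of the idea, not the taffy implementation. The function names `load_name_map` and `region_to_indexed_name` are hypothetical, the TSV column order (old name, then new name) is an assumption, and for brevity it only splits on the first "." of the sequence name.

```python
import csv
import io


def load_name_map(tsv_file):
    """Read a 2-column TSV into a dict (assumed order: old name, new name)."""
    name_map = {}
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) < 2:  # skip blank or malformed lines
            continue
        name_map[row[0]] = row[1]
    return name_map


def region_to_indexed_name(region, name_map):
    """Translate a region written with the new names back to the old names.

    Hypothetical helper: the file is indexed on the old names, so a query
    such as Homo_sapiens.chr1:10-100 has to go through the inverse map.
    """
    inverse = {new: old for old, new in name_map.items()}
    seq, sep, interval = region.rpartition(":")
    if not sep:  # region has no ":start-end" part
        seq, interval = region, ""
    # Simplification: treat everything before the first "." as the assembly.
    assembly, dot, contig = seq.partition(".")
    assembly = inverse.get(assembly, assembly)
    return assembly + dot + contig + sep + interval


# Example mapping, held in memory instead of a file on disk.
example_tsv = io.StringIO("hg38\tHomo_sapiens\nmm10\tMus_musculus\n")
example_map = load_name_map(example_tsv)
indexed_region = region_to_indexed_name("Homo_sapiens.chr1:10-100", example_map)
```

With the example mapping, the query region resolves to the hg38 coordinates that the index knows about.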

The goal for this is to make working with TAF/MAF files that use accessions for assembly names easier.

Since both the assembly and contig names can have "."s in them, it uses a greedy resolver, scanning dots from left to right to find one that makes a prefix that's in the input map.
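The resolver described above could be sketched roughly like this in Python. This is an illustration only: `rename_sequence` is a hypothetical name, and both stopping at the first matching prefix and trying the whole name first are assumptions about the behaviour rather than something confirmed by the taffy code.

```python
def rename_sequence(name, name_map):
    """Rename the assembly prefix of an <assembly>.<contig> style name.

    Scans the dots from left to right and uses the first prefix found in
    name_map (assumption). Names with no mapped prefix are returned as-is.
    """
    # Assumption: a name that is itself a key in the map is renamed whole.
    if name in name_map:
        return name_map[name]
    pos = name.find(".")
    while pos != -1:
        prefix = name[:pos]
        if prefix in name_map:
            # Keep the dot and everything after it unchanged.
            return name_map[prefix] + name[pos:]
        pos = name.find(".", pos + 1)
    return name


# Illustrative mapping; the accession-style key is a made-up example.
example_map = {"hg38": "Homo_sapiens", "GCA_000001405.29": "Homo_sapiens"}
simple = rename_sequence("hg38.chr1", example_map)
dotted = rename_sequence("GCA_000001405.29.chr1.1", example_map)
```

In the second call the prefix before the first dot, `GCA_000001405`, is not in the map, so the scan moves on to the next dot and matches `GCA_000001405.29`, leaving `.chr1.1` untouched.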