dentearl / mafTools

Bioinformatics tools for dealing with Multiple Alignment Format (MAF) files.

Question regarding potential bias in mafDuplicateFilter when cactus-generated ancestors are present #34

Open rsharris opened 1 year ago

(I'm not expecting any response, as this package hasn't been modified in ten years. But I'm recording this issue in case anyone picks up this package in the future.)

I'm using mafDuplicateFilter to remove duplicates from a cactus-generated multiple alignment. Cactus has inferred ancestral sequences at the internal nodes of the tree. I wonder if the presence of these could bias the results of duplicate removal.

As I understand it, duplicate removal works by (a) computing the consensus sequence of all segments in an alignment block, then (b) choosing, for each species present, the segment that most closely matches the consensus. My concern is that the presence of ancestral segments could change the consensus, and thus change the segment picked for some of the species.
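
To make the concern concrete, here is a minimal Python sketch of the procedure as described above. It is not mafDuplicateFilter's actual code: the function names (`consensus`, `matches`, `pick_per_species`), the majority-vote consensus, the match-count score, the tie-breaking rules, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def consensus(rows):
    """Column-wise majority base over equal-length aligned rows.

    Gaps ('-') are not counted; ties go to the alphabetically first base
    (an arbitrary choice for this sketch).
    """
    out = []
    for column in zip(*rows):
        counts = Counter(base for base in column if base != "-")
        if counts:
            out.append(min(counts, key=lambda b: (-counts[b], b)))
        else:
            out.append("-")
    return "".join(out)


def matches(row, cons):
    """Number of non-gap positions where the row agrees with the consensus."""
    return sum(1 for a, b in zip(row, cons) if a == b and a != "-")


def pick_per_species(block):
    """block: list of (species, sequence) pairs from one alignment block.

    Returns {species: sequence}, keeping for each species the row that best
    matches the consensus of ALL rows; the first row wins a tie.
    """
    cons = consensus([seq for _, seq in block])
    best = {}
    for species, seq in block:
        if species not in best or matches(seq, cons) > matches(best[species], cons):
            best[species] = seq
    return best


# Hypothetical toy block: "hg" appears twice (a duplicate), "mm" once.
leaves = [("hg", "ACGT"), ("hg", "AATA"), ("mm", "ACGT")]
# Hypothetical ancestral rows that happen to resemble the second "hg" copy.
ancestors = [("Anc0", "AATA"), ("Anc1", "AATA"), ("Anc2", "AATA")]

without_anc = pick_per_species(leaves)
with_anc = pick_per_species(leaves + ancestors)
print(without_anc["hg"], with_anc["hg"])  # prints "ACGT AATA"
```

In this contrived block the segment kept for `hg` flips once the ancestral rows are counted in the consensus. Whether real cactus output can behave this way depends on how mafDuplicateFilter actually builds its consensus and scores segments, which this sketch does not establish.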