glennhickey / progressiveCactus

Distribution package for the Progressive Cactus multiple genome aligner. Dependencies are linked as submodules.

Sequence ids are incorrectly truncated #15

Open mikolmogorov opened 10 years ago

mikolmogorov commented 10 years ago

I use a custom XML config that disables preprocessing of sequence names (conversion to UCSC-like names). Then, from a sequence header like "gi|480995221|gb|CM001518.2| Drosophila miranda", I expect to see "gi|480995221|gb|CM001518.2|" in the MAF file, but I get "gi|480995221|gb|CM001518.2": the last character of the sequence id is truncated.

As described at http://en.wikipedia.org/wiki/FASTA_format, the first space character in a FASTA header separates the sequence id from the sequence description, so the trailing "|" is part of the sequence id. It is not a very big problem, but it complicates automatic processing of the output.
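
To make the expected behaviour concrete, here is a minimal sketch of how I would extract the id from a header line. It is not taken from the Cactus code; `fasta_sequence_id` is a hypothetical helper name used only for illustration, and it assumes the id is simply everything between ">" and the first whitespace character.

```python
def fasta_sequence_id(header_line):
    """Return the sequence id from a FASTA header line.

    Hypothetical helper (not part of progressiveCactus): the id is taken
    to be everything up to the first whitespace character, with the
    leading '>' removed if present.
    """
    header = header_line.strip()
    if header.startswith(">"):
        header = header[1:]
    # Split on the first run of whitespace only; trailing '|' is kept
    # because it comes before the first space.
    parts = header.split(None, 1)
    return parts[0] if parts else ""


print(fasta_sequence_id(">gi|480995221|gb|CM001518.2| Drosophila miranda"))
```

With this rule the trailing "|" stays in the id, which is what I would expect to find in the MAF output as well.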