FePhyFoFum / phyx

phylogenetics tools for linux (and other mostly posix compliant) computers
blackrim.org
GNU General Public License v3.0

pxlssq gets Nexus labels wrong if they contain spaces #154

Closed josephwb closed 3 years ago

josephwb commented 3 years ago

Related to fbfef0d. When pxlssq reads a Nexus file, a label containing spaces is cut off at the first space, because the label is read as a single whitespace-delimited token. Can be seen with:

pxlssq -s seq_file -l

Example:

'GU931859.1 Dendroica adelaidae voucher STRI-SLDAD10 aconitase 1 (ACO1) gene, exon 8 and partial cds'

is reported as:

GU931859.1

The truncated label also throws off everything downstream: character state counts, number of characters, whether the file is reported as aligned, etc. Oof.

TODO: if label begins with a quote, read label until closing quote. This can get more complicated when the label itself contains quotes (e.g., "Bewick's Wren"). Don't worry about that until it surfaces.
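A minimal sketch of the rule in the TODO, assuming one taxon per line with the label first. `split_label` and `check_example` are hypothetical names for illustration only; this is not the code from the fixing commits, and (as in the TODO) quotes embedded in the label are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper, not the phyx implementation: split one matrix line
// into (label, remainder). A label that starts with a single quote runs to
// the next single quote; otherwise it ends at the first space or tab.
// Embedded quotes (e.g. 'Bewick''s Wren') are deliberately not handled.
std::pair<std::string, std::string> split_label(const std::string& line) {
    const std::string ws = " \t";
    const std::size_t npos = std::string::npos;

    const std::size_t start = line.find_first_not_of(ws);
    if (start == npos) {
        return std::make_pair(std::string(), std::string()); // blank line
    }

    std::string label;
    std::size_t rest_begin = npos; // where the search for the sequence starts

    if (line[start] == '\'') {
        const std::size_t close = line.find('\'', start + 1);
        if (close == npos) {
            // Unterminated quote: treat the rest of the line as the label.
            return std::make_pair(line.substr(start + 1), std::string());
        }
        label = line.substr(start + 1, close - start - 1);
        rest_begin = close + 1;
    } else {
        const std::size_t end = line.find_first_of(ws, start);
        if (end == npos) {
            return std::make_pair(line.substr(start), std::string());
        }
        label = line.substr(start, end - start);
        rest_begin = end;
    }

    // Skip the whitespace between the label and the sequence data.
    const std::size_t rest = line.find_first_not_of(ws, rest_begin);
    const std::string remainder = (rest == npos) ? std::string() : line.substr(rest);
    return std::make_pair(label, remainder);
}

// Usage: the quoted label keeps its spaces; the sequence is the remainder.
void check_example() {
    const auto parts = split_label("'GU931859.1 Dendroica adelaidae' ACGT");
    assert(parts.first == "GU931859.1 Dendroica adelaidae");
    assert(parts.second == "ACGT");
}
```

An unquoted label still ends at the first whitespace, so files without quoted labels should behave as before under this sketch.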

josephwb commented 3 years ago

Involves both read_next_seq_from_stream() and read_interleaved_nexus().

josephwb commented 3 years ago

The first one, read_next_seq_from_stream(), is fixed with 496bea7.

josephwb commented 3 years ago

And the second, read_interleaved_nexus(), is fixed with bf78f55.