FePhyFoFum / phyx

phylogenetics tools for linux (and other mostly posix compliant) computers
blackrim.org
GNU General Public License v3.0

phylip gets corrupted when piped #165

Open josephwb opened 3 years ago

josephwb commented 3 years ago

Other formats work, but not phylip for some raisin:

josephwb@Potiphar-Breen:baits$ cat foo.fa | pxlssq
File type: fasta
Number of sequences: 4568
Is aligned: true
Sequence length: 2571
--------- Nucl TABLE ----------
Nucl        Total   Proportion
   A       919913    0.0783283
   C      1124941    0.0957859
   G       588819    0.0501365
   T       896441    0.0763297
   -      8208557     0.698938
   ?         5657  0.000481679
 G+C      1713760     0.145922
--------- Nucl TABLE ----------
josephwb@Potiphar-Breen:baits$ cat foo.nex | pxlssq
File type: nexus
Number of sequences: 4568
Is aligned: true
Sequence length: 2571
--------- Nucl TABLE ----------
Nucl        Total   Proportion
   A       919913    0.0783283
   C      1124941    0.0957859
   G       588819    0.0501365
   T       896441    0.0763297
   -      8208557     0.698938
   ?         5657  0.000481679
 G+C      1713760     0.145922
--------- Nucl TABLE ----------
josephwb@Potiphar-Breen:baits$ cat foo.phy | pxlssq
Error: number of taxa declared in the file () does not match the number read (1). Exiting.

And yet it works when the phylip file is passed directly rather than piped:

josephwb@Potiphar-Breen:baits$ pxlssq -s foo.phy
File type: phylip
Number of sequences: 4568
Is aligned: true
Sequence length: 2571
--------- Nucl TABLE ----------
Nucl        Total   Proportion
   A       919913    0.0783283
   C      1124941    0.0957859
   G       588819    0.0501365
   T       896441    0.0763297
   -      8208557     0.698938
   ?         5657  0.000481679
 G+C      1713760     0.145922
--------- Nucl TABLE ----------

josephwb commented 3 years ago

This is more general: phylip also fails when piped to one of the alignment converters (pxs2*). In that case, only the first sequence is recognized and written to file.

josephwb commented 3 years ago

Gah, I am stumped on this one... For the moment, try to avoid piping phylip format (the only one that seems to behave differently in a file vs. a stream). I suspect it has something to do with checking which of the Baskin-Robbins number of phylip sub-formats a file conforms to (i.e., there is some peeking ahead in the stream that may screw things up).
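
To illustrate why peeking ahead can behave differently for a file and a pipe, here is a minimal C++ sketch. It is not phyx code: `ForwardOnlyBuf`, `peek_first_line_and_rewind`, and `count_remaining_lines` are hypothetical names, and the assumption that the format check rewinds with `tellg`/`seekg` is only a guess at the mechanism. `ForwardOnlyBuf` stands in for a pipe by relying on the default `std::streambuf` seek functions, which always fail.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical stand-in for a pipe: serves characters from a string but keeps
// std::streambuf's default seekoff/seekpos, which always report failure.
class ForwardOnlyBuf : public std::streambuf {
public:
    explicit ForwardOnlyBuf(std::string data) : data_(std::move(data)) {
        char* begin = &data_[0];
        setg(begin, begin, begin + data_.size());
    }
private:
    std::string data_;
};

// Hypothetical format sniffer: reads the first line, then tries to rewind.
// Returns true only if the stream is back where it started.
bool peek_first_line_and_rewind(std::istream& in, std::string& first_line) {
    const std::istream::pos_type start = in.tellg();  // -1 if not seekable
    std::getline(in, first_line);
    if (start == std::istream::pos_type(-1)) {
        return false;  // cannot rewind: the header line is gone for good
    }
    in.clear();
    in.seekg(start);
    return !in.fail();
}

// Counts the lines a downstream parser would still see.
std::size_t count_remaining_lines(std::istream& in) {
    std::size_t n = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++n;
    }
    return n;
}
```

With a three-line phylip text (header plus two taxa) in a `std::istringstream`, the rewind succeeds and all three lines remain for the parser. With the same text behind `ForwardOnlyBuf`, the rewind fails and only the two sequence lines remain, so a parser that expects to read the header next would find a sequence line instead.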

josephwb commented 3 years ago

Ok. The culprit does indeed appear to be is_complicated_phylip, the function that tries to determine which of a plethora of format flavours is involved: when I comment it out (on a 'vanilla' phylip file, so no fancy-pants reading is necessary), things are processed properly. Geez, I was slightly proud of how it could handle everything thrown at it... But this seems to be a stream-specific issue. Anyway, baby steps...
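
One possible workaround, sketched under assumptions: copy the piped input into an in-memory `std::stringstream` before any format detection, so that look-ahead can always rewind. The functions `buffer_whole_stream` and `first_line_is_phylip_header` are hypothetical illustrations and not part of phyx. The header check only recognizes the simplest "ntax nchar" first line and does not attempt what is_complicated_phylip does.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Copies everything left in `in` into a seekable in-memory stream.
std::stringstream buffer_whole_stream(std::istream& in) {
    std::stringstream buffered;
    buffered << in.rdbuf();
    buffered.clear();  // inserting from an empty source sets failbit; reset it
    return buffered;
}

// Hypothetical sniffer: true if the first line is exactly two positive
// integers. The stream is rewound afterwards, so it must be seekable.
bool first_line_is_phylip_header(std::istream& in, std::size_t& ntax,
                                 std::size_t& nchar) {
    const std::istream::pos_type start = in.tellg();
    std::string line;
    std::getline(in, line);

    std::istringstream fields(line);
    long long a = 0;
    long long b = 0;
    std::string extra;
    const bool ok = (fields >> a >> b) && a > 0 && b > 0 && !(fields >> extra);

    in.clear();
    in.seekg(start);  // put the header back for the real parser

    if (ok) {
        ntax = static_cast<std::size_t>(a);
        nchar = static_cast<std::size_t>(b);
    }
    return ok;
}
```

A caller would wrap `std::cin` once, for example `std::stringstream in = buffer_whole_stream(std::cin);`, and hand `in` to both the detection step and the reader. The trade-off is that the whole alignment is held in memory, which is probably acceptable for inputs of the size shown above but may not suit very large streams.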