dnbaker / dashing

Fast and accurate genomic distances using HyperLogLog
GNU General Public License v3.0

Typo in output with -U? #34

Closed mihkelvaher closed 4 years ago

mihkelvaher commented 4 years ago

dashing dist -k16 -p7 *.gz -U -O myout produces

3
genome1.fna.gz t0 t0.01022848
genome2.fna.gz t0.0408798
genome3.fna.gz

Notice the literal "t"s in front of each distance. Perhaps a missing backslash for the tab?

dnbaker commented 4 years ago

Hi Mihkel,

Thanks for the report! You're right, this is missing a backslash. It looks like I introduced this when I switched from %f to %g formatting. I've patched this in master and I'll update the binaries soon.
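For readers following along, here is a minimal sketch of how this kind of bug shows up. It is not the dashing source: `buggy_field` and `fixed_field` are hypothetical helpers, and the only thing taken from the thread is that the format string lost the backslash in front of the `t`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not from dashing): formats one distance field
// with the backslash missing, so a literal 't' precedes the number.
std::string buggy_field(double dist) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "t%g", dist);
    return std::string(buf);
}

// Hypothetical helper (not from dashing): the same field with the
// "\t" escape restored, so a real tab character precedes the number.
std::string fixed_field(double dist) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "\t%g", dist);
    return std::string(buf);
}
```

With this sketch, `buggy_field(0.0)` yields the two characters `t0`, the same pattern as in the reported output, while `fixed_field(0.0)` yields a tab followed by `0`.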

If you want a square matrix, does -T/--full-tsv work for your purposes?

mihkelvaher commented 4 years ago

Hi, the square matrix produced with -T seems to work with quicktree once the line "##Names..." is replaced with the number of sequences, which gives it more of a PHYLIP look (https://www.mothur.org/wiki/Phylip-formatted_distance_matrix). The initial quicktree parsing issues were probably due to some copy-paste errors.
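The replacement described above could be sketched roughly as follows. This assumes the -T output has exactly one header line that starts with "##Names" and that every other non-empty line is one matrix row; the function name `to_phylip_square` is made up for illustration, and the rest of the header line is not reproduced here because the thread elides it.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical converter: drops the line starting with "##Names" and
// writes the number of remaining rows as the first line instead.
// Assumes each remaining non-empty line is one row of the matrix.
std::string to_phylip_square(const std::string &tsv) {
    std::istringstream in(tsv);
    std::string line;
    std::vector<std::string> rows;
    while (std::getline(in, line)) {
        if (line.empty()) continue;                   // skip blank lines
        if (line.rfind("##Names", 0) == 0) continue;  // skip the header line
        rows.push_back(line);
    }
    std::string out = std::to_string(rows.size()) + "\n";
    for (const std::string &row : rows) {
        out += row + "\n";
    }
    return out;
}
```

The rows themselves are passed through unchanged, so whether the result parses still depends on the row layout that the downstream tool expects.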

At first I dismissed -T as an option because I thought it produced something like g1 g2 0.245 g2 g4 1.000 ... Using the terms "square" and "triangle" in the help text might benefit newcomers.

dnbaker commented 4 years ago

This should be fixed in v0.4.2, and I'll add more descriptive language to output formats for future releases.

Thanks again!

We actually do support upper-triangular PHYLIP, but not a full square matrix. That's probably worth doing.
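To make the "triangle" versus "square" distinction concrete, here is a small sketch that mirrors an upper triangle into a full symmetric matrix. It is an illustration only, not dashing's implementation: `to_square` is a hypothetical name, the layout (row i holds the distances from item i to items i+1, i+2, ...) is assumed from the output quoted at the top of this thread, and the diagonal is assumed to be 0.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: upper[i][k] is assumed to hold the distance
// between item i and item i + 1 + k. The diagonal is filled with 0.
std::vector<std::vector<double>> to_square(
        const std::vector<std::vector<double>> &upper) {
    const std::size_t n = upper.size();
    std::vector<std::vector<double>> full(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < upper[i].size(); ++k) {
            const std::size_t j = i + 1 + k;
            if (j >= n) break;          // ignore entries beyond the matrix
            full[i][j] = upper[i][k];   // upper half
            full[j][i] = upper[i][k];   // mirrored lower half
        }
    }
    return full;
}
```

For the three genomes in the report, the input would have rows of length 2, 1, and 0, and the result would be a 3 by 3 matrix with each distance appearing twice.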