dnbaker / dashing

Fast and accurate genomic distances using HyperLogLog
GNU General Public License v3.0

Option to produce PHYLIP distance matrix format #8

Closed kloetzl closed 5 years ago

kloetzl commented 5 years ago

dashing dist produces a distance matrix in its own new file format, which is incompatible with any downstream analysis tool. Most of them expect a matrix in PHYLIP format. The conversion can be done with a simple awk script, but it is tedious. It would be great if dashing could produce a PHYLIP distance matrix for an improved user experience.

See also https://github.com/marbl/Mash/issues/9.

dnbaker commented 5 years ago

Thank you for bringing it to my attention. I had assumed that an upper-triangular tabular format would be a reasonable starting-point, but this is a reasonable request. I'd certainly prefer native support over relying on accessory scripts.

Would the relaxed PHYLIP format be sufficient? I can imagine many cases where the 10 prefix characters per label are insufficient.

kloetzl commented 5 years ago

Don't bother with the 10 char limit; that is an ancient restriction.

dnbaker commented 5 years ago

I have a draft here. Here is sample output for 78 Protozoan genomes:

78
GCF_000142945.1_ASM14294v1_genomic.fna.gz 0.007199 0.000000 0.027642 0.000000 0.000000 0.000000 0.019540 0.013258 0.000000 0.000000 0.000000 0.000000 0.044545 0.052319 0.000000 0.000000 0.017694 0.012010 0.000000 0.007814 0.032753 0.000000 0.015016 0.014479 0.000000 0.029771 0.030653 0.015432 0.015981 0.014502 0.000000 0.035712 0.002881 0.000000 0.000000 0.024731 0.000000 0.002010 0.003618 0.006096 0.000000 0.004518 0.011628 0.000000 0.005255 0.000000 0.023640 0.000000 0.000000 0.000000 0.004388 0.002179 0.000000 0.008760 0.010045 0.008401 0.000000 0.006602 0.000000 0.003102 0.000000 0.000000 0.019644 0.000000 0.000000 0.018096 0.010213 0.000000 0.008387 0.000000 0.000000 0.006217 0.008097 0.000000 0.014120 0.013482 0.016450
GCF_000002825.2_ASM282v1_genomic.fna.gz 0.000000 0.000000 0.000000 0.000000 0.002768 0.000000 0.033772 0.016129 0.001242 0.013156 0.003247 0.000000 0.018715 0.000000 0.027379 0.010211 0.000000 0.000000 0.015451 0.000000 0.000000 0.010505 0.003602 0.000000 0.002173 0.000000 0.011137 0.000000 0.000000 0.018399 0.000000 0.046333 0.000000 0.000000 0.003170 0.000000 0.000000 0.000000 0.000000 0.000000 0.043054 0.018911 0.000000 0.000000 0.000000 0.014779 0.000000 0.000000 0.000000 0.038376 0.000000 0.000000 0.018459 0.000752 0.019168 0.000000 0.000000 0.003338 0.000000 0.000000 0.004991 0.015341 0.000000 0.000000 0.015417 0.026875 0.000000 0.014660 0.000000 0.000000 0.006652 0.000000 0.016980 0.011126 0.011340 0.000000
GCF_000372725.1_Emiliana_huxleyi_CCMP1516_main_genome_assembly_v1.0_genomic.fna.gz 0.000000 0.032371 0.000000 0.000000 0.013889 0.000000 0.000000 0.000000 0.000668 0.031311 0.012377 0.000000 0.000000 0.010827 0.007934 0.000000 0.000000 0.000000 0.000000 0.033037 0.000000 0.000000 0.000000 0.003643 0.000000 0.000000 0.005421 0.005328 0.000000 0.004549 0.038374 0.019690 0.002102 0.016128 0.001169 0.000000 0.000000 0.000000 0.000000 0.000000 0.019588 0.000000 0.000000 0.007604 0.000000 0.000000 0.000000 0.041818 0.000000 0.008839 0.012932 0.003214 0.037156 0.000000 0.023061 0.003363 0.000000 0.000000 0.000000 0.028884 0.000211 0.023699 0.000000 0.010386 0.000000 0.000000 0.000000 0.000000 0.006634 0.005122 0.009684 0.000000 0.000000 0.000000 0.002992
GCF_000189635.1_JCVI-TTA1-2.2_genomic.fna.gz 0.000000 0.000000 0.006504 0.000000 0.000000 0.000000 0.000000 0.011729 0.000000 0.020068 0.006481 0.000000 0.010570 0.031820 0.000000 0.000000 0.051455 0.000000 0.006886 0.000000 0.014539 0.000000 0.000000 0.031778 0.002322 0.008616 0.011237 0.000000 0.001023 0.007395 0.006045 0.000000 0.016481 0.009016 0.000000 0.006801 0.000000 0.000000 0.008813 0.015334 0.007000 0.011712 0.019036 0.003870 0.000000 0.012049 0.000000 0.002426 0.034924 0.008658 0.000000 0.000000 0.000000 0.003629 0.029223 0.000000 0.000000 0.000989 0.000000 0.000000 0.011075 0.000000 0.014864 0.003202 0.000000 0.000000 0.000087 0.000000 0.013686 0.018100 0.000000 0.000000 0.003941 0.010114
GCF_000006405.1_JCVI_PMG_1.0_genomic.fna.gz 0.000000 0.011999 0.042323 0.000000 0.007212 0.005320 0.000000 0.018844 0.011728 0.002786 0.000000 0.008565 0.029624 0.000000 0.000000 0.021145 0.000000 0.005501 0.009148 0.052909 0.003628 0.050128 0.008327 0.024584 0.000767 0.005082 0.018459 0.000000 0.054306 0.001723 0.000000 0.020534 0.022187 0.000000 0.000000 0.007521 0.000000 0.000000 0.037395 0.000000 0.001892 0.009794 0.000000 0.000000 0.006018 0.017909 0.000000 0.000000 0.000000 0.009133 0.041122 0.000000 0.007996 0.005940 0.004024 0.005056 0.000000 0.028880 0.000000 0.000000 0.000000 0.004025 0.008995 0.010779 0.000000 0.019325 0.000127 0.018082 0.000000 0.000000 0.000000 0.011619 0.023732
GCF_000209065.1_ASM20906v1_genomic.fna.gz 0.000000 0.000000 0.014012 0.000000 0.000000 0.000000 0.020593 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.004346 0.000000 0.004696 0.000000 0.000000 0.000000 0.031650 0.006101 0.000000 0.014978 0.000000 0.001331 0.000000 0.000000 0.000000 0.000000 0.003490 0.006378 0.000033 0.000000 0.000000 0.000000 0.000000 0.010197 0.001999 0.000000 0.024930 0.000000 0.000000 0.000000 0.005643 0.000506 0.014223 0.000000 0.005709 0.031183 0.000000 0.008237 0.007130 0.000000 0.000000 0.000000 0.000000 0.028044 0.000000 0.000017 0.018721 0.000000 0.007779 0.000000 0.021617 0.000000 0.026385 0.000000 0.000000 0.000000 0.000000 0.014831
GCF_000315625.1_Guith1_genomic.fna.gz 0.000066 0.027191 0.018133 0.000000 0.008460 0.000000 0.038350 0.000000 0.000000 0.000543 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.015301 0.002284 0.003039 0.000000 0.000000 0.037052 0.000000 0.010455 0.000903 0.047845 0.000000 0.000000 0.025998 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.014248 0.001749 0.017146 0.008705 0.000000 0.000000 0.000000 0.013675 0.000000 0.003221 0.001052 0.000000 0.004241 0.000000 0.000000 0.000000 0.000000 0.000000 0.015210 0.019770 0.000000 0.012695 0.010041 0.004862 0.000000 0.000000 0.000000 0.005094 0.001606 0.000000 0.009913 0.000000 0.003385
GCF_000149755.1_P.sojae_V3.0_genomic.fna.gz 0.000000 0.000000 0.000000 0.000000 0.003527 0.033694 0.022338 0.000000 0.000000 0.012973 0.023095 0.001832 0.032633 0.000000 0.012005 0.026094 0.015081 0.015376 0.020956 0.018915 0.009096 0.000000 0.011687 0.000000 0.016040 0.015908 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.018390 0.000000 0.000000 0.000000 0.014946 0.000000 0.001340 0.022255 0.005436 0.031748 0.000000 0.003640 0.032311 0.000000 0.017692 0.000000 0.000000 0.027148 0.000000 0.000000 0.000000 0.000000 0.000000 0.027947 0.001199 0.000000 0.000000 0.004183 0.000000 0.020807 0.014178 0.000000 0.016606 0.011529 0.017854
GCF_000165425.1_ASM16542v1_genomic.fna.gz 0.016389 0.000000 0.000000 0.036566 0.000000 0.032444 0.002885 0.009592 0.007552 0.000000 0.000000 0.000000 0.000000 0.023951 0.000000 0.005871 0.000000 0.041841 0.015495 0.039324 0.000000 0.036141 0.027548 0.025901 0.014637 0.002806 0.027956 0.049537 0.020487 0.000000 0.006352 0.000000 0.007666 0.006296 0.009280 0.018420 0.007160 0.000000 0.006084 0.031031 0.010983 0.000854 0.002374 0.043097 0.000000 0.013321 0.039622 0.000000 0.000000 0.000000 0.000000 0.004337 0.000000 0.000000 0.000000 0.000040 0.000000 0.026669 0.000000 0.016200 0.000000 0.004766 0.003461 0.013273 0.002208 0.000000 0.002606 0.000000 0.023099
GCF_900000015.1_Plasmopara_halstedii_genome_genomic.fna.gz 0.003370 0.000000 0.000000 0.023398 0.000000 0.000000 0.010292 0.015105 0.000000 0.000000 0.000000 0.022656 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.016280 0.019828 0.024155 0.004452 0.000000 0.004065 0.000000 0.036064 0.000000 0.000000 0.015295 0.012057 0.011542 0.009593 0.000000 0.000000 0.000000 0.015082 0.011870 0.000000 0.031930 0.000000 0.027206 0.019190 0.000487 0.009566 0.010658 0.023459 0.014744 0.000000 0.000000 0.000000 0.005579 0.007368 0.000000 0.002089 0.000631 0.000000 0.000432 0.000000 0.000000 0.000741 0.003407 0.000000 0.017312 0.008918 0.000000 0.000000 0.023107 0.012211
GCF_000258005.1_HHA1_v02_genomic.fna.gz 0.041651 0.064497 0.021242 0.000000 0.012409 0.005808 0.000573 0.000000 0.000000 0.033378 0.002048 0.035016 0.017334 0.005636 0.011677 0.042000 0.000000 0.000393 0.000000 0.042453 0.025817 0.014051 0.022104 0.000000 0.041125 0.016448 0.003448 0.000507 0.000000 0.000000 0.000000 0.000000 0.014404 0.000000 0.000483 0.019156 0.011677 0.000000 0.000000 0.021373 0.000000 0.000000 0.005550 0.007472 0.019847 0.001399 0.019646 0.016687 0.003912 0.000000 0.000000 0.008804 0.013004 0.026516 0.000000 0.015838 0.002975 0.000000 0.000000 0.011607 0.000000 0.023672 0.014215 0.012387 0.000000 0.012076 0.016614
GCF_000006565.2_TGA4_genomic.fna.gz 0.000000 0.018058 0.024198 0.000000 0.000000 0.010016 0.000000 0.003158 0.001095 0.016129 0.000000 0.006716 0.000000 0.000000 0.040444 0.000000 0.000000 0.016062 0.000000 0.000950 0.000000 0.000000 0.000000 0.000000 0.000000 0.001447 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.001345 0.000000 0.006957 0.000000 0.000000 0.000000 0.000000 0.015504 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.002653 0.000000 0.000000 0.000000 0.000484
GCF_000520075.1_Apha_asta_APO3_V1_genomic.fna.gz 0.033689 0.000000 0.017760 0.013615 0.014956 0.000037 0.000000 0.000856 0.010833 0.015018 0.021173 0.001583 0.000000 0.044486 0.000000 0.000000 0.007785 0.032294 0.014990 0.000000 0.000000 0.019685 0.000000 0.000000 0.020345 0.000000 0.050129 0.025728 0.000000 0.015342 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.019843 0.027050 0.026723 0.000000 0.016504 0.000000 0.020637 0.000000 0.002878 0.000000 0.000000 0.000000 0.000000 0.000000 0.008837 0.022812 0.000000 0.000000 0.025072 0.000000 0.000000 0.000000 0.021933 0.026844 0.020684 0.000000 0.000000 0.038979 0.032313
GCF_000208865.1_ASM20886v2_genomic.fna.gz 0.005534 0.010737 0.000000 0.000000 0.029371 0.000000 0.041624 0.000000 0.027277 0.002092 0.023845 0.000000 0.011975 0.016038 0.022920 0.010390 0.000000 0.011077 0.031750 0.044568 0.000000 0.005535 0.039347 0.016976 0.000000 0.043082 0.000000 0.000000 0.036696 0.006398 0.020193 0.000000 0.022380 0.000000 0.017198 0.004389 0.000000 0.033109 0.027824 0.005082 0.017426 0.001686 0.004240 0.007837 0.031352 0.000000 0.000000 0.014718 0.000000 0.026948 0.012272 0.000000 0.012816 0.015553 0.017742 0.000000 0.016113 0.002923 0.024693 0.016812 0.000000 0.002230 0.008618 0.026218
GCF_000247585.1_PP_INRA-310_V2_genomic.fna.gz 0.000000 0.000000 0.001767 0.000000 0.000000 0.035151 0.012153 0.000000 0.000000 0.011652 0.000000 0.015275 0.022515 0.025013 0.000000 0.007330 0.002919 0.000000 0.010276 0.000000 0.013565 0.014456 0.000000 0.009861 0.020545 0.005527 0.001900 0.000000 0.033186 0.000000 0.024260 0.000000 0.005374 0.000000 0.000000 0.000000 0.035105 0.004467 0.000000 0.018665 0.005139 0.003986 0.021043 0.000000 0.000000 0.000234 0.000000 0.000871 0.014026 0.030694 0.000000 0.000000 0.000000 0.000000 0.000350 0.001462 0.000000 0.010673 0.000000 0.000000 0.005233 0.000000 0.035611
GCF_000499385.1_ENH001_genomic.fna.gz 0.000000 0.008498 0.000000 0.034311 0.000000 0.025309 0.000000 0.000000 0.023762 0.000000 0.009102 0.003126 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.014328 0.000000 0.000000 0.000000 0.001830 0.000000 0.000000 0.000000 0.014818 0.000000 0.000000 0.000000 0.013558 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.023429 0.000000 0.000000 0.044073 0.000635 0.010843 0.000000 0.000000 0.000000 0.029075 0.000717 0.000000 0.000000 0.017716 0.014697
GCF_000186865.1_v_1.0_genomic.fna.gz 0.032090 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.024690 0.023366 0.003583 0.000000 0.000000 0.000000 0.000000 0.021681 0.000000 0.024000 0.000000 0.000000 0.006849 0.000000 0.000000 0.016310 0.000000 0.000000 0.000000 0.006458 0.000000 0.000000 0.000000 0.017698 0.000000 0.000000 0.000000 0.015861 0.000000 0.012585 0.012814 0.016130 0.007133 0.000000 0.000000 0.002915 0.000000 0.029029 0.000000 0.022010 0.023985 0.000000 0.012995 0.024707 0.000000 0.000000 0.000000 0.000000 0.000000 0.023116 0.000000 0.012100 0.007045 0.000000
GCF_000499745.1_EMH001_genomic.fna.gz 0.001616 0.021711 0.015114 0.011300 0.014906 0.016546 0.026248 0.000000 0.007189 0.000000 0.006214 0.000000 0.005688 0.048039 0.005084 0.016063 0.012053 0.000000 0.008519 0.016952 0.015052 0.000000 0.000000 0.000000 0.012167 0.023075 0.001861 0.000000 0.000000 0.022070 0.000000 0.000000 0.000749 0.024299 0.035797 0.000000 0.000000 0.000000 0.000000 0.026029 0.002186 0.007169 0.000000 0.000000 0.000000 0.008458 0.000000 0.000000 0.013784 0.026095 0.013944 0.002150 0.009006 0.000000 0.029538 0.000000 0.000000 0.011162 0.021799 0.032940
GCF_000151545.1_ASM15154v2_genomic.fna.gz 0.000000 0.006884 0.000000 0.000000 0.000000 0.028709 0.000000 0.037196 0.000000 0.006781 0.000000 0.000000 0.013866 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.011770 0.028914 0.000607 0.000000 0.009097 0.000000 0.000000 0.000000 0.017400 0.000000 0.002983 0.000000 0.000000 0.000000 0.024240 0.000000 0.000000 0.000000 0.000600 0.000000 0.000000 0.006461 0.000000 0.018953 0.014254 0.000000 0.000000 0.010197 0.000000 0.000000 0.000000 0.000000 0.033711 0.000000 0.007110
GCF_000499545.2_ETH001_genomic.fna.gz 0.000000 0.000000 0.000000 0.000000 0.008834 0.015932 0.014569 0.000000 0.000000 0.013308 0.000000 0.005081 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.005173 0.027574 0.000000 0.000000 0.016247 0.000000 0.002217 0.000000 0.016211 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.004610 0.000836 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.003806 0.000248 0.014127 0.002023 0.000000 0.012496 0.005886 0.011456 0.008378 0.008773 0.000000 0.000000 0.000000 0.000000
GCF_000220395.1_JCVI-IMG1-V.1_genomic.fna.gz 0.000678 0.013251 0.016727 0.000000 0.000000 0.006972 0.046033 0.018026 0.000000 0.000000 0.000000 0.000000 0.019309 0.002933 0.017987 0.000000 0.000000 0.000000 0.000000 0.040346 0.000000 0.000000 0.014406 0.000000 0.005927 0.000000 0.000000 0.017892 0.000000 0.010745 0.002266 0.000000 0.000276 0.000000 0.041221 0.010123 0.017292 0.000000 0.000000 0.000000 0.008950 0.000000 0.031175 0.028549 0.000000 0.000000 0.000000 0.000000 0.000000 0.011996 0.000000 0.008505 0.012213 0.012587 0.014220 0.029311 0.008325
GCF_000769155.1_ASM76915v2_genomic.fna.gz 0.004396 0.000000 0.000000 0.000000 0.053705 0.000000 0.019902 0.001190 0.000000 0.000000 0.030609 0.020406 0.031928 0.000000 0.033450 0.000000 0.000000 0.000000 0.000000 0.000303 0.000000 0.032893 0.000000 0.001574 0.000000 0.024700 0.000000 0.000000 0.000000 0.000000 0.004856 0.011827 0.000000 0.005820 0.000000 0.012663 0.021765 0.000000 0.006993 0.000000 0.000000 0.000000 0.023146 0.000000 0.006420 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.001437 0.000000 0.000000 0.015711 0.021262
GCF_000499425.1_EAH001_genomic.fna.gz 0.000000 0.008770 0.000000 0.028618 0.021839 0.000000 0.000000 0.039816 0.007454 0.002321 0.000000 0.037917 0.000000 0.008773 0.000000 0.004051 0.000000 0.001396 0.000000 0.000000 0.063934 0.000000 0.017312 0.024296 0.000000 0.000000 0.000000 0.003328 0.035104 0.021214 0.000000 0.009542 0.000000 0.000000 0.009592 0.006515 0.000000 0.000000 0.024462 0.004177 0.000000 0.012800 0.000000 0.007986 0.005881 0.000000 0.000000 0.015769 0.006255 0.014363 0.037263 0.000000 0.008232 0.035218 0.017709
GCF_000499605.1_EMW001_genomic.fna.gz 0.008186 0.000000 0.048814 0.032275 0.011896 0.000000 0.000000 0.000000 0.010486 0.021453 0.000000 0.000000 0.005940 0.000000 0.000000 0.003656 0.002485 0.000000 0.000000 0.010667 0.024476 0.008037 0.000000 0.008970 0.009120 0.015377 0.009099 0.005333 0.037381 0.000000 0.000000 0.025443 0.000000 0.000000 0.028323 0.016366 0.014377 0.025532 0.000000 0.001605 0.000000 0.000982 0.018579 0.012706 0.015981 0.015424 0.017463 0.000659 0.001237 0.000478 0.000000 0.000000 0.019965 0.012531
GCF_000520115.1_Apha_inva_NJM9701_V1_genomic.fna.gz 0.000000 0.030550 0.015594 0.000000 0.002999 0.011807 0.031101 0.018869 0.023413 0.000000 0.000000 0.031419 0.022786 0.000000 0.002256 0.002599 0.000000 0.000000 0.051196 0.037003 0.003763 0.000000 0.000000 0.000000 0.010033 0.000000 0.016676 0.001553 0.000000 0.000000 0.000000 0.000000 0.011065 0.007307 0.000000 0.029037 0.000000 0.000000 0.021725 0.017002 0.000000 0.000000 0.016431 0.002520 0.000000 0.000000 0.017819 0.000000 0.011681 0.012884 0.007868 0.007562 0.017552
GCF_000281045.1_Sap_diclina_VS20_V1_genomic.fna.gz 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.003543 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.002706 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.014338 0.000000 0.000000 0.000000 0.001327 0.000000 0.000000 0.000000 0.000000 0.010478 0.000000 0.000000 0.033707 0.000000 0.000000 0.000000 0.000000 0.006017 0.002779 0.000000 0.015479 0.000000 0.007163
GCF_000330505.1_EIA2_v2_genomic.fna.gz 0.022694 0.012885 0.018197 0.044957 0.027986 0.006657 0.042711 0.000000 0.027469 0.027504 0.014817 0.002070 0.000000 0.019029 0.008181 0.041174 0.049533 0.038038 0.024732 0.021143 0.038520 0.000000 0.010215 0.004483 0.021191 0.031285 0.006605 0.020752 0.004179 0.001174 0.028165 0.000000 0.012803 0.030198 0.026545 0.000000 0.048104 0.043463 0.000000 0.009089 0.009727 0.002977 0.000149 0.010467 0.004191 0.000000 0.009516 0.000000 0.008663 0.000000 0.031006
GCF_000313135.1_Acastellanii.strNEFF_v1_genomic.fna.gz 0.017731 0.000000 0.007183 0.034882 0.032858 0.000515 0.012373 0.033118 0.022346 0.005600 0.008173 0.023408 0.000000 0.000000 0.001575 0.023724 0.000000 0.000000 0.007427 0.000000 0.000000 0.039725 0.010999 0.001194 0.017215 0.000000 0.021910 0.022297 0.000000 0.000000 0.028204 0.005884 0.007342 0.015380 0.000000 0.000000 0.000000 0.000000 0.003187 0.015553 0.009200 0.000000 0.000000 0.002249 0.006937 0.001094 0.000000 0.000000 0.016648 0.022651
GCF_000004985.1_V1.0_genomic.fna.gz 0.000000 0.000000 0.000253 0.000000 0.000000 0.000000 0.000000 0.000000 0.000887 0.021306 0.000000 0.000000 0.000000 0.015192 0.012509 0.017214 0.021189 0.000000 0.000000 0.000000 0.000000 0.000000 0.019745 0.011300 0.021575 0.010218 0.000000 0.002341 0.000000 0.000000 0.000000 0.000000 0.028123 0.000000 0.019712 0.012208 0.000000 0.000000 0.000000 0.000000 0.000000 0.029928 0.000000 0.000000 0.000000 0.004687 0.004011 0.000000 0.007474
GCF_000004825.1_PolPal_Dec2009_genomic.fna.gz 0.005250 0.000000 0.000000 0.003537 0.000000 0.011039 0.013460 0.013701 0.000000 0.000000 0.000000 0.000000 0.000000 0.014588 0.013767 0.000000 0.027834 0.026211 0.000000 0.018329 0.000000 0.000000 0.000000 0.000000 0.000000 0.026117 0.000000 0.000000 0.001853 0.000000 0.000000 0.000000 0.000000 0.009045 0.000000 0.000000 0.000000 0.029614 0.000000 0.000000 0.013183 0.000000 0.014373 0.000000 0.000000 0.000000 0.000000 0.009231
GCF_000149405.2_ASM14940v2_genomic.fna.gz 0.022495 0.000000 0.001952 0.006519 0.018818 0.000000 0.007658 0.000000 0.000000 0.041496 0.000000 0.005956 0.037541 0.000000 0.013099 0.000000 0.021812 0.000000 0.003525 0.021106 0.039702 0.000000 0.003463 0.015324 0.019812 0.000000 0.014829 0.000000 0.023930 0.031318 0.000000 0.000000 0.000771 0.023265 0.000000 0.033530 0.000000 0.037795 0.000000 0.007476 0.000000 0.000000 0.006073 0.000000 0.000000 0.006394 0.067167
GCF_000002725.2_ASM272v2_genomic.fna.gz 0.116464 0.027053 0.003251 0.000000 0.110706 0.011649 0.008972 0.000000 0.013303 0.000000 0.001665 0.011126 0.000000 0.000000 0.000000 0.000000 0.000000 0.012544 0.014390 0.010783 0.000000 0.000000 0.024690 0.008107 0.002882 0.000000 0.014559 0.007974 0.001377 0.000000 0.000000 0.000000 0.017288 0.000000 0.000000 0.035801 0.000000 0.000000 0.027251 0.026261 0.000477 0.019596 0.000000 0.038052 0.000000 0.026202
GCF_000002875.2_ASM287v2_genomic.fna.gz 0.000000 0.000000 0.000000 0.718429 0.000000 0.000000 0.000000 0.002873 0.000000 0.000000 0.036078 0.000000 0.005016 0.000000 0.000000 0.000000 0.000000 0.000000 0.014737 0.000000 0.000000 0.000000 0.000000 0.000000 0.003379 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.008856 0.000000 0.021043 0.000000 0.029441 0.000000 0.004300 0.009540 0.002073 0.008714 0.017420 0.000000
GCF_000234665.1_ASM23466v4_genomic.fna.gz 0.002946 0.025487 0.049154 0.017744 0.000000 0.012848 0.019331 0.000000 0.009413 0.002524 0.011932 0.000000 0.000000 0.024302 0.010662 0.011089 0.000000 0.008928 0.038755 0.000000 0.017655 0.000000 0.000000 0.007798 0.000000 0.005339 0.023030 0.000000 0.000000 0.000000 0.010579 0.000000 0.034557 0.020166 0.009620 0.000000 0.000000 0.023856 0.009605 0.002522 0.000000 0.000000 0.000000 0.007010
GCF_000190715.1_v1.0_genomic.fna.gz 0.000000 0.005060 0.000000 0.000000 0.000000 0.010024 0.000000 0.000000 0.013916 0.007192 0.011333 0.000000 0.000000 0.008646 0.020812 0.002425 0.000000 0.000000 0.000000 0.000000 0.036318 0.000000 0.000000 0.004551 0.000000 0.000000 0.000000 0.000000 0.009221 0.022679 0.000000 0.000831 0.001997 0.000000 0.000000 0.037312 0.000000 0.005922 0.017576 0.000000 0.000000 0.021218 0.000825
GCF_000002845.2_ASM284v2_genomic.fna.gz 0.022662 0.007534 0.000000 0.032994 0.004996 0.383321 0.011420 0.015757 0.009845 0.002630 0.012916 0.000000 0.024249 0.002370 0.000000 0.000000 0.000000 0.021993 0.019120 0.000000 0.012416 0.032072 0.012762 0.000000 0.000000 0.000000 0.000000 0.002304 0.034325 0.000000 0.024048 0.000000 0.000000 0.000000 0.037700 0.025117 0.018891 0.023610 0.000000 0.000000 0.000000 0.000000
GCF_000227135.1_ASM22713v2_genomic.fna.gz 0.009064 0.015535 0.000000 0.021501 0.000000 0.000000 0.051144 0.003910 0.000000 0.027777 0.019377 0.000000 0.010658 0.000000 0.028564 0.009873 0.031334 0.011190 0.007322 0.015703 0.000000 0.005086 0.012867 0.000000 0.016804 0.000000 0.000000 0.000000 0.000000 0.018786 0.007098 0.043168 0.000000 0.005535 0.000000 0.025036 0.020664 0.010629 0.020854 0.038587 0.000000
GCF_000004695.1_dicty_2.7_genomic.fna.gz 0.000000 0.000000 0.000000 0.007052 0.000000 0.020953 0.016650 0.000000 0.042659 0.000000 0.000000 0.000000 0.019650 0.000000 0.045264 0.000000 0.048937 0.021067 0.000000 0.015601 0.000000 0.000000 0.000000 0.009722 0.000000 0.008945 0.028087 0.000000 0.019863 0.007382 0.000000 0.000000 0.000000 0.003731 0.004494 0.001954 0.000000 0.000000 0.008559 0.002506
GCF_000787575.1_Asub_2.0_genomic.fna.gz 0.000000 0.000000 0.000000 0.000000 0.005067 0.000000 0.000000 0.013022 0.000000 0.000000 0.000000 0.000000 0.021375 0.009878 0.000000 0.000000 0.014744 0.000000 0.017788 0.014198 0.000000 0.000000 0.000000 0.000000 0.000000 0.000754 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.018306 0.000000 0.000000 0.000000 0.000000 0.023685
GCF_000240725.1_ASM24072v1_genomic.fna.gz 0.048741 0.017045 0.000000 0.000000 0.000000 0.000000 0.012039 0.000000 0.000000 0.035987 0.021919 0.015309 0.000000 0.000000 0.008889 0.028049 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.007062 0.000000 0.017092 0.012600 0.002460 0.000000 0.030798 0.010325 0.042319 0.016353 0.008451 0.000000 0.001880 0.019176
GCF_000203815.1_DFas_2.0_genomic.fna.gz 0.007759 0.000000 0.059907 0.001290 0.000000 0.000000 0.027563 0.000000 0.011192 0.025198 0.000000 0.000984 0.000000 0.000393 0.044830 0.027235 0.004343 0.000000 0.025159 0.015597 0.000000 0.000000 0.000000 0.009892 0.032263 0.008109 0.000000 0.018497 0.000000 0.011629 0.000000 0.023744 0.015031 0.000000 0.015983 0.004736 0.038256
GCF_000755165.1_ASM75516v1_genomic.fna.gz 0.020325 0.000000 0.004649 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.030896 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.003665 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.013266
GCF_001293395.1_ASM129339v1_genomic.fna.gz 0.000000 0.016448 0.016032 0.000000 0.008225 0.000000 0.000000 0.000000 0.018855 0.027356 0.000000 0.000000 0.000000 0.000000 0.019951 0.031426 0.004326 0.000000 0.000000 0.000000 0.024945 0.032554 0.000000 0.000000 0.019062 0.000000 0.000000 0.036670 0.000000 0.000000 0.000000 0.003835 0.008097 0.014359 0.000000
GCF_001680005.1_ASM168000v1_genomic.fna.gz 0.016944 0.008349 0.027755 0.035658 0.000000 0.028444 0.000000 0.000000 0.033414 0.052678 0.028715 0.025254 0.000000 0.001425 0.022822 0.000000 0.032240 0.029523 0.000000 0.020072 0.025041 0.000000 0.008828 0.000000 0.000000 0.000000 0.000000 0.000000 0.003671 0.008903 0.000000 0.004727 0.017725 0.031577
GCF_000142905.1_TheTra_May2010_genomic.fna.gz 0.000000 0.000000 0.000941 0.000000 0.032850 0.000000 0.014501 0.014457 0.000000 0.000000 0.000000 0.005736 0.020102 0.000000 0.000000 0.019943 0.000000 0.000000 0.000000 0.024840 0.000000 0.022608 0.042234 0.009070 0.012526 0.000000 0.000000 0.000000 0.000000 0.009308 0.021926 0.017400 0.019184
GCF_000209125.1_JCVI_EDISG_1.0_genomic.fna.gz 0.000000 0.032607 0.007695 0.020075 0.002839 0.033055 0.012923 0.004377 0.018090 0.003233 0.000000 0.081798 0.043142 0.000000 0.000000 0.011710 0.000000 0.035759 0.086870 0.000000 0.007222 0.000000 0.004917 0.019220 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.029046 0.018407
GCF_000002415.2_ASM241v2_genomic.fna.gz 0.000000 0.000000 0.038203 0.008304 0.005136 0.024168 0.000000 0.000000 0.016265 0.000000 0.035927 0.000000 0.000000 0.000000 0.001182 0.000000 0.000000 0.000000 0.000000 0.000000 0.001693 0.016178 0.000000 0.014661 0.000000 0.006404 0.000000 0.000000 0.000000 0.000000 0.000000
GCF_000524495.1_Plas_inui_San_Antonio_1_V1_genomic.fna.gz 0.017322 0.002334 0.018281 0.000000 0.019444 0.000000 0.007869 0.024339 0.002357 0.000000 0.000000 0.027120 0.039003 0.000000 0.000000 0.043646 0.027227 0.000000 0.000000 0.026453 0.011499 0.000000 0.036690 0.000000 0.004231 0.000000 0.000000 0.029594 0.000000 0.017035
GCF_000150955.2_ASM15095v2_genomic.fna.gz 0.000000 0.000000 0.007412 0.006323 0.000000 0.000000 0.004336 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.014181 0.000000 0.000000 0.000000 0.000000 0.012271 0.000000 0.015109 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.007074
GCF_000321355.1_PcynB_1.0_genomic.fna.gz 0.000000 0.000000 0.061371 0.022810 0.040318 0.015286 0.019407 0.023677 0.010399 0.000000 0.000000 0.000000 0.000000 0.007389 0.012569 0.000000 0.003923 0.014027 0.040143 0.044707 0.024698 0.000000 0.053734 0.022625 0.017192 0.000747 0.028110 0.013429
GCF_000002445.2_ASM244v1_genomic.fna.gz 0.000000 0.000000 0.000000 0.000000 0.506908 0.019980 0.014401 0.000000 0.002454 0.002193 0.031980 0.000000 0.000000 0.005621 0.000000 0.011411 0.000000 0.000000 0.000000 0.000000 0.021097 0.037402 0.006481 0.000000 0.000000 0.000000 0.022084
GCF_000956335.1_Plas_frag_nilgiri_V1_genomic.fna.gz 0.037622 0.000000 0.037326 0.030074 0.000000 0.000000 0.000000 0.000000 0.004832 0.030040 0.000000 0.000000 0.000000 0.000000 0.017426 0.023585 0.000000 0.000000 0.000000 0.004813 0.016625 0.000000 0.000000 0.000000 0.011833 0.027145
GCF_000006355.1_ASM635v1_genomic.fna.gz 0.000000 0.024180 0.006763 0.019500 0.009334 0.000000 0.000000 0.008753 0.009090 0.000000 0.009243 0.041546 0.000000 0.015050 0.006483 0.026199 0.017914 0.000000 0.000000 0.035657 0.009560 0.000000 0.023613 0.000000 0.048750
GCF_900002385.1_PY17X01_genomic.fna.gz 0.000000 0.007508 0.000000 0.032752 0.023893 0.000000 0.000000 0.026137 0.065283 0.024319 0.062221 0.000000 0.007083 0.000000 0.000000 0.000000 0.014676 0.000000 0.000000 0.004621 0.000000 0.012362 0.000133 0.018594
GCF_000002765.4_ASM276v2_genomic.fna.gz 0.000000 0.052721 0.024288 0.000000 0.000000 0.071605 0.207766 0.000000 0.000000 0.031628 0.000000 0.000000 0.019491 0.010699 0.000000 0.003709 0.016400 0.012494 0.000000 0.000000 0.005492 0.001285 0.006209
GCF_000210295.1_ASM21029v1_genomic.fna.gz 0.024496 0.020346 0.000000 0.007125 0.000000 0.007911 0.000000 0.000000 0.030138 0.000000 0.056772 0.000000 0.000000 0.000000 0.020366 0.024991 0.034121 0.004919 0.000000 0.000000 0.000000 0.029264
GCF_000691245.1_Tgr_V1_genomic.fna.gz 0.000000 0.000000 0.000000 0.000000 0.006797 0.000000 0.029093 0.000000 0.000000 0.000000 0.027658 0.000000 0.000000 0.000000 0.000000 0.024775 0.000000 0.000000 0.013838 0.000000 0.000000
GCF_000208925.1_JCVI_ESG2_1.0_genomic.fna.gz 0.000000 0.000000 0.016568 0.004738 0.000000 0.006248 0.295162 0.006918 0.028900 0.028419 0.020480 0.015459 0.033247 0.000000 0.016012 0.004944 0.000000 0.000000 0.040954 0.019946
GCF_900002335.2_PCHAS01_genomic.fna.gz 0.000000 0.027514 0.000000 0.000000 0.090945 0.022065 0.000000 0.015050 0.006128 0.000000 0.000000 0.011713 0.006864 0.000000 0.000000 0.000000 0.000000 0.000813 0.000000
GCF_000151665.1_ASM15166v1_genomic.fna.gz 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.001494 0.000000 0.000000 0.000000 0.002443 0.000000 0.000000 0.000000 0.000000 0.000000 0.000176
GCF_001602025.1_ASM160202v1_genomic.fna.gz 0.042891 0.000000 0.000000 0.010665 0.017959 0.000000 0.016108 0.041885 0.000000 0.000000 0.001302 0.028652 0.005731 0.000000 0.000000 0.005020 0.011928
GCF_001601855.1_ASM160185v1_genomic.fna.gz 0.000000 0.000000 0.008053 0.005627 0.000363 0.020650 0.000000 0.000000 0.024380 0.000000 0.050871 0.004629 0.000000 0.003466 0.000000 0.026506
GCF_900002375.1_PBANKA01_genomic.fna.gz 0.000000 0.000000 0.000000 0.000000 0.002340 0.000000 0.000000 0.000000 0.009744 0.000000 0.018041 0.000000 0.002056 0.000000 0.004084
GCF_000709005.1_Plas_vinc_vinckei_V1_genomic.fna.gz 0.000000 0.000000 0.015815 0.037866 0.000170 0.000000 0.015176 0.000000 0.002345 0.013354 0.000000 0.000000 0.006752 0.043082
GCF_000257125.1_ENU1_v1_genomic.fna.gz 0.000000 0.015920 0.012242 0.000000 0.039389 0.064021 0.000000 0.018527 0.000542 0.000000 0.000134 0.021222 0.000508
GCF_000223845.1_GNI3_genomic.fna.gz 0.000000 0.009897 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.026281 0.014464
GCF_000981445.1_Bbig001_genomic.fna.gz 0.016484 0.000000 0.009484 0.014886 0.008956 0.004362 0.020197 0.000000 0.018961 0.015640 0.026633
GCF_000743755.1_ASM74375v1_genomic.fna.gz 0.032451 0.017969 0.000000 0.011178 0.034817 0.000000 0.016890 0.036399 0.022064 0.017120
GCF_000342415.1_JCVI-bewag-v1.1_genomic.fna.gz 0.000000 0.008696 0.000000 0.020549 0.014769 0.000000 0.013454 0.000000 0.016461
GCF_000002435.1_GL2_genomic.fna.gz 0.000000 0.000000 0.007758 0.007316 0.011691 0.015130 0.000777 0.000000
GCF_000006515.1_JCVI_cmg_v1.0_genomic.fna.gz 0.018446 0.000000 0.019200 0.000000 0.011632 0.037399 0.031498
GCF_000165345.1_ASM16534v1_genomic.fna.gz 0.014004 0.318223 0.000000 0.000000 0.000000 0.000000
GCF_000740895.1_ASM74089v1_genomic.fna.gz 0.028390 0.016943 0.016546 0.063483 0.032780
GCF_000006425.1_ASM642v1_genomic.fna.gz 0.000000 0.037311 0.002652 0.019590
GCF_000165365.1_ASM16536v1_genomic.fna.gz 0.000000 0.000000 0.003961
GCF_000003225.3_ASM322v1_genomic.fna.gz 0.000000 0.000000
GCF_000165395.1_ASM16539v1_genomic.fna.gz 0.000000
GCF_000691945.2_ASM69194v2_genomic.fna.gz

I haven't merged it in yet; I may even make this the default output format. Does this seem to match the format? (Space separated and padded to at least 10 characters.)
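For readers who want to check the layout of the sample above, here is a minimal Python sketch of a reader for it. It assumes what the sample shows: a first line holding the number of entries, then one row per entry with a label followed by the distances to all later entries only (so the last row has a label and nothing else). The function name `parse_upper_triangular` is hypothetical and not part of dashing, and the sketch assumes labels contain no whitespace.

```python
def parse_upper_triangular(text):
    """Parse upper-triangular, PHYLIP-style text into (labels, rows).

    rows[i] holds the distances from entry i to entries i+1 .. n-1.
    Assumes labels contain no whitespace (hypothetical helper, not dashing's API).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0])
    labels, rows = [], []
    for i, line in enumerate(lines[1:n + 1]):
        fields = line.split()
        expected = n - 1 - i  # row i lists distances to the later entries only
        if len(fields) - 1 != expected:
            raise ValueError(
                f"row {i} has {len(fields) - 1} distances, expected {expected}"
            )
        labels.append(fields[0])
        rows.append([float(x) for x in fields[1:]])
    if len(labels) != n:
        raise ValueError(f"header says {n} entries, found {len(labels)}")
    return labels, rows
```

Because the reader splits on any run of whitespace, it accepts both single-space separation and padded labels.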

kloetzl commented 5 years ago

Thank you for considering the PHYLIP distance matrix as an output format; however, I have a few issues with your example.

dnbaker commented 5 years ago

Thank you! You make some good points, and I appreciate the feedback.

• It's more convenient for the software to produce upper triangular, as this allows us to free resources as program execution progresses. Lower triangular would also take a more significant rewrite, and we'll consider both it and a full distance matrix (without -s) for future releases. I imagine I'd prefer to pad all sequences with spaces to the same length or use tabs instead of spaces, but that wouldn't be compliant.

• I've been using the current format not for claiming significant digits, but representing the floating-point well, since downstream tools would be converting these back into machine representations. I'll have to consider what the right tradeoff is.

• It seems redundant when all files have this extension. However, file extensions are not assumed to be uniform, and trimming extensions seems difficult to do in a principled way.

• Thank you, the linked code is replaced with something clearer now. The requirement to pad to at least 10 characters, including the trailing space, is a little awkward.
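Two of the points above, label padding and how many digits to print, can be illustrated with a short Python sketch. It is not dashing's implementation (dashing is not written in Python), and the helper names `format_label`, `format_value_fixed`, and `format_value_exact` are hypothetical. The padding rule follows the description in this thread: the label plus its trailing space occupies at least 10 characters.

```python
def format_label(label, min_width=10):
    """Return the label padded so that label + trailing space is >= min_width."""
    if len(label) < min_width:
        return label.ljust(min_width)  # short labels: pad with spaces to min_width
    return label + " "                 # long labels: keep whole, add one separator


def format_value_fixed(x):
    """Six decimal places, as in the sample output; compact but lossy."""
    return f"{x:.6f}"


def format_value_exact(x):
    """Shortest string that parses back to the same Python float."""
    return repr(x)


x = 0.1 + 0.2
print(format_label("abc") + format_value_fixed(x))
print(format_label("GCF_000142945.1_ASM14294v1_genomic.fna.gz") + format_value_exact(x))
```

The fixed-width form is easier to read, while the `repr` form survives a round trip through text unchanged, which is the tradeoff described in the second point.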

kloetzl commented 5 years ago

I see that you have your reasons to keep the output as it is. Well, that means I have to write a wrapper to integrate dashing into my pipeline. Still, the new output format already helps. Feel free to close on merge.

Thank you, the linked code is replaced with something clearer now.

Very much so!

dnbaker commented 5 years ago

I've merged this into master, but I'd still like to consider an option for full matrix PHYLIP output. I'd be tempted to do so by emitting as binary and then replacing that file with the desired human-readable form. This would add some processing time, but at least you wouldn't need to use your own script.
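Until such an option exists, the expansion to a full matrix can be done downstream. The Python sketch below works in memory rather than through the intermediate binary file described above, so it is a stand-in for that idea and not the planned implementation. The function name `to_full_phylip` is hypothetical, and the sketch assumes that distances are symmetric and that the diagonal is zero.

```python
def to_full_phylip(labels, upper_rows, precision=6):
    """Expand upper-triangular rows into full square, relaxed-PHYLIP text.

    upper_rows[i] holds the distances from entry i to entries i+1 .. n-1.
    Assumes symmetric distances and a zero diagonal (hypothetical helper).
    """
    n = len(labels)
    full = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(upper_rows):
        for offset, d in enumerate(row):
            j = i + 1 + offset
            full[i][j] = d  # upper triangle
            full[j][i] = d  # mirrored into the lower triangle
    out = [str(n)]
    for label, row in zip(labels, full):
        values = " ".join(f"{d:.{precision}f}" for d in row)
        # Label plus its trailing space occupies at least 10 characters.
        out.append(label.ljust(9) + " " + values)
    return "\n".join(out) + "\n"


print(to_full_phylip(["A", "B", "C"], [[0.1, 0.2], [0.3], []]), end="")
```

Holding the full matrix in memory costs n² values, which is the resource use that streaming the upper triangle avoids.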