dnbaker / dashing2

Dashing 2 is a fast toolkit for k-mer and minimizer encoding, sketching, comparison, and indexing.
MIT License

--mash-distance outputs similarity with --set option #45

Open dejsha opened 2 years ago

dejsha commented 2 years ago

Hi,

With the version 2.1.9 binaries, I keep getting a similarity matrix instead of a distance matrix when using:

    ./dashing2_s128 sketch --cmpout distance.txt -F paths.txt -p16 -k31 --set --mash-distance

That is, the distance.txt output is identical (with ones on the diagonal) to the similariry.txt produced with:

    ./dashing2_s128 sketch --cmpout similariry.txt -F paths.txt -p16 -k31 --set

If I compute the weighted distance instead:

    ./dashing2_s128 sketch --cmpout distance.txt -F paths.txt -p16 -k31 --prob --mash-distance

it seems to work correctly: I receive a distance matrix (which has -0 on the diagonal).

How can I get the Mash distance with --set?

Thank you very much,
dasa

dnbaker commented 2 years ago

Hi dejsha,

Thanks for your bug report! You're right - Dashing2 is ignoring the distance metric for the --set option. This is also the case for weighted Jaccard (--countdict).
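For reference, the Mash distance is usually defined from a Jaccard similarity J and k-mer length k as D = -(1/k) * ln(2J / (1 + J)). The sketch below applies that formula to similarity values after the fact. It assumes that the --set output holds plain Jaccard similarities and that Dashing2's --mash-distance follows this same definition; neither is confirmed in this thread. The function names are hypothetical, and returning 1.0 for J = 0 is a convention chosen here, not something taken from Dashing2.

```python
import math


def mash_distance(jaccard: float, k: int) -> float:
    """Convert a Jaccard similarity to a Mash distance for k-mer length k.

    Hypothetical helper: applies D = -(1/k) * ln(2J / (1 + J)).
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("Jaccard similarity must lie in [0, 1]")
    if jaccard == 0.0:
        # Assumption: report the maximum distance when nothing is shared,
        # since the logarithm is undefined at zero.
        return 1.0
    return -math.log(2.0 * jaccard / (1.0 + jaccard)) / k


def to_distance_matrix(similarities, k):
    """Apply mash_distance to every entry of a square similarity matrix."""
    return [[mash_distance(j, k) for j in row] for row in similarities]


if __name__ == "__main__":
    # Toy 2x2 similarity matrix with ones on the diagonal, k = 31 as in the report.
    sims = [[1.0, 0.5], [0.5, 1.0]]
    for row in to_distance_matrix(sims, k=31):
        print("\t".join(f"{d:.6g}" for d in row))
```

With J = 1 the formula evaluates to a floating-point negative zero, which is consistent with the -0 diagonal seen in the --prob output above.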

I'll get this patched up and let you know.

Best,

Daniel

dnbaker commented 2 years ago

Hi,

I've found and corrected the bug. I've merged it into main, and I'll let you know when the new binary releases are available.

Thanks,

Daniel

dnbaker commented 2 years ago

New binaries are out.

You can fetch them from https://github.com/dnbaker/dashing2-binaries or from the tarball release https://github.com/dnbaker/dashing2-binaries/archive/refs/tags/v2.1.10.tar.gz.

Thanks again,

Daniel