dnbaker / dashing2

Dashing 2 is a fast toolkit for k-mer and minimizer encoding, sketching, comparison, and indexing.
MIT License

Cleanup pt2 #70

Closed dnbaker closed 1 year ago

dnbaker commented 1 year ago

Another bug fix: when -o was passed during sketching in separate sketch/cmp mode, we were erasing the previously existing sketches.

This fixes that bug, so you can call sketch:

dashing2 sketch -o packed.sketch -F smallbac.list -p10

And compare:

dashing2 cmp --cmpout packed.dist --presketched packed.sketch -p10

We hadn't tested this code path.

We also add Python code in python/parse.py: convert_sketches_to_packed_sketch creates this packed.sketch from individual sketch files, and parse_binary_sketch parses individual sketch files.
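For readers who want a feel for what packing involves, here is a minimal sketch of the general idea. It is not the code in python/parse.py. It assumes a simplified, hypothetical layout in which each sketch file is a headerless run of 64-bit unsigned registers in native byte order, which may well differ from the real Dashing 2 format. The names read_flat_sketch and pack_flat_sketches are made up for illustration.

```python
import os
from array import array


def read_flat_sketch(path):
    """Read one sketch file as 64-bit unsigned registers.

    ASSUMPTION: the file is a raw, headerless array of uint64 values in
    native byte order. This is a hypothetical layout for illustration only.
    """
    regs = array("Q")
    size = os.path.getsize(path)
    if size % regs.itemsize:
        raise ValueError(f"{path}: size {size} is not a multiple of {regs.itemsize}")
    with open(path, "rb") as fh:
        regs.fromfile(fh, size // regs.itemsize)
    return regs


def pack_flat_sketches(paths, out_path):
    """Concatenate equally sized sketches into one file, in input order.

    Returns the number of sketches written.
    """
    sketches = [read_flat_sketch(p) for p in paths]
    # A packed file is only indexable if every sketch has the same length.
    if len({len(s) for s in sketches}) > 1:
        raise ValueError("all sketches must have the same number of registers")
    with open(out_path, "wb") as fh:
        for s in sketches:
            s.tofile(fh)
    return len(sketches)
```

Under this assumed layout, sketch i in the packed file starts at register i * k, where k is the number of registers per sketch, so the order of the input paths determines the order of rows in any downstream comparison output.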