dnbaker / dashing2

Dashing 2 is a fast toolkit for k-mer and minimizer encoding, sketching, comparison, and indexing.
MIT License

--parse-by-seq bug fix + orderminhash bug fix #83

Closed: dnbaker closed this 1 year ago

dnbaker commented 1 year ago
  1. Fix the --parse-by-seq code.
  2. OrderMinHash bug fix from updating to sketch v0.19.1
  3. Throw an error on empty sequences.
  4. Improve handling of sequences, whether they are kept in RAM (--seqs-in-ram) or cached to disk.
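
To illustrate item 3, here is a minimal sketch of rejecting empty sequences up front. The `SeqRecord` type and the `require_nonempty` function are hypothetical names chosen for this example, not dashing2's actual code; the point is only that an empty record raises an error instead of being sketched silently.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record type for illustration; not dashing2's real data structure.
struct SeqRecord {
    std::string name;
    std::string seq;
};

// Throws std::invalid_argument on the first record whose sequence is empty,
// naming the offending record so the user can locate it in the input.
inline void require_nonempty(const std::vector<SeqRecord> &records) {
    for (const auto &r : records) {
        if (r.seq.empty()) {
            throw std::invalid_argument("Empty sequence for record '" + r.name + "'");
        }
    }
}
```

Failing early like this keeps a zero-length record from producing a meaningless sketch or an undefined edit distance later in the pipeline.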

Since --parse-by-seq only needs the sequences for edit distance calculation, we can free the memory they occupy when running in --seqs-in-ram mode. That mode saves the trouble of caching the parsed sequences to disk, but requires more memory.

Lifetime management needed a bit of extra work, but it seems to be stable for both cases.
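
The in-RAM case can be sketched as follows, assuming the sequences live in a `std::vector<std::string>` and are no longer needed once the pairwise edit distances exist. The function names `edit_distance` and `pairwise_then_release` are hypothetical, and the plain Levenshtein routine is a stand-in for whatever edit distance implementation is actually used.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Plain Levenshtein distance using one rolling row of the DP table.
inline std::size_t edit_distance(const std::string &a, const std::string &b) {
    std::vector<std::size_t> row(b.size() + 1);
    std::iota(row.begin(), row.end(), std::size_t(0));  // D[0][j] = j
    for (std::size_t i = 1; i <= a.size(); ++i) {
        std::size_t diag = row[0];  // D[i-1][j-1] for j = 1
        row[0] = i;                 // D[i][0] = i
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t up = row[j];  // D[i-1][j], about to be overwritten
            const std::size_t sub = diag + (a[i - 1] != b[j - 1] ? 1 : 0);
            row[j] = std::min({up + 1, row[j - 1] + 1, sub});
            diag = up;
        }
    }
    return row[b.size()];
}

// Computes the upper triangle of pairwise distances (row-major order),
// then releases the sequences, since nothing else needs them afterwards.
inline std::vector<std::size_t> pairwise_then_release(std::vector<std::string> &seqs) {
    std::vector<std::size_t> dists;
    for (std::size_t i = 0; i < seqs.size(); ++i) {
        for (std::size_t j = i + 1; j < seqs.size(); ++j) {
            dists.push_back(edit_distance(seqs[i], seqs[j]));
        }
    }
    // Swap with an empty temporary so the storage is returned;
    // clear() alone would keep the vector's capacity allocated.
    std::vector<std::string>().swap(seqs);
    return dists;
}
```

Taking the container by reference and emptying it inside the function makes the point of release explicit, which is the kind of lifetime question that needs care when the same code path must also work with sequences cached to disk.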