dib-lab / sourmash-slainte

Project template for sourmash-based characterization of genomes and metagenomes
BSD 3-Clause "New" or "Revised" License

MRG: use manysketch for sketching #15

Closed ctb closed 7 months ago

ctb commented 7 months ago

This PR changes the sketching code to use manysketch instead of sourmash sketch for metagenomes - which will hopefully be (much) faster for annoyingly large metagenomes.
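As a rough illustration of the input side of this change: `manysketch` (from the sourmash branchwater plugin) reads a CSV listing the files to sketch, rather than taking files on the command line. The sketch below writes such a CSV. The column names (`name`, `genome_filename`, `protein_filename`) reflect my understanding of the plugin's format and should be checked against its documentation; the helper name `write_manysketch_csv` and the sample names are hypothetical and are not taken from this PR.

```python
import csv


def write_manysketch_csv(samples, out_path):
    """Write a manysketch-style input CSV (hypothetical helper).

    samples: dict mapping sample name -> path to a DNA sequence file.
    The three-column header is an assumption about the plugin's format.
    """
    with open(out_path, "w", newline="") as fp:
        writer = csv.writer(fp)
        writer.writerow(["name", "genome_filename", "protein_filename"])
        # Sort by name so the output is stable between runs.
        for name, filename in sorted(samples.items()):
            # Metagenome reads are DNA, so the protein column stays empty.
            writer.writerow([name, filename, ""])
    return out_path


if __name__ == "__main__":
    # Hypothetical sample names and paths, for illustration only.
    write_manysketch_csv(
        {"sampleA": "reads/sampleA.fq.gz", "sampleB": "reads/sampleB.fq.gz"},
        "samples.csv",
    )
```

The resulting file would then be passed to something like `sourmash scripts manysketch samples.csv -o sketches.zip -p k=31,scaled=1000`; the parameter string here is only an example, not the setting used in this repository.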

While simple in concept, this necessitates a lot of extra machinery 😅, and involves quite a few extra steps in practice!

We also introduce a diagnostic computation that shows datafile membership in the final sketches, as a confirmation.
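The PR does not spell out how the diagnostic works, so the following is only a minimal sketch of one way to confirm that expected samples ended up in a sketch collection. It assumes the output is a sourmash zip file containing a `SOURMASH-MANIFEST.csv` with a `name` column and `#`-prefixed comment lines; the function names are hypothetical, and the check compares sketch names only, which may be coarser than the datafile-level membership the PR computes.

```python
import csv
import zipfile

# Assumed manifest filename inside a sourmash zip collection.
MANIFEST_NAME = "SOURMASH-MANIFEST.csv"


def sketch_names_in_zip(zip_path):
    """Return the set of sketch names listed in the zip's manifest."""
    with zipfile.ZipFile(zip_path) as zf:
        text = zf.read(MANIFEST_NAME).decode("utf-8")
    # Skip comment lines (e.g. a version header) before the CSV header row.
    lines = [ln for ln in text.splitlines() if not ln.startswith("#")]
    return {row["name"] for row in csv.DictReader(lines)}


def check_membership(expected_names, zip_path):
    """Compare expected sample names against those found in the zip."""
    found = sketch_names_in_zip(zip_path)
    expected = set(expected_names)
    return {
        "missing": sorted(expected - found),
        "unexpected": sorted(found - expected),
    }
```

An empty `missing` list would indicate that every expected sample has at least one sketch in the collection.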

Fixes https://github.com/dib-lab/sourmash-slainte/issues/7