This PR changes the sketching code to use `manysketch` instead of `sourmash sketch` for metagenomes, which will hopefully be (much) faster for annoyingly large metagenomes.
While simple in concept, this necessitates a lot of extra machinery 😅:

- individual data files need to be sketched first
- then, these data files are combined

which, OK, sounds simple but involves quite a few extra steps in practice!
We also introduce a diagnostic computation that shows datafile membership in the final sketches, as a confirmation.
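
As a rough illustration of the idea (not the code in this PR, and not the sourmash or `manysketch` API), the combine step and the membership diagnostic can be modeled with plain Python sets of hash values. The function names, file names, and hash values below are all hypothetical, and the sketch assumes that "combining" amounts to a union of per-file hashes.

```python
def combine_sketches(per_file_hashes):
    """Union the per-data-file hash sets into one combined sketch (toy model)."""
    combined = set()
    for hashes in per_file_hashes.values():
        combined |= hashes
    return combined


def datafile_membership(per_file_hashes, combined):
    """Fraction of each data file's hashes that are present in the combined sketch."""
    report = {}
    for filename, hashes in per_file_hashes.items():
        # An empty per-file sketch is reported as 0.0 rather than dividing by zero.
        report[filename] = len(hashes & combined) / len(hashes) if hashes else 0.0
    return report


# Hypothetical per-file sketches for one metagenome (made-up hash values).
per_file = {
    "run1_R1.fq.gz": {1, 2, 3},
    "run1_R2.fq.gz": {3, 4},
}

combined = combine_sketches(per_file)
membership = datafile_membership(per_file, combined)
```

If every data file made it into the final sketch, each membership value should be 1.0; anything lower would suggest a file was dropped or mis-assigned during the combine step.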
Fixes https://github.com/dib-lab/sourmash-slainte/issues/7