dib-lab / 2020-paper-sourmash-gather

Here we describe an extension of MinHash that permits accurate compositional analysis of metagenomes with low memory and disk requirements.
https://dib-lab.github.io/2020-paper-sourmash-gather

should we benchmark containment rather than similarity? #2

Closed: ctb closed this 3 years ago

ctb commented 3 years ago

In the Results section "Scaled MinHash sketches support efficient indexing for large-scale containment queries", tbl:search-runtime shows runtimes for similarity search. Two thoughts:

First, these are surprisingly slow :( Second, these are for similarity, not containment.

My experience with containment and gather (which uses containment) is that they are pretty fast operations; I rarely use similarity. Moreover, the whole paper focuses more on containment than on similarity anyway.

Should we refocus this benchmark on containment?
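For readers less familiar with the distinction, the two measures differ only in their denominator: Jaccard similarity divides the shared hashes by the union of both sketches, while containment divides by the query sketch alone. A small genome fully present in a large metagenome therefore has containment 1 but low similarity. The minimal Python sketch below illustrates this under stated assumptions: the function names are hypothetical (this is not the sourmash API), and SHA-1 truncated to 64 bits stands in for the hash function sourmash actually uses.

```python
import hashlib
import random


def kmer_hashes(seq, k):
    """Hash every k-mer of seq to a 64-bit integer.

    Assumption: SHA-1 truncated to 8 bytes, a stand-in for sourmash's hash.
    """
    hashes = set()
    for i in range(len(seq) - k + 1):
        digest = hashlib.sha1(seq[i:i + k].encode()).digest()
        hashes.add(int.from_bytes(digest[:8], "big"))
    return hashes


def scaled_sketch(seq, k=21, scaled=1):
    """Scaled MinHash rule: keep only hashes below 2**64 / scaled."""
    max_hash = 2**64 // scaled
    return {h for h in kmer_hashes(seq, k) if h < max_hash}


def jaccard(a, b):
    """Shared hashes divided by the union of both sketches."""
    return len(a & b) / len(a | b)


def containment(query, target):
    """Fraction of the query's hashes that also occur in the target."""
    return len(query & target) / len(query)


# Toy data: a random 1000 bp "genome" and a 200 bp piece of it as the query.
rng = random.Random(7)
genome = "".join(rng.choice("ACGT") for _ in range(1000))
q = scaled_sketch(genome[:200])
t = scaled_sketch(genome)

# Containment is exactly 1.0 because every query k-mer occurs in the genome;
# Jaccard is much lower because the union is dominated by the genome's k-mers.
print(f"Jaccard: {jaccard(q, t):.3f}  containment: {containment(q, t):.3f}")
```

Because both sketches are filtered by the same hash threshold, a query whose k-mers are a subset of the target's still yields a subset sketch at any `scaled` value, so its containment stays at 1 (as long as the query sketch is non-empty) even after aggressive subsampling.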

ctb commented 3 years ago

Yes, I think we should. :)

ctb commented 3 years ago

Given the stuff going on with greyhound, we are going to ignore performance in this paper (beyond implying that it's acceptable, 'cause here are the results).

ctb commented 3 years ago

(And in fact we are removing that entire section as part of the shift to #10.)