dib-lab / 2020-paper-sourmash-gather

Here we describe an extension of MinHash that permits accurate compositional analysis of metagenomes with low memory and disk requirements.
https://dib-lab.github.io/2020-paper-sourmash-gather

collected items for another paper (on magsearch/etc) or papers #17

Open ctb opened 3 years ago

ctb commented 3 years ago

let's track interesting tidbits for brainstorming...

practical plusses of sourmash impl --

from https://github.com/dib-lab/2020-paper-sourmash-gather/issues/10#issuecomment-729754458 - trying this out in #12. the greyhound stuff (https://github.com/dib-lab/sourmash/issues/1226) makes me think that we're going to be doing fast database search in a variety of ways and that databases don't really fit in this paper, b/c it's an implementation detail.

stable hashing system - ref https://github.com/dib-lab/2020-paper-sourmash-gather/issues/6

abundance, maybe - https://github.com/dib-lab/2020-paper-sourmash-gather/issues/4

ctb commented 3 years ago

Maybe TODO: discuss here how abundance tracking in MinHash is not "correct",
because it is not a proper weighted subsample of the data?
Note that Scaled MinHash is a proper weighted subsample.
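
A rough sketch of the distinction, for brainstorming only. It assumes "MinHash" above means the fixed-size (bottom-`num`) variant, and it uses a truncated SHA-1 as a stand-in hash rather than whatever sourmash actually uses. All function names here are hypothetical and are not sourmash's API. With a fixed `num`, the fraction of distinct k-mers retained is `num / N`, which changes with the size of the dataset. With `scaled`, each distinct k-mer is retained with probability of about `1 / scaled` regardless of dataset size, so retained abundances can be scaled back up by a known factor.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2**64  # size of the 64-bit hash space used in this sketch


def hash_kmer(kmer: str) -> int:
    """Stable 64-bit hash of a k-mer (assumption: stand-in, not sourmash's hash)."""
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def kmers_of(seq: str, k: int) -> list:
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def num_minhash_abund(kmers, num: int) -> dict:
    """Bottom-`num` MinHash with abundances: keep the `num` smallest hashes.

    The retained fraction is num / (number of distinct hashes), so it
    depends on how large the dataset is.
    """
    counts = Counter(hash_kmer(km) for km in kmers)
    kept = sorted(counts)[:num]
    return {h: counts[h] for h in kept}


def scaled_minhash_abund(kmers, scaled: int) -> dict:
    """Scaled MinHash with abundances: keep every hash below HASH_SPACE / scaled.

    Each distinct hash is kept with probability ~1/scaled, independent of
    the other k-mers in the dataset.
    """
    threshold = HASH_SPACE // scaled
    counts = Counter(hash_kmer(km) for km in kmers)
    return {h: c for h, c in counts.items() if h < threshold}


def estimate_total_kmers(scaled_sketch: dict, scaled: int) -> int:
    """Scale the retained abundances back up by the known sampling factor."""
    return sum(scaled_sketch.values()) * scaled
```

Something like `estimate_total_kmers(scaled_minhash_abund(kmers, 1000), 1000)` should then land near `len(kmers)` on large inputs. The bottom-`num` sketch has no equivalent fixed factor unless the number of distinct k-mers is tracked separately.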