-
On "real read" data sets (shewanella and podar) I have seen that fewer than 100% of the k-mers in the reads are contained in the cDBG (see for example https://github.com/spacegraphcats/spacegraphcats/…
-
I've been running frontier search with individual genome signatures against the entire podar data set for the purpose of extracting reads (using PR #139 code), and the majority of the frontier that it…
-
Dominik and I agree that we'd like to evaluate things equivalent to containment/similarity/overhead with respect to the k-mers themselves (not just the hashes).
That is, if we think of our process…
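
As a rough illustration of measuring containment and similarity on the k-mers themselves rather than on their hashes, here is a minimal sketch. The function names are made up for this example and are not part of spacegraphcats or sourmash. It uses forward-strand k-mers only, and it leaves out "overhead" because its definition is not spelled out here.

```python
def kmers(seq, ksize):
    """Return the set of k-mers in seq (forward strand only, no canonicalization)."""
    return {seq[i:i + ksize] for i in range(len(seq) - ksize + 1)}


def containment(query, target):
    """Fraction of the query k-mers that also occur in the target."""
    if not query:
        return 0.0
    return len(query & target) / len(query)


def similarity(a, b):
    """Jaccard similarity of two k-mer sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy sequences, just to exercise the functions.
query = kmers("ACGTACGTAC", 4)
target = kmers("ACGTACGGGG", 4)
print(containment(query, target), similarity(query, target))
```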
-
When streaming sequences into sourmash, is there a way to "emit" the signature every now and then while the overall process continues? Like peeking at the result.
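
To make the idea concrete, here is a minimal sketch of "peeking" at a bottom-k MinHash while sequences keep streaming in. It is plain Python, not the sourmash API: `stream_sketches`, `hash_kmer` and the `emit_every` parameter are hypothetical, and SHA-1 is only a stand-in for whatever hash function the real sketch uses.

```python
import hashlib
import heapq


def hash_kmer(kmer):
    """Hash a k-mer to a 64-bit integer (stand-in hash, chosen for this example)."""
    digest = hashlib.sha1(kmer.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def stream_sketches(seqs, ksize, num, emit_every):
    """Yield (sequences_seen, sorted bottom-`num` hashes) every `emit_every` sequences.

    A final snapshot is yielded at the end if the last sequences were not
    already covered by a periodic one.
    """
    heap = []        # negated hashes, so heap[0] is the largest hash kept
    members = set()  # the hashes currently in the sketch
    count = 0
    for count, seq in enumerate(seqs, start=1):
        for i in range(len(seq) - ksize + 1):
            h = hash_kmer(seq[i:i + ksize])
            if h in members:
                continue
            if len(heap) < num:
                heapq.heappush(heap, -h)
                members.add(h)
            elif h < -heap[0]:
                # New hash is smaller than the largest one kept: swap them.
                evicted = -heapq.heappushpop(heap, -h)
                members.discard(evicted)
                members.add(h)
        if count % emit_every == 0:
            yield count, sorted(members)
    if count % emit_every != 0:
        yield count, sorted(members)
```

Because it is a generator, the caller can inspect or save each snapshot as it arrives, and the streaming continues when the next one is requested.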
-
- [x] Only resize the query mh if we need to (and memoize it)
- [x] Something is wrong with resizing. Try `time python -m search.frontier_search mircea-rm18.0.fa.sig podar 0.1` to see failure.
-
...that would let us equilibrate `max_hash`.
Could also add a `downsample` option to comparison functions, or provide a special downsampling set of comparison functions.
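
A minimal sketch of what a downsampling comparison could look like, assuming each sketch is just the set of hashes at or below its `max_hash`. The function names here are hypothetical, not existing sourmash functions, and the `<=` boundary is an assumption.

```python
def downsample(hashes, max_hash):
    """Keep only the hashes at or below max_hash (assumed inclusive bound)."""
    return {h for h in hashes if h <= max_hash}


def jaccard_downsampled(hashes_a, max_hash_a, hashes_b, max_hash_b):
    """Compare two sketches after bringing both down to the smaller max_hash."""
    common = min(max_hash_a, max_hash_b)
    a = downsample(hashes_a, common)
    b = downsample(hashes_b, common)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy hash values: the second sketch was built with a larger max_hash.
print(jaccard_downsampled({1, 5, 9}, 10, {1, 5, 20, 30}, 40))
```

Downsampling both sides to the smaller `max_hash` keeps the comparison fair, since hashes above that bound could never have appeared in the smaller sketch.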
-
This is using data where we have ground truth.
Relies on output sig code in #86.
You will need to have built both the `acido` and `15genome` data sets.
```
# take only the long contigs from …
```
-
Dear all,
What does the sbt_gather command do exactly? On Titus' blog the following appears in the [comments section](http://ivory.idyll.org/blog/2016-sourmash-sbt-more.html):
> Second, the asy…
-
Dear all,
Once a minhash signature is created using
```
import sourmash_lib as sm

# `sample` is any iterable of DNA sequence strings
e = sm.Estimators(n=50, ksize=15)
for seq in sample:
    e.add_sequence(seq)
```
how can it be exporte…