-
Add a function that dedupes the repeated marble labels. This is like a reduce function that sums the duplicate weights. You can't really merge marbles, but let's pretend that they are Play-Doh balls. T…
cegme updated
11 years ago
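
The dedupe-by-label idea in the item above can be sketched as a small reduce that sums the weights of marbles sharing a label. This is only an illustration: the `(label, weight)` tuple layout and the name `dedupe_marbles` are assumptions, not taken from the issue.

```python
from collections import defaultdict


def dedupe_marbles(marbles):
    """Merge marbles that share a label by summing their weights.

    `marbles` is assumed to be an iterable of (label, weight) pairs.
    """
    totals = defaultdict(float)
    for label, weight in marbles:
        totals[label] += weight  # duplicates collapse into one running sum
    return dict(totals)


merged = dedupe_marbles([("red", 1.0), ("blue", 2.5), ("red", 0.5)])
```

Here the two `red` marbles end up as a single entry whose weight is the sum of both.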
-
- Check how to improve Elasticsearch's performance
- Build a pre-indexer that filters out data that has been indexed for a given column. Basically this requires a count-min sketch per column, so that …
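
A minimal sketch of the per-column pre-indexer idea, assuming one count-min sketch per column and treating an estimate of zero as "not indexed yet". The class, the `filter_new` helper, and the width/depth defaults are hypothetical names chosen for illustration. Because a count-min sketch can overestimate but never underestimate, a value that was indexed is never let through twice, while a rare hash collision could cause a genuinely new value to be skipped.

```python
import hashlib


class CountMinSketch:
    """Tiny count-min sketch: `depth` hash rows of `width` counters each."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, value):
        # Salt the hash with the row number to get one hash function per row.
        digest = hashlib.blake2b(
            repr(value).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, value, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, value)] += count

    def estimate(self, value):
        # The minimum over rows is an upper bound on the true count.
        return min(
            self.table[row][self._index(row, value)] for row in range(self.depth)
        )


def filter_new(column, values, sketches):
    """Return the values not yet seen for `column`, recording them as seen."""
    sketch = sketches.setdefault(column, CountMinSketch())
    fresh = []
    for value in values:
        if sketch.estimate(value) == 0:
            fresh.append(value)
            sketch.add(value)
    return fresh


sketches = {}
first = filter_new("city", ["paris", "oslo", "paris"], sketches)
second = filter_new("city", ["paris"], sketches)  # already indexed
```

The second call returns an empty list, since `paris` was recorded during the first pass.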
-
As documented in https://github.com/dib-lab/khmer/blob/doc/binaryformats/doc/dev/binary-file-formats.rst, we have too many names for the same things:
1. countgraph/countinghash/count-min sketch with fi…
-
@idreeskhan pointed me to Space Saving and other variants, approximate algorithms that can answer top K items & frequencies. Could be nice to have.
Right now we have Count-Min Sketch from Algebird …
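
For readers unfamiliar with Space Saving, here is a minimal sketch of the basic algorithm in plain Python. It is not Algebird's API; the function name and the dictionary-based counter store are assumptions made for illustration. The algorithm keeps at most `k` counters; when an unmonitored item arrives and all counters are in use, it evicts the item with the smallest count and gives the newcomer that count plus one, so reported counts may overestimate but never underestimate.

```python
def space_saving(stream, k):
    """Approximate top-k items and frequencies using at most k counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Evict the current minimum and inherit its count plus one.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return sorted(counters.items(), key=lambda kv: kv[1], reverse=True)


top = space_saving("a a b a c a b d a".split(), k=2)
```

A linear scan for the minimum keeps the sketch short; a production version would typically use a structure that finds the minimum counter in constant time.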
-
### A note for the community
* Please vote on this issue by adding a 👍 [reaction](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to the original issue to …
-
There are dependency updates like the rand one from dependabot not merged. It looks like a useful crate but I'm trying to assess whether I'd need to fork it or whether it is still maintained.
-
It would be great if we could add a runtime metric that reports the skewed keys of joins.
Maybe we could use a count-min sketch or related data structures in the lookup join operator itself to detect…
-
For small count-min sketches, you create CMSItem:
https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/CountMinSketch.scala#L467
But to get the frequenc…
-
## Problem Statement
Add support for data skipping indexes.
## Background and Motivation
Hyperspace has been supporting hash-partitioned covering indexes only. Covering indexes are good for s…
-
## Enhancement
Currently, after we import data to the cluster, we need to analyze the table, which is time-consuming since it needs to scan the whole table. Collecting table statistics can be done …