barrust / count-min-sketch

Count-Min Sketch Implementation in C
MIT License
45 stars 16 forks source link

Addition of CMS Merge and Minor Fixes #12

Closed jonahharris closed 4 years ago

jonahharris commented 4 years ago

This adds a cms_merge function that allows for merging multiple cms together and has a few minor fixes.

jonahharris commented 4 years ago

No reason in particular. That’s how most implementations seem to do it, even though, personally, I’d prefer to have the option to do as you say. I’ll update it to determine whether the target already exists and, if so, will merge into it rather than allocate a new one.

barrust commented 4 years ago

It might be easier to have two versions: one that puts everything into the first CMS and an alternate that makes a new one.

jonahharris commented 4 years ago

Thoughts on the following:

/*
 * Initialize the count-min sketch based on the width, depth, error rate, and
 * confidence values defined in the sketches to be merged, then merge them into
 * it.
 */
int cms_merge(CountMinSketch* cms, int num_sketches, ...);

/* Merge the following sketches into an already-initialized count-min sketch. */
int cms_merge_alt(CountMinSketch* cms, int num_sketches, ...);
// ALTERNATIVE: int cms_merge_into(CountMinSketch* cms, int num_sketches, ...);
barrust commented 4 years ago

I like the cms_merge_into option. Thanks!