bmizerany / perks

Effective Computation of Things
BSD 2-Clause "Simplified" License
186 stars 53 forks source link

fix a degenerate case in topk #16

Open blinsay opened 9 years ago

blinsay commented 9 years ago

I found a degenerate case in topk, added a test for it, and fixed it.

The bug

Suppose you have the following item frequencies:

map[string]int{
        "a": 123,
        "d": 99,
        "e": 87,
        "b": 45,
        "c": 42
}

If the next 6 items in the stream are [c, c, c, c, c, c], the Stream will have a count of 48 for c and 45 for b, but min will still point to "c" when it should be pointing to "b".

At this point if you add a new item, "f", it will replace "c" instead of replacing "b" even though "c" has a higher count.

The fix

Keep a reference to the actual minimum Element at all times using container/heap.

(This is #3 with cleaner commits, but two years later. Sorry about the delay.)

bmizerany commented 1 year ago

@blinsay Has this patch worked for you over the last, almost, 8 years?

blinsay commented 1 year ago

I haven't used it in a long time. Feel free to close this out.