Quickselect should only work on sets (lists of unique items) - Githubissues

buruzaemon / frequent-algorithm

Ruby implementation of FREQUENT algorithm from Identifying Frequent Items in Sliding Windows over On-Line Packet Streams, 2003

MIT License

6 stars 2 forks source link

Quickselect should only work on sets (lists of unique items) #33

Closed buruzaemon closed 9 years ago

buruzaemon commented 9 years ago

For the calculation of kth largest item to be absolutely correct, we need to fix our implementation.

Originally, it was purposefully implemented to work on lists where items could repeat, but we should really follow the description of Ch. 9 in Introduction to Algorithms.

Input: A set A of n (distinct) numbers and a number k, with 1 ≤ k ≤ n Output: The element x ∈ A that is larger than exactly k - 1 other elements of A

buruzaemon commented 9 years ago

This algorithm does not assume that the argument list is sorted.

Sorting the list ahead of time defeats the purpose of this algorithm. Best-case performance of a merge or heapsort is O(n lg n), while quickselect should be O(n).

sytong commented 9 years ago

Where can I find Chapter 9, I wonder?

buruzaemon commented 9 years ago

The book normally can't be found for free, but here are some notes from a uni in NY.

https://www.cs.rochester.edu/~gildea/csc282/slides/C09-median.pdf

buruzaemon commented 9 years ago

Another possibly good source of references would be this question on SO.

buruzaemon commented 9 years ago

The only part to fix is the case where k exceeds the length of the 'uniquifed' list.

In such a case, we will opt to assign to k the length of the 'uniquifed' list.

sytong commented 9 years ago

I think calling Hash#values to get the list of items first is probably not recommended too as that requires going through every items in the summary first?

buruzaemon commented 9 years ago

No matter which data structure we choose, we will have to gather up the counts for each key, regardless. And recall that a Hash guarantees keys that are non-nil and unique (the keys form a set), plus access to the count for a given key is O(1).

buruzaemon commented 9 years ago

Fix checked in and merged.