buruzaemon closed this issue 9 years ago
This algorithm does not assume that the argument list is sorted.
Sorting the list ahead of time defeats the purpose of this algorithm: the best case for mergesort or heapsort is O(n lg n), while quickselect runs in expected O(n).
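For reference, quickselect can be sketched in a few lines of Ruby. This is an illustrative sketch only, not the repository's implementation, and the method name is mine:

```ruby
# Illustrative quickselect sketch (not this repo's code): returns the k-th
# smallest element of list (1-based k) in expected O(n) time.
def quickselect(list, k)
  pivot  = list.sample
  lows   = list.select { |x| x < pivot }
  pivots = list.select { |x| x == pivot }

  if k <= lows.length
    quickselect(lows, k)                 # answer lies among the smaller items
  elsif k <= lows.length + pivots.length
    pivot                                # the pivot itself is the k-th smallest
  else
    quickselect(list.select { |x| x > pivot },
                k - lows.length - pivots.length)
  end
end
```

The k-th largest of n items is then just the (n - k + 1)-th smallest.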
Where can I find Chapter 9, I wonder?
The book normally can't be found for free, but here are some lecture notes from the University of Rochester:
https://www.cs.rochester.edu/~gildea/csc282/slides/C09-median.pdf
Another possibly good source of references would be this question on SO.
The only part to fix is the case where k exceeds the length of the 'uniquified' list.
In such a case, we will opt to assign to k the length of the 'uniquified' list.
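Something like the following clamping step is what I have in mind. The variable names here are illustrative, not taken from the repo:

```ruby
# Hedged sketch: clamp k to the size of the uniquified list before selecting.
items  = %w[b a b c c c a a]
counts = items.tally            # counts per item: {"b"=>2, "a"=>3, "c"=>3}
unique = counts.keys            # the 'uniquified' list, 3 items here
k = 10
k = unique.length if k > unique.length  # k is now the largest meaningful rank
```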
I think calling Hash#values to get the list of items first is probably not recommended either, as that requires going through every item in the summary first?
No matter which data structure we choose, we will have to gather up the counts for each key regardless. And recall that a Hash guarantees unique, non-nil keys
(the keys form a set), and access to the count for a given key is O(1).
Fix checked in and merged.
For the calculation of kth largest item to be absolutely correct, we need to fix our implementation.
Originally, it was purposefully implemented to work on lists where items could repeat, but we should really follow the description in Ch. 9 of *Introduction to Algorithms*:
> Input: A set A of n (distinct) numbers and a number k, with 1 ≤ k ≤ n
> Output: The element x ∈ A that is larger than exactly k − 1 other elements of A
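A quick, purely illustrative check of what that contract means on a small set of distinct numbers (selection is done here by sorting, just for clarity):

```ruby
# Illustrative check of the Ch. 9 definition on distinct numbers.
a = [7, 2, 9, 4]
k = 3
x = a.sort[k - 1]        # the k-th order statistic, selected naively by sorting
a.count { |y| y < x }    # x is larger than exactly k - 1 other elements of a
```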