Closed futurely closed 9 years ago
Thanks for your suggestion! If our tests confirm the speed gains that you mention we will consider shifting to Guava very soon!
We use LingPipe's implementation of BoundedPriorityQueue (not Spark's). My tests comparing the currently used implementation to Guava's MinMaxPriorityQueue showed that they have a similar efficiency. Therefore, it seems to be no reason for shifting to Guava at the moment.
SPARK-1321 Use Guava's top k implementation rather than our BoundedPriorityQueue based implementation "On my simple test of sorting 100 million (Int, Int) tuples using Spark, Guava's top k implementation (in Ordering) is much faster than the BoundedPriorityQueue implementation for roughly sorted input (10 - 20X faster), and still faster for purely random input (2 - 5X)."
Selecting top k items from a list efficiently in Java / Groovy PriorityQueue: 300ms Guava Ordering: 170ms
Google Guava MinMaxPriorityQueue "A min-max priority queue can be configured with a maximum size. If so, each time the size of the queue exceeds that value, the queue automatically removes its greatest element according to its comparator (which might be the element that was just added). This is different from conventional bounded queues, which either block or reject new elements when full."