MKLab-ITI / multimedia-indexing

A framework for large-scale feature extraction, indexing and retrieval.
Apache License 2.0
59 stars 19 forks source link

Google Guava MinMaxPriorityQueue is a faster bounded priority queue with excellent Javadoc #3

Closed futurely closed 9 years ago

futurely commented 9 years ago

SPARK-1321 Use Guava's top k implementation rather than our BoundedPriorityQueue based implementation "On my simple test of sorting 100 million (Int, Int) tuples using Spark, Guava's top k implementation (in Ordering) is much faster than the BoundedPriorityQueue implementation for roughly sorted input (10 - 20X faster), and still faster for purely random input (2 - 5X)."

Selecting top k items from a list efficiently in Java / Groovy PriorityQueue: 300ms Guava Ordering: 170ms

Google Guava MinMaxPriorityQueue "A min-max priority queue can be configured with a maximum size. If so, each time the size of the queue exceeds that value, the queue automatically removes its greatest element according to its comparator (which might be the element that was just added). This is different from conventional bounded queues, which either block or reject new elements when full."

lefman commented 9 years ago

Thanks for your suggestion! If our tests confirm the speed gains that you mention we will consider shifting to Guava very soon!

lefman commented 9 years ago

We use LingPipe's implementation of BoundedPriorityQueue (not Spark's). My tests comparing the currently used implementation to Guava's MinMaxPriorityQueue showed that they have a similar efficiency. Therefore, it seems to be no reason for shifting to Guava at the moment.