dotnet / corefxlab

This repo is for experimentation and exploring new ideas that may or may not make it into the main corefx repo.
MIT License
1.46k stars 345 forks source link

priority queue spec should define decrease-key or some other priority update functionality #1841

Closed TimLovellSmith closed 4 years ago

TimLovellSmith commented 7 years ago

Re: https://github.com/dotnet/corefxlab/edit/master/docs/specs/priority-queue.md

The spec currently leaves as an open questions whether to support updating keys in PriorityQueue class. To close on this question... Yes! Updating priorities, especially decreasing them is important!

It is a generally useful operation, and it is possible to optimize special cases (mainly decrease key) enough, that it should be a first-class operation instead of always 'implemented' by the end user using remove + push.

If this is a question on whether to have it on the interface, why not? Some implementations of the interface might be no more efficient than calling remove element and then push element, but other implementations can be as efficient as possible.

It requires some thought to decide what the interface for collections where the elements contain the priority, or the queue works via IComparable<T> should be. Ideally (we can dream!) the collection would guarantee that the priority on the element is always consistent with the state of the priority queue. But a practical problem exists that the user can update his/her objects to be a new state but forget to notify the queue that priority has changed.

/cc @derekbjohnson /cc @pgolebiowski /cc @ebickle

pgolebiowski commented 7 years ago

Please go through the frightfully long discussion over here: https://github.com/dotnet/corefx/issues/574. You'll see voices for and against. I've been fighting for this feature for a long time. Feel free to contribute to the discussion (better post your thoughts here rather than in that long thread though [just reference it]). 😄 @karelz

TimLovellSmith commented 7 years ago

Recapping, I think some key contentious points related update are:

-worry about how update should work when there are duplicate elements (which we think sometimes we want to support). Is updateFirst weird? -somewhat unclear where the argument for/against supporting duplicates landed in the first place -worry that it might be inefficient to have an additional lookup data structure for finding elements in the backing data structure, or give handles out, just to update priority of items. (Especially for scenarios where update is never performed!)

Now as a result, here is my take.

Look at the popular applications of PriorityQueues courtesy of Wikipedia.

Q. Do any of them require duplicate elements, in the sense that the 'element' stored in the queue can be identical, but stored with a different priority? (Note, that this is only possible if the priority is stored externally to the item in the queue.)

Bandwidth management: no, you could either be queuing at the packet level or the connection level, but they don't need to be duplicated.

Djikstra's algorithm, and other 'best-first' algorithms: no, each vertex only needs to appear once (and you would also be happy with 'deduplicate insert/update')

Prim's algorithm: no, each vertex only needs to appear once (and you would also be happy with deduplicate insert/update'

Huffman coding: no, each vertex is a distinct letter in the alphabet.

ROAM triangulation: no, each triangle in the queue is a distinct region

Event simulation/scheduling: maybe - you might do this because you are trying to enqueue a repeated event (happening at multiple times), or to save space for many similar events by using the same representation ('light weight' event).

And as a special case of that, where we shouldn't need duplicates, we might care about timer queues for implementing .net timer scheduling (with cancellation)....

Where am I going with this... -I think the less common use case is for elements to be repeated but have distinct priorities (equal according to some equality comparison that doesn't know the priorities) . In those use cases, it makes no sense that anyone cares to update priorities of a specific one of those things, because:

But, you could still claim you wish to update a specific one of the repeated items because you want to update one which currently has a specific priority! (There is a question of whether you allow repeated priorities, but likely you do, and you are fine with just updating one of the items which has that priority...)

What if we divide all use cases into these categories:

  1. UNIQUE items in the set. Priority is potentially mutable. When priority changes, there is only definitely one item whose priority should be updated, which must then be efficiently located in the queue by some augmenting lookup.
  2. DUPLICATE items in the set, with UNIQUE priorities. Priority must be 'attached' to those duplicates as they are inserted Can be actually be seen as a special case of 1. UNIQUE items in the set, where those items are the 'duplicated item, and the priority of the item'
  3. DUPLICATE items in the set, with potentially DUPLICATE priorities. Priority must be 'attached' to those duplicates as they are inserted. In this case, the attached priority can be used to update a specific copy of an item, this might require a 'queue and hash of queues' implementation.

Actually if you want fully duplicate capability with or without duplicate priorities it is also not too hard to construct it, from the unique priority queue - e.g. have a priority queue of subpriority queues, where each subqueue is the N priorities of N repetitions of the items which are equivalent to each other. (Do you really want to store all those duplicates too??? It might be important to someone, ugh! If only we could leave such messy implementation aspects/optimiztions up to the end user somehow...)

So conclusions - what should really be done? 1) PriorityQueue supporting unique items, and decrease-key is actually the MUST have. (We might instead call it 'PrioritySet' instead of priority queue to indicate that items must be unique. But I think everyone expects this class to be named PriorityQueue, that is what they're going to type and try and resolve with intellisense, so never mind that idea!) 2) PriorityQueue supporting duplicate items with priorities distinct or not, is a NICE to have, but not a MUST have, and should therefore be aggressively CUT/POSTPONED into a separate discussion... (IMHO) for everyone who really wants that feature to further design how it should work!

Note: Supporting Increase-Key along with Decrease-Key is also nice to have for usability, even if its not as efficient as decrease-key. (I assume we are talking about lowest priority number wins...)

/cc @pgolebiowski @karelz @benaadams @ebickle

TimLovellSmith commented 4 years ago

We can close this for now as dupe of https://github.com/dotnet/runtime/issues/14032 where progress seems to be being made.