dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.07k stars 4.69k forks source link

Add PriorityQueue<T> to Collections #14032

Closed ebickle closed 3 years ago

ebickle commented 9 years ago

See LATEST Proposal in corefxlab repo.

Second Proposal options

Proposal from https://github.com/dotnet/corefx/issues/574#issuecomment-307971397

Assumptions

Elements in priority queue are unique. If they are not, we would have to introduce 'handles' of items to enable their update/remove. Or the update/remove semantics would have to apply to first/all, which is weird.

Modeled after Queue<T> (MSDN link)

API

public class PriorityQueue<TElement, TPriority>
    : IEnumerable,
    IEnumerable<(TElement element, TPriority priority)>,
    IReadOnlyCollection<(TElement element, TPriority priority)>
    // ICollection not included on purpose
{
    public PriorityQueue();
    public PriorityQueue(IComparer<TPriority> comparer);

    public IComparer<TPriority> Comparer { get; }
    public int Count { get; }
    public bool IsEmpty { get; }

    public bool Contains(TElement element);

    // Peek & Dequeue
    public (TElement element, TPriority priority) Peek(); // Throws if empty
    public (TElement element, TPriority priority) Dequeue(); // Throws if empty
    public bool TryPeek(out TElement element, out TPriority priority); // Returns false if empty
    public bool TryDequeue(out TElement element, out TPriority priority); // Returns false if empty

    // Enqueue & Update
    public void Enqueue(TElement element, TPriority priority); // Throws if it is duplicate
    public void Update(TElement element, TPriority priority); // Throws if element does not exist
    public void EnqueueOrUpdate(TElement element, TPriority priority);
    public bool TryEnqueue(TElement element, TPriority priority); // Returns false if it is duplicate (does NOT update it)
    public bool TryUpdate(TElement element, TPriority priority); // Returns false if element does not exist (does NOT add it)

    public void Remove(TElement element); // Throws if element does not exist
    public bool TryRemove(TElement element); // Returns false if element does not exist

    public void Clear();

    public IEnumerator<(TElement element, TPriority priority)> GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator();

//
// Selector part
//
    public PriorityQueue(Func<TElement, TPriority> prioritySelector);
    public PriorityQueue(Func<TElement, TPriority> prioritySelector, IComparer<TPriority> comparer);

    public Func<TElement, TPriority> PrioritySelector { get; }

    public void Enqueue(TElement element);
    public void Update(TElement element);
}

Open questions:

  1. Class name PriorityQueue vs. Heap
  2. Introduce IHeap and constructor overload? (Should we wait for later?)
  3. Introduce IPriorityQueue? (Should we wait for later - IDictionary example)
  4. Use selector (of priority stored inside the value) or not (5 APIs difference)
  5. Use tuples (TElement element, TPriority priority) vs. KeyValuePair<TPriority, TElement>
    • Should Peek and Dequeue rather have out argument instead of tuple?
  6. Is Peek and Dequeue throwing useful at all?

Original Proposal

Issue https://github.com/dotnet/corefx/issues/163 requested the addition of a priority queue to the core .NET collection data structures.

This post, while a duplicate, is intended to act the formal submission to the corefx API Review Process. The issue contents are the speclet for a new System.Collections.Generic.PriorityQueue type.

I will be contributing the PR, if approved.

Rationale and Usage

The .NET Base Class Libraries (BCL) currently lacks support for ordered producer-consumer collections. A common requirement of many software applications is the ability generate a list of items over time and process them in an order different from the order they were received in.

There are three generic data structures within the System.Collections hierarchy of namespaces that supported a sorted collection of items; System.Collections.Generic.SortedList, System.Collections.Generic.SortedSet, and System.Collections.Generic.SortedDictionary.

Of these, SortedSet and SortedDictionary are not appropriate for producer-consumer patterns that generate duplicate values. The complexity of SortedList is Θ(n) worst case for both Add and Remove.

A much more memory and time efficient data structure for ordered collections with producer-consumer usage patterns is a priority queue. Other than when capacity resizing is necessary, worse case insertion (enqueue) and remove top (dequeue) performance is Θ(log n) - far better than the existing options that exist in the BCL.

Priority queues have a wide degree of applicability across different classes of applications. The Wikipedia page on Priority Queues offers a list of many different well understand use cases. While highly specialized implementations may still require custom priority queue implementations, a standard implementation would cover a broad range of usage scenarios. Priority queues are particularly useful in scheduling the output of multiple producers, which is an important pattern in highly parallelized software.

It's worth noting that both the C++ standard library and Java offer priority queue functionality as part of their basic APIs.

Proposed API

namespace System.Collections.Generic
{
    /// <summary>
    /// Represents a collection of objects that are removed in a sorted order.
    /// </summary>
    /// <typeparam name="T">Specifies the type of elements in the queue.</typeparam>    
    [DebuggerDisplay("Count = {count}")]
    [DebuggerTypeProxy(typeof(System_PriorityQueueDebugView<>))]
    public class PriorityQueue<T> : IEnumerable<T>, ICollection, IEnumerable, IReadOnlyCollection<T>
    {
        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that uses a default comparer.
        /// </summary>
        public PriorityQueue();

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that has the specified initial capacity.
        /// </summary>
        /// <param name="capacity">The initial number of elements that the <see cref="PriorityQueue{T}"/> can contain.</param>
        /// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="capacity"/> is less than zero.</exception>
        public PriorityQueue(int capacity);

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that uses a specified comparer.
        /// </summary>
        /// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="comparer"/> is null.</exception>
        public PriorityQueue(IComparer<T> comparer);

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that contains elements copied from the specified collection and uses a default comparer.
        /// </summary>
        /// <param name="collection">The collection whose elements are copied to the new <see cref="PriorityQueue{T}"/>.</param>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="collection"/> is null.</exception>
        public PriorityQueue(IEnumerable<T> collection);

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class 
        /// that contains elements copied from the specified collection and uses a specified comparer.
        /// </summary>
        /// <param name="collection">The collection whose elements are copied to the new <see cref="PriorityQueue{T}"/>.</param>
        /// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
        /// <exception cref="T:System.ArgumentNullException">
        /// <paramref name="collection"/> is null. -or-
        /// <paramref name="comparer"/> is null.
        /// </exception>
        public PriorityQueue(IEnumerable<T> collection, IComparer<T> comparer);

        /// <summary>
        /// Initializes a new instance of the <see cref="PriorityQueue{T}"/> class that is empty,
        /// has the specified initial capacity, and uses a specified comparer.
        /// </summary>
        /// <param name="capacity">The initial number of elements that the <see cref="PriorityQueue{T}"/> can contain.</param>
        /// <param name="comparer">The <see cref="T:System.Collections.Generic.IComparer{T}"/> to use when comparing elements.</param>
        /// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="capacity"/> is less than zero.</exception>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="comparer"/> is null.</exception>
        public PriorityQueue(int capacity, IComparer<T> comparer);

        /// <summary>
        /// Gets the <see cref="IComparer{T}"/> for the <see cref="PriorityQueue{T}"/>. 
        /// </summary>
        /// <value>
        /// The <see cref="T:System.Collections.Generic.IComparer{T}"/> that is used when
        /// comparing elements in the <see cref="PriorityQueue{T}"/>. 
        /// </value>
        public IComparer<T> Comparer 
        { 
            get;
        }

        /// <summary>
        /// Gets the number of elements contained in the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <value>The number of elements contained in the <see cref="PriorityQueue{T}"/>.</value>
        public int Count 
        { 
            get;
        }

        /// <summary>
        /// Adds an object to the into the <see cref="PriorityQueue{T}"/> by its priority.
        /// </summary>
        /// <param name="item">
        /// The object to add to the <see cref="PriorityQueue{T}"/>. 
        /// The value can be null for reference types.
        /// </param>
        public void Enqueue(T item);

        /// <summary>
        /// Removes and returns the object with the lowest priority in the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <returns>The object with the lowest priority that is removed from the <see cref="PriorityQueue{T}"/>.</returns>
        /// <exception cref="InvalidOperationException">The <see cref="PriorityQueue{T}"/> is empty.</exception>
        public T Dequeue();

        /// <summary>
        /// Returns the object with the lowest priority in the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <exception cref="InvalidOperationException">The <see cref="PriorityQueue{T}"/> is empty.</exception>
        public T Peek();

        /// <summary>
        /// Removes all elements from the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        public void Clear();

        /// <summary>
        /// Determines whether an element is in the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <param name="item">
        /// The object to add to the end of the <see cref="PriorityQueue{T}"/>. 
        /// The value can be null for reference types.
        /// </param>
        /// <returns>
        /// true if item is found in the <see cref="PriorityQueue{T}"/>;  otherwise, false.
        /// </returns>
        public bool Contains(T item);

        /// <summary>
        /// Copies the elements of the <see cref="PriorityQueue{T}"/> to an  <see cref="T:System.Array"/>, 
        /// starting at a particular <see cref="T:System.Array"/> index.
        /// </summary>
        /// <param name="array">
        /// The one-dimensional <see cref="T:System.Array">Array</see> that is the
        /// destination of the elements copied from the <see cref="PriorityQueue{T}"/>. 
        /// The <see cref="T:System.Array">Array</see> must have zero-based indexing.
        /// </param>
        /// <param name="arrayIndex">The zero-based index in <paramref name="array"/> at which copying begins.</param>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="array"/> is null.</exception>
        /// <exception cref="T:System.ArgumentOutOfRangeException">
        /// <paramref name="arrayIndex"/> is less than zero. -or- 
        /// <paramref name="arrayIndex"/> is equal to or greater than the length of the <paramref name="array"/>
        /// </exception>
        /// <exception cref="ArgumentException">
        /// The number of elements in the source <see cref="T:System.Collections.ICollection"/> is
        /// greater than the available space from <paramref name="index"/> to the end of the destination
        /// <paramref name="array"/>.
        /// </exception>
        public void CopyTo(T[] array, int arrayIndex);

        /// <summary>
        /// Copies the elements of the <see cref="T:System.Collections.ICollection"/> to an 
        /// <see cref="T:System.Array"/>, starting at a particular <see cref="T:System.Array"/> index.
        /// </summary>
        /// <param name="array">
        /// The one-dimensional <see cref="T:System.Array">Array</see> that is the
        /// destination of the elements copied from the <see cref="PriorityQueue{T}"/>. 
        /// The <see cref="T:System.Array">Array</see> must have zero-based indexing.
        /// </param>
        /// <param name="index">The zero-based index in <paramref name="array"/> at which copying begins.</param>
        /// <exception cref="T:System.ArgumentNullException"><paramref name="array"/> is null.</exception>
        /// <exception cref="T:System.ArgumentOutOfRangeException"><paramref name="index"/> is less than zero.</exception>
        /// <exception cref="ArgumentException">
        /// <paramref name="array"/> is multidimensional. -or-
        /// <paramref name="array"/> does not have zero-based indexing. -or-
        /// <paramref name="index"/> is equal to or greater than the length of the <paramref name="array"/> -or- 
        /// The number of elements in the source <see cref="T:System.Collections.ICollection"/> is
        /// greater than the available space from <paramref name="index"/> to the end of the destination
        /// <paramref name="array"/>. -or- 
        /// The type of the source <see cref="T:System.Collections.ICollection"/> cannot be cast automatically 
        /// to the type of the destination <paramref name="array"/>.
        /// </exception>
        void ICollection.CopyTo(Array array, int index);

        /// <summary>
        /// Copies the elements stored in the <see cref="PriorityQueue{T}"/> to a new array.
        /// </summary>
        /// <returns>
        /// A new array containing a snapshot of elements copied from the <see cref="PriorityQueue{T}"/>.
        /// </returns>
        public T[] ToArray();

        /// <summary>
        /// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>
        /// </summary>
        /// <returns>An enumerator for the contents of the <see cref="PriorityQueue{T}"/>.</returns>
        public Enumerator GetEnumerator();

        /// <summary>
        /// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>
        /// </summary>
        /// <returns>An enumerator for the contents of the <see cref="PriorityQueue{T}"/>.</returns>
        IEnumerator<T> IEnumerable<T>.GetEnumerator();

        /// <summary>
        /// Returns an enumerator that iterates through the <see cref="PriorityQueue{T}"/>.
        /// </summary>
        /// <returns>An <see cref="T:System.Collections.IEnumerator"/> that can be used to iterate through the collection.</returns>
        IEnumerator IEnumerable.GetEnumerator();

        /// <summary>
        /// Sets the capacity to the actual number of elements in the <see cref="PriorityQueue{T}"/>, 
        /// if that number is less than than a threshold value.
        /// </summary>
        public void TrimExcess();

        /// <summary>
        /// Gets a value that indicates whether access to the <see cref="ICollection"/> is 
        /// synchronized with the SyncRoot.
        /// </summary>
        /// <value>true if access to the <see cref="T:System.Collections.ICollection"/> is synchronized
        /// with the SyncRoot; otherwise, false. For <see cref="PriorityQueue{T}"/>, this property always
        /// returns false.</value>
        bool ICollection.IsSynchronized
        {
            get;
        }

        /// <summary>
        /// Gets an object that can be used to synchronize access to the 
        /// <see cref="T:System.Collections.ICollection"/>.
        /// </summary>
        /// <value>
        /// An object that can be used to synchronize access to the 
        /// <see cref="T:System.Collections.ICollection"/>.
        /// </value>
        object ICollection.SyncRoot
        {
            get;
        }

        public struct Enumerator : IEnumerator<T>
        {
            public T Current { get; }
            object IEnumerator.Current { get; }
            public bool MoveNext();
            public void Reset();
            public void Dispose();
        }
    }
}

Details

Operation Complexity Notes
Construct Θ(1)
Construct Using IEnumerable Θ(n)
Enqueue Θ(log n)
Dequeue Θ(log n)
Peek Θ(1)
Count Θ(1)
Clear Θ(N)
Contains Θ(N)
CopyTo Θ(N) Uses Array.Copy, actual complexity may be lower
ToArray Θ(N) Uses Array.Copy, actual complexity may be lower
GetEnumerator Θ(1)
Enumerator.MoveNext Θ(1)
svick commented 9 years ago
Operation Complexity
Construct Using IEnumerable Θ(log n)

I think this should be Θ(n). You need to at least iterate the input.

ghost commented 9 years ago

+1

anaisbetts commented 9 years ago

Rx has a highly production-tested priority queue class:

https://github.com/Reactive-Extensions/Rx.NET/blob/master/Rx.NET/Source/System.Reactive.Core/Reactive/Internal/PriorityQueue.cs

justinvp commented 9 years ago

Should a nested public enumerator structure be used to avoid an additional heap allocation during calls to GetEnumerator and when using foreach? My assumption is no, since enumerating over a queue is an uncommon operation.

I'd lean towards using the struct enumerator to be consistent with Queue<T> which uses a struct enumerator. Also, if a struct enumerator isn't used now, we won't be able to change PriorityQueue<T> to use one in the future.

benaadams commented 9 years ago

Also perhaps a method for batch inserts? Can always then sort and continue from previous insertion point rather than starting at the beginning if that would help?:

    public void Enqueue(List<T> items) {
         items.Sort(_comparer);
         ... insertions ...
    }
ebickle commented 9 years ago

I've tossed a copy of the initial implementation here. Test coverage is far from complete, but if anyone is curious please take a look and let me know what you think. I've tried to follow the existing coding conventions from the System.Collections classes as much as possible.

justinvp commented 9 years ago

Cool. Some initial feedback:

justinvp commented 9 years ago

Also...

justinvp commented 9 years ago

Also...

ebickle commented 9 years ago

@justinvp Great feedback, thanks!

akoeplinger commented 9 years ago

@ebickle

Array.Empty is a F# feature, so can't use it here sadly!

Not anymore: https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/Array.cs#L1060-L1069

ebickle commented 9 years ago

I've migrated the code over to ebickle/corefx in the issue-574 branch.

Implemented the Array.Empty() change and plugged everything into the regular build pipeline. One slight temporary kludge I had to introduce was having the System.Collections project depend on the System.Collections nuget package, as Comparer isn't open source yet.

Will be fixed once issue dotnet/corefx#966 is complete.

One key area I'm looking for feedback is how ToArray, CopyTo, and the enumerator should be handled. Currently they're optimized for performance, which means the target array is a heap and not sorted by the comparer.

There are three options: 1) Leave it as-is, and document that the returned array is not sorted. (it's a priority queue, not a sorted queue) 2) Modify the methods to sort the returned items/arrays. They will no longer be O(n) operations. 3) Remove the methods and enumerable support altogether. This is the "purist" option but removes the ability to quickly grab the remaining queued items when the priority queue is no longer needed.

Another thing I'd like feedback on is whether the queue should be stable for two items with the same priority (compare result of 0). Generally priority queues make no guarantees that two items with the same priority will be dequeued in the order they were enqueued, but I noticed that internal priority queue implementation used in System.Reactive.Core took some additional overhead to ensure this property. My preference would be to not do this, but I'm not completely sure which option is better in terms developers' expectations.

mbeidler commented 9 years ago

I happened upon this PR because I too was interested in adding a priority queue to .NET. Glad to see someone went to the effort of making this proposal :). After reviewing the code, I noticed the following:

svick commented 9 years ago

I didn't see code to shrink the array in Dequeue. Shrinking the heap array by half when quarter full is typical.

Queue<T> and Stack<T> also don't shrink their arrays (based on Reference Source, they're not in CoreFX yet).

ebickle commented 9 years ago

@mbeidler I'd considered adding some form of automatic array shrinking in dequeue, but as @svick pointed out it doesn't exist in the reference implementations of similar data structures. I'd be curious to hear from anyone on the .NET Core/BCL team whether there's any particular reason they chose that style of implementation.

Update: I checked List, Queue, Queue, and ArrayList - none of them shrink the size of the internal array on a remove/dequeue.

Enqueue should support nulls, and is documented as allowing them. Did you run into a bug? I can't remember how robust the unit tests area in the area yet.

mbeidler commented 9 years ago

Interesting, I noticed in the reference source linked by @svick that Queue<T> has an unused private constant named _ShrinkThreshold. Perhaps that behavior existed in a previous version.

Concerning the use of IComparer instead of Equals in the implementation of Contains, I wrote up the following unit test, which would fail currently: https://gist.github.com/mbeidler/9e9f566ba7356302c57e

ebickle commented 9 years ago

@mbeidler Good point. According to MSDN, IComparer / IComparable only guarantees that a value of zero has the same sort order.

However, it looks as though the same issue exists in the other collection classes. If I modify the code to operate on SortedList<Patient, Patient>, the test case still fails on ContainsKey. The implementation of SortedList<TKey, TValue>.ContainsKey calls Array.BinarySearch, which relies on IComparer to check for equality. The same holds true for SortedSet.

Arguably it's a bug in the existing collection classes as well. I'll dug through the rest of the collections classes and see if there are any other data structures that accept an IComparer but test for equality separately. You're right though, for a priority queue you would expect custom ordering behavior that's completely independent from equality.

ebickle commented 9 years ago

Committed a fix and the test case into my fork branch. The new implementation of Contains is based directly on the behavior of List.Contains. Since List doesn't accept an IEqualityComparer, the behavior is functionally equivalent.

When I get some time later today, I'll likely submit bug reports for the other built-in collections. The probably can't be fixed due to regression behavior, but at very least the documentation needs to be updated.

mbeidler commented 9 years ago

I think it makes sense for ContainsKey to use the IComparer<TKey> implementation, since that is what specifies the key. However, I think it would be more logical for ContainsValue to use Equals instead of IComparable<TValue> in its linear search; though in this case the scope is significantly reduced since a type's natural ordering is much less likely to be inconsistent with equals.

It appears that in the MSDN documentation for SortedList<TKey, TValue>, the remarks section for ContainsValue does indicate that the TValue's sort ordering is used in lieu of equality.

ebickle commented 9 years ago

@terrajobst How do you feel about the API proposal so far? Do you feel this a good fit for CoreFX?

ddevault commented 9 years ago

:+1:

terrajobst commented 9 years ago

Thanks for filing this. I believe we've enough data to make a formal review of this proposal, hence I labelled it as 'ready for api review'

andrewgmorris commented 8 years ago

As Dequeue and Peek are throwing methods the caller needs to check count before each call. Would it make sense to instead (or in addition) provide TryDequeue and TryPeek following the pattern of the concurrent collections? There are issues open to add non-throwing methods to existing generic collections so adding a new collection which doesn't have these methods seems counterproductive.

benaadams commented 8 years ago

@andrewgmorris related https://github.com/dotnet/corefx/issues/4316 "Add TryDequeue to Queue"

weshaggard commented 8 years ago

We had a basic review of this and we agree that we want a ProrityQueue in the framework. However, we need to get someone to help drive the design and implementation for it. Whoever grabs the issue can work @terrajobst on finalizing the API.

shmuelie commented 8 years ago

So what work is left on the API?

justinvp commented 8 years ago

This is missing from the API proposal above: PriorityQueue<T> should implement IReadOnlyCollection<T> to match Queue<T> (Queue<T> now implements IReadOnlyCollection<T> as of .NET 4.6).

BrannonKing commented 8 years ago

I don't know that array-based priority queues are best. Memory allocation in .NET is really fast. We don't have the same search-small-blocks issue that the old malloc dealt with. You are welcome to use my priority queue code from here: https://github.com/BrannonKing/Kts.Astar/tree/master/Kts.AStar

SunnyWar commented 8 years ago

@ebickle One tiny nit on the speclet. It says:

/// Adds an object to the end of the <see cref="PriorityQueue{T}"/>. ... public void Enqueue(T item);

Shouldn't it say instead, /// Inserts object into the <see cref="PriorityQueue{T}"> by its priority.

ebickle commented 7 years ago

@SunnyWar Fixed the documentation of the Enqueue method, thank you!

bbarry commented 7 years ago

Some time ago, I created a data structure with complexities similar to a priority queue based on a Skip List data structure which I've decided at this point to share: https://gist.github.com/bbarry/5e0f3cc1ac7f7521fe6ea25947f48ace

https://en.wikipedia.org/wiki/Skip_list

The skip list matches the complexities of a priority queue above in average cases, except that Contains is an average case O(log(n)). In addition, access to either first or last elements are constant time operations and iteration in both forward and reverse order matches the forward order complexities of a PQ.

There obviously are downsides to such a structure in the form of higher costs on memory, and it does devolve to O(n) worst case inserts and removals, so it has its trade offs...

NKnusperer commented 7 years ago

Is this already implemented somewhere? When is the expected release? Also whats about updating the priority of an existing item?

danmoseley commented 7 years ago

@Priya91 @ianhays is this ready to get marked ready for review?

ianhays commented 7 years ago

This is missing from the API proposal above: PriorityQueue should implement IReadOnlyCollection to match Queue (Queue now implements IReadOnlyCollection as of .NET 4.6).

I agree with @justinvp here.

@Priya91 @ianhays is this ready to get marked ready for review?

I'd say so. This has been sitting for a while; lets make moves on it.

ebickle commented 7 years ago

@justinvp @ianhays I've updated the spec to implement IReadOnlyCollection. Thank you!

I have a full implementation of the class and it's associated PriorityQueueDebugView that uses an array-based implementation. Unit tests aren't at 100% coverage yet, but if there's some interest I can do a bit of work and dust off my fork.

@NKnusperer made a good point about updating the priority of an existing item. I'll leave it out for now but it's something to consider during spec review.

danmoseley commented 7 years ago

There are 2 implementations in full framework, that are possible alternatives. https://referencesource.microsoft.com/#q=priorityqueue

weshaggard commented 7 years ago

For reference here is a question about the Java PriorityQueue on stackoverflow http://stackoverflow.com/questions/683041/java-how-do-i-use-a-priorityqueue. It is interesting that the priority is handled by a comparer instead of just a simple int priority wrapper object. For example it doesn't make it easy to change the priority of an item in the queue already.

karelz commented 7 years ago

API review: We agree that it is useful to have this type in CoreFX, because we expect CoreFX to use it.

For final review of the API shape, we would like to see sample code: PriorityQueue<Thread> and PriorityQueue<MyClass>.

  1. How do we maintain the priority? Right now it is only implied by T.
  2. Do we want it to be possible to pass the priority when you add an entry? (which seems pretty handy)

Notes:

safern commented 7 years ago

This would be a really useful type to have in CoreFX. Is anyone interested in grabbing this one?

pgolebiowski commented 7 years ago

I don't like the idea of fixing the priority queue to a binary heap. Have a look at my AlgoKit wiki page for details. Quick thoughts:

Last year I implemented a few heap types. In particular, there is IHeap interface that could be used as the priority queue.

I'm all for introducing an IHeap interface and some most performant implementations (at least those array based). The API and implementations are in the repository I linked above.

So no priority queues. Heaps.

What do you think?

@karelz @safern @danmosemsft

karelz commented 7 years ago

@pgolebiowski Don't forget we are designing easy-to-use-while-pretty-performant API. If you want PriorityQueue (which is established computer-science term), you should be able to find one in docs / via internet search. If the underlying implementation is heap (which I personally wonder why), or something else, it doesn't matter too much. Assuming you can demonstrate the implementation is measurably better than simpler alternatives (complexity of code is also a metric that somewhat matters).

So if you still believe heap-based implementation (without IHeap API) is better than some simple list-based or list-of-array-chunks-based implementation, please explain why (ideally in a few sentences/paragraphs), so that we can get agreement on the implementation approach (and avoid wasting time on your side on complex implementation which may be turned down at PR review time).

benaadams commented 7 years ago

Drop ICollection, IEnumerable? Just have the generic versions (though generic IEnumerable<T> will bring in IEnumerable)

@pgolebiowski how its implemented doesn't change the external api. PriorityQueue defines the behaviour/contract; whereas a Heap is a specific implementation.

pgolebiowski commented 7 years ago

Priority queues vs heaps

If the underlying implementation is heap (which I personally wonder why), or something else, it doesn't matter too much. Assuming you can demonstrate the implementation is measurably better than simpler alternatives (complexity of code is also a metric that somewhat matters).

So if you still believe heap-based implementation is better than some simple list-based or list-of-array-chunks-based implementation, please explain why (ideally in a few sentences/paragraphs), so that we can get agreement on the implementation approach [...]

OK. A priority queue is an abstract data structure that can be then implemented in some way. Of course you can implement it with a data structure different than a heap. But no way is more efficient. As a result:

To back my words let's start with the theoretical support. Introduction to Algorithms, Cormen:

[…] priority queues come in two forms: max-priority queues and min-priority queues. We will focus here on how to implement max-priority queues, which are in turn based on max- heaps.

Stated clearly that priority queues are heaps. This is a shortcut of course, but you get the idea. Now, more importantly, let's have a look at how other languages and their standard libraries add support for the operations we are discussing:

I strongly believe we should go the Rust / Swift / Python / Go way and expose a heap explicitly while clearly stating in the docs that it can be used as a priority queue. I strongly believe this approach is very clean and simple.

I hope you all like the approach of just clearly exposing a heap data structure — and making a priority queue reference in the docs only.

API & implementation

We need to agree upon the matter above, but let me just start another topic — how to implement this. I can see two solutions here:

In the approach above, the ArrayHeap represents an implicit heap-ordered complete d-ary tree, stored as an array. This can be used to create for example a BinaryHeap or a QuaternaryHeap.

Bottom line

For further discussion I strongly encourage you to have a look at this paper. You will know that:

@karelz @benaadams

This would be a really useful type to have in CoreFX. Is anyone interested in grabbing this one?

@safern I'd be very happy to grab this one from now on.

karelz commented 7 years ago

OK, it was a problem between my keyboard and my chair - PriorityQueue is of course based on Heap -- I was thinking about just Queue where it doesn't make sense and forgot that heap is 'sorted' -- very embarrassing thinking-process-outage for someone like me, who loves logic, algorithms, Turing machines, etc., my apologies. (BTW: As soon as I read couple of sentences in your Java docs link, the discrepancy immediately clicked)

From that perspective it makes sense to build the API on top of Heap. We should not make that class public yet though - it will require its own API review and its own discussion if it is something we need in CoreFX. We do not want API surface creep due to implementation, but it may be the right thing to do -- hence the discussion needed. From that perspective I don't think we need to create IHeap just yet. It may be good decision later. If there is research that specific heap (e.g. 4-ary as you mention above) is best for general random input, then we should choose that. Let's wait for @safern @ianhays @stephentoub to confirm/voice opinion.

The parametrization of underlying heap with multiple implemented options is something which IMO does not belong into CoreFX (I may be wrong here, again - let's see what others think). My reason is that IMO we would just soon ship gazillions of specialized collections which would be very hard for people (average developer without strong background in nuances of algorithms) to choose from. Such library, however, would make a great NuGet package, for experts in the field - owned by you/community. In future, we may consider adding it into PowerCollections (we are actively discussing for last 4 months where on GitHub to put this library and if we should own it, or if we should encourage community to own it - there are different opinions on the matter, I expect we will finalize its fate post 2.0)

Assigning to you as you want to work on it ...

karelz commented 7 years ago

@pgolebiowski collaborator invite sent - when you accept ping me, I will be able to assign it to you then (GitHub limitations).

karelz commented 7 years ago

@benaadams I would keep ICollection (mild preference). For consistency with other d.s. in CoreFX. IMO it is not worth to have one odd beast here ... if we were adding a handful new ones (e.g. PowerCollections even to another repo), we should not include the non-generic ones ... thoughts?

pgolebiowski commented 7 years ago

OK, it was a problem between my keyboard and my chair.

Haha 😄 No worries.

it makes sense to build the API on top of Heap. We should not make that class public yet though [...] We do not want API surface creep due to implementation, but it may be the right thing to do -- hence the discussion needed. [...] I don't think we need to create IHeap just yet. It may be good decision later.

If the decision of the group is to go with PriorityQueue, I'll just help with the design and implement this. However, please take into consideration the fact that if we add a PriorityQueue now, it will be messy in the API later to add Heap -- as both behave basically the same. It would be sort of redundancy IMO. This would be a design smell for me. I wouldn't add the priority queue. It doesn't help.

Also, one more thought. Actually, the pairing heap data structure could come in handy fairly often. Array-based heaps are horrible at merging them. This operation is basically linear. When you're having lots of merging heaps, you're killing the performance. However, if you use a pairing heap instead of an array heap -- the merging operation is constant (amortized). This is another argument why I would like to provide a nice interface and two implementations. One for general input, second for some specific scenarios, especially when merging of heaps is involved.

Voting for IHeap + ArrayHeap + PairingHeap! 😄 (like in Rust / Swift / Python / Go)

If the pairing heap is too much -- OK. But let's at least go with IHeap + ArrayHeap. Don't you guys feel that going with a PriorityQueue class is locking the possibilities in the future and making the API less clear?

But as I said -- if you all vote for a PriorityQueue class over the proposed solution -- OK.

collaborator invite sent - when you accept ping me, I will be able to assign it to you then (GitHub limitations).

@karelz ping :)

karelz commented 7 years ago

please take into consideration the fact that if we add a PriorityQueue now, it will be messy in the API later to add Heap -- as both behave basically the same. It would be sort of redundancy IMO. This would be a design smell for me. I wouldn't add the priority queue. It doesn't help.

Can you explain more details about why it will be messy later? What are your concerns? PriorityQueue is concept people use. Having a type named that way is useful, right? I think the logical operations (at least their names) on Heap might be different. If they are the same, we can have 2 different implementations of the same code in the worst case (not ideal, but not the end of the world). Or we can insert Heap class as parent of PriorityQueue, right? (assuming it is allowed from API review perspective - right now I don't see reason for not, but I don't have that many years of experience with API reviews, so will wait on others to confirm)

Let's see how the voting & further design discussion go ... I am slowly warming up to the idea of IHeap + ArrayHeap, but not fully convinced yet ...

benaadams commented 7 years ago

if we were adding a handful new ones ... we should not include the non-generic ones

Red rag to a bull. Anyone got some other collections to add so we can drop ICollection?

Circular/Ring buffer; Generic and Concurrent?

shmuelie commented 7 years ago

@karelz A solution for the naming issue could be something like IPriorityQueue like DataFlow does for produce/consumer patterns. Many ways to implement a priority queue and if you don't care about that use the interface. Care about the implementation or are creating an instance use implementing class.