Tagging subscribers to this area: @dotnet/area-system-linq
See info in area-owners.md if you want to be subscribed.

| Author: | stephentoub |
|---|---|
| Assignees: | - |
| Labels: | `area-System.Linq`, `untriaged` |
| Milestone: | - |
At a quick glance, it seems we already have the infrastructure to accommodate such an optimization; essentially it would be us updating this method to use a PQ:
Do you still want contributions on this? I'd be happy to explore that.
After a quick look at it, I have the following thoughts: The priority queue does not support removing leaves, correct? Hence, for the implementation, I think it would be something along the following lines:

- Create a priority queue with a capacity of `maxIdx` (i.e. skip + take)
- While `queue.Count < maxIdx`: simply `Enqueue`
- Once `queue.Count >= maxIdx`: `Peek`, compare and possibly `Dequeue` + `Enqueue`
- At the end, dequeue everything, reverse, and return the elements between `minIdx` and `maxIdx`

Does that make sense? Or is there any way to avoid the need for reversing, as it will likely require a second allocation of a `maxIdx`-sized chunk?
One question remains, though. Afaik `OrderBy` guarantees a stable sort but `PriorityQueue` doesn't give any guarantees for values with the same priority. We would need to account for that separately, right?
OrderBy is already ordered min to max, and PriorityQueue is a min heap; there shouldn't be any need for reversal. There's also no need to Peek; if Count == take limit, just EnqueueDequeue. And at the end of the enumeration, just TryDequeue until it returns false. For stability, yes, it needs to account for it; easiest way is to store the element's index as part of the priority and use it as a tie breaker if the elements otherwise compare equal.
I have a prototype locally, but if you'd like to try / take over, have at it :) The hardest part is the heuristic for when to use this approach.
As I would probably have some time the next few days, I'd like to have a go :)
I played around with a bit of code and understand now that no peeking is necessary and that keeping the sort stable is trivial using a combined priority.
However, I'm still not sure what I missed with regards to inverting the queue:
If you take the following random input `[6, 3, 0, 6, 7, 1, 4, 9, 3, 2]` and use a priority queue with just `EnqueueDequeue` when reaching the limit of `3`, you will get `[6, 7, 9]` instead of `[0, 1, 2]`. `EnqueueDequeue()` removes the smallest element, i.e. in this case the `0` when enqueuing the second `6`, and smaller values are immediately discarded. Accordingly, just looping with `TryDequeue` returns ascending values, starting with the min in the heap.
The (isolated) code that I used:
```csharp
static IEnumerable<T> TakeOrdered<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector, int count)
{
    // The priority is the key plus the element's source position, keeping the sort stable.
    PriorityQueue<T, (TKey key, int sourcePos)> queue = new(count);
    int i = 0;
    foreach (T item in source)
    {
        if (queue.Count == count)
        {
            // Once the heap is full, EnqueueDequeue removes the current minimum.
            queue.EnqueueDequeue(item, (keySelector(item), i));
        }
        else
        {
            queue.Enqueue(item, (keySelector(item), i));
        }
        i++;
    }

    while (queue.TryDequeue(out T? elem, out _))
    {
        yield return elem;
    }
}
```
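As a quick illustration (my addition, not part of the original snippet), a small driver that reproduces the observation above, assuming `using System.Linq;` for `ToArray`:

```csharp
// With the min-heap version of TakeOrdered, the three *largest* values survive
// and are dequeued in ascending order.
int[] input = new[] { 6, 3, 0, 6, 7, 1, 4, 9, 3, 2 };
int[] result = TakeOrdered(input, x => x, 3).ToArray(); // [6, 7, 9] rather than [0, 1, 2]
```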
Reversing the comparison will result in [2, 1, 0] instead of [0, 1, 2]...
> However, I'm still not sure what I missed with regards to inverting the queue:
You want to use a comparer that inverts the comparison (ideally by implementing a custom struct TPriority with a Compare that inverts the comparison); that then ensures that EnqueueDequeue will remove the largest elements such that the smallest remain in the queue. But then, you're right, you'll need to reverse the final order of elements retrieved with TryDequeue at the end, as they'll emerge largest to smallest rather than smallest to largest. The theory is that this optimization will be used when the Take(T) count is a relatively small number, and the reversal either can possibly be done with stack memory or won't even require additional memory because, e.g. it's part of a ToArray or ToList where the elements can either be stored into their destination slot directly or alternatively reversed in the destination storage.
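A minimal sketch of that inverted-priority idea (illustrative only, not the actual prototype; for brevity it uses `Comparer<T>.Create` rather than the custom `struct TPriority` suggested above, and it buffers the results to restore ascending order):

```csharp
static IEnumerable<T> TakeOrderedSmallest<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector, int count)
{
    // Inverted comparer: the queue's "minimum" is the largest key, so EnqueueDequeue
    // evicts the largest of the count + 1 candidates and the count smallest remain.
    // The source position is the tie breaker that keeps the sort stable.
    var inverted = Comparer<(TKey key, int sourcePos)>.Create((x, y) =>
    {
        int c = Comparer<TKey>.Default.Compare(y.key, x.key);
        return c != 0 ? c : y.sourcePos.CompareTo(x.sourcePos);
    });

    PriorityQueue<T, (TKey key, int sourcePos)> queue = new(count, inverted);

    int i = 0;
    foreach (T item in source)
    {
        if (queue.Count == count)
        {
            queue.EnqueueDequeue(item, (keySelector(item), i));
        }
        else
        {
            queue.Enqueue(item, (keySelector(item), i));
        }
        i++;
    }

    // Elements dequeue largest-to-smallest, so fill the buffer from the back
    // to hand the caller ascending (and stable) order.
    T[] buffer = new T[queue.Count];
    for (int j = buffer.Length - 1; j >= 0; j--)
    {
        buffer[j] = queue.Dequeue();
    }

    return buffer;
}
```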
Update from my side:
I wrote a first version in isolation (against .NET 8):

| Method | NumberOfPeople | AmountTaken | AmountSkipped | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| OrderByValueType | 512 | 20 | 0 | 2.697 us | 0.0162 us | 0.0136 us | 0.6294 | 0.0114 | 10528 B |
| OrderByValueType_New | 512 | 20 | 0 | 2.031 us | 0.0342 us | 0.0320 us | 0.0420 | - | 728 B |
| OrderByValueType | 512 | 20 | 128 | 4.866 us | 0.0167 us | 0.0130 us | 0.6256 | 0.0076 | 10560 B |
| OrderByValueType_New | 512 | 20 | 128 | 2.193 us | 0.0191 us | 0.0170 us | 0.2251 | - | 3800 B |
While a bit different from what I expected and with room for improvement, it looks very promising. So I went ahead and implemented it within the real system. However, the results are completely different there:
Before:

| Method | NumberOfPeople | AmountTaken | AmountSkipped | Mean | Error | StdDev | Median | Min | Max | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OrderByValueType | 512 | 20 | 0 | 3,169.08 ns | 18.857 ns | 17.639 ns | 3,168.06 ns | 3,145.39 ns | 3,207.83 ns | 0.6181 | 0.0126 | 10.28 KB |
| OrderByValueType | 512 | 20 | 128 | 3,965.31 ns | 76.728 ns | 78.794 ns | 3,959.77 ns | 3,872.22 ns | 4,131.64 ns | 0.6277 | - | 10.31 KB |
~2-3x slower with priority queue:

| Method | NumberOfPeople | AmountTaken | AmountSkipped | Mean | Error | StdDev | Median | Min | Max | Gen0 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| OrderByValueType | 512 | 20 | 0 | 6.230 us | 0.0480 us | 0.0425 us | 6.225 us | 6.187 us | 6.329 us | 0.0498 | 912 B |
| OrderByValueType | 512 | 20 | 128 | 12.693 us | 0.0361 us | 0.0320 us | 12.687 us | 12.650 us | 12.770 us | 0.2532 | 5040 B |
I shelved it and came back to it multiple times but have not yet understood what is going on. How do you usually profile such things? Are you using ETWProfiler? Any pointers before I go off in the wrong direction would be appreciated :)
Would it be possible to share your changes?
> Would it be possible to share your changes?
Of course, I created a Draft PR for easy viewing: #97277
While heap sort has the (in a sense) optimal asymptotic complexity of O(n log n), the constant factor is fairly high. This is why many people tend to use sorting algorithms other than heap sort.
When `n` is small, perhaps insertion sort might even beat `PriorityQueue`. People usually pick a threshold between 5 and 20, so for fewer elements people use insertion sort rather than QuickSort. Please note that QuickSort usually has a lower constant factor than HeapSort, so I'd expect the threshold to be higher in our case.

In short: I'd use insertion sort when we know `n < 50` (or so).
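For illustration, a rough sketch of that bounded insertion-sort idea (the helper name and shape are hypothetical, assuming a small `count`): keep the `count` smallest elements seen so far in a small sorted buffer, so each source element costs at most a handful of comparisons plus a short shift.

```csharp
static T[] TakeSmallestByInsertion<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector, int count)
{
    if (count == 0) return Array.Empty<T>();

    var comparer = Comparer<TKey>.Default;
    T[] items = new T[count];
    TKey[] keys = new TKey[count];
    int size = 0;

    foreach (T item in source)
    {
        TKey key = keySelector(item);

        // Buffer full and the new key is not smaller than the current largest: skip it.
        // Using >= here also keeps the result stable for equal keys.
        if (size == count && comparer.Compare(key, keys[size - 1]) >= 0)
            continue;

        // Find the insertion point; the strict '>' keeps equal keys in source order.
        int pos = size == count ? size - 1 : size;
        while (pos > 0 && comparer.Compare(keys[pos - 1], key) > 0)
            pos--;

        // Shift the tail right by one slot, dropping the largest element if the buffer is full.
        int last = size == count ? count - 1 : size;
        for (int j = last; j > pos; j--)
        {
            keys[j] = keys[j - 1];
            items[j] = items[j - 1];
        }

        keys[pos] = key;
        items[pos] = item;
        if (size < count) size++;
    }

    if (size < count) Array.Resize(ref items, size);
    return items;
}
```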
This makes sense when N is a large number and T is much smaller (T << N).

For time complexity, it expects log(T) << log(N), which means a higher threshold of T << N. As the `PriorityQueue` is based on a 4-ary heap, I guess the different $k$ of $\log_k(N)$ between heap sort and quick sort is also a factor for consideration. It should be a useful reference for the performance test cases and for finalizing the threshold.

For space complexity, it's really a brilliant idea. N -> T is the most valuable change from my point of view.
The final result could be like,

- Below `n`: use insertion sort.
- Between `n` and `n * rate`: use heap sort.
- Above `n * rate`: use quick sort.

The side effect is that it makes the source code much more complex 🤣.
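A possible shape for such a dispatch (a sketch only: the thresholds are placeholders that would have to come out of benchmarking, and the two helpers refer to the hypothetical sketches added earlier in this thread, not real APIs):

```csharp
static IEnumerable<T> SortedTake<T, TKey>(IReadOnlyCollection<T> source, Func<T, TKey> keySelector, int take)
{
    const int InsertionThreshold = 16; // placeholder; would need benchmarking
    const double HeapRate = 0.1;       // placeholder fraction of the input size

    if (take <= InsertionThreshold)
        return TakeSmallestByInsertion(source, keySelector, take); // tiny take: insertion sort
    if (take <= source.Count * HeapRate)
        return TakeOrderedSmallest(source, keySelector, take);     // small relative take: heap
    return source.OrderBy(keySelector).Take(take);                  // otherwise: the existing full sort
}
```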
Just a small clarification: the current implementation is actually O(N + T log T), not O(N log N), because quickselect is O(N) and means only T elements actually need to be sorted.
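To spell out what that means, here is a rough sketch of the quickselect-then-sort shape (illustrative only, not the actual System.Linq code; it assumes `1 <= t <= buffer.Length` and mutates `buffer` in place):

```csharp
// Partition so the t smallest elements end up in buffer[0..t-1] (expected O(N)),
// then sort just that prefix (O(T log T)).
static T[] SmallestViaQuickselect<T>(T[] buffer, int t, IComparer<T>? comparer = null)
{
    comparer ??= Comparer<T>.Default;
    int lo = 0, hi = buffer.Length - 1;

    while (lo < hi)
    {
        // Hoare-style partition around a middle pivot.
        T pivot = buffer[lo + (hi - lo) / 2];
        int i = lo, j = hi;
        while (i <= j)
        {
            while (comparer.Compare(buffer[i], pivot) < 0) i++;
            while (comparer.Compare(buffer[j], pivot) > 0) j--;
            if (i <= j)
            {
                (buffer[i], buffer[j]) = (buffer[j], buffer[i]);
                i++;
                j--;
            }
        }

        // Continue only in the side that still contains index t - 1.
        if (t - 1 <= j) hi = j;
        else if (t - 1 >= i) lo = i;
        else break;
    }

    Array.Sort(buffer, 0, t, comparer);
    return buffer[..t];
}
```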
Since O(N log (S+T)) >= O(N log T), O(N log T) >= O(N), and O(N log T) >= O(T log T), we get O(N log (S+T)) >= O(N + T log T), which means that the PQ is never asymptotically better for any T or S.
But as mentioned, constants do matter, so for very small T and S there could be an optimization; but even in that case we might want to just scope this to insertion sort + lookup as @LeaFrock suggested (`Max`/`Min` is just a degenerate case of this) instead of a priority queue.
Order/OrderBy.Take is currently an O(N log N) operation that allocates an array of length N and keeps all the data around until the relevant portion of the data is consumed. With a PriorityQueue, however, for OrderBy(...).Take(T) this would instead be O(N log T), and it would only need to keep space proportional to T, not N, so if T is significantly smaller than N (which in typical use is the case, often with a small T value passed to Take), this could lead to significant throughput and memory consumption benefits.
Variations of this with OrderByDescending are also possible. It'd also be possible to accommodate OrderBy().Skip(S).Take(T); the priority queue would then need to maintain S+T values, so the benefits would diminish as S grew. However, from an algorithmic complexity perspective, S+T <= N, which means O(N log (S+T)) <= O(N log N). Of course, we'd still need to be careful, since the constants here do matter from a throughput perspective.
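For context, the call shapes being discussed look like the following (`people` and the `Age` key are placeholder data):

```csharp
var top5    = people.OrderBy(p => p.Age).Take(5).ToList();            // heap of T = 5 instead of sorting all N
var page    = people.OrderBy(p => p.Age).Skip(20).Take(10).ToList();  // heap would need S + T = 30 entries
var bottom5 = people.OrderByDescending(p => p.Age).Take(5).ToList();  // descending variation
```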