Closed dougmill closed 7 years ago
Awesome, great work!
It will take me a while to review this because I will need to see which changes actually make a difference in the JIT output. For example, I highly doubt nodeIndex / 2
to nodeIndex >> 1
actually changes anything.
How did you get a 25% improvement to Contains
? I can't imagine doing anything simpler, and it seems like you haven't changed it at all.
I was recently surprised in another project to learn that the return value can differ between someInt / 2
and someInt >> 1
- they round negative numbers differently. So, I don't know for certain that it will make a difference, but it seems plausible.
On contains, that improvement is only in the simple queue, I didn't see any possible improvement in the fast queue version.
For simple queue contains, my change drops the following operations:
When item is null, picking _nullNodesCache[0]
or null
and then comparing it to null
.
In other cases, fetching the list of nodes (rather than just checking one exists), checking its count, picking nodes[0]
or null
based on the count, and comparing the result to null
.
Skipping the count check is safe because any time a list is reduced to empty it is also removed from the dictionary, so there will never be an empty list in there. And apparently I neglected to apply that in GetExistingNode
, so the final line of that should be changed to just return nodes[0];
in addition to the changes already here.
Oh I see, I should have looked at the results closer. Yes I didn't spend a lot of time optimizing the SimpleQueue. I'll gladly merge these changes once I've vetted them. Please give me a week or two, I'm extremely busy recently. Thanks!
While you're at it, please add a version of fast queue with int
type priority - the project I'm using this for produces int
priority values naturally, and integer comparisons are faster than floating point. I tried it out without committing the change, and measured a low to mid single digit percent improvement, I don't remember exactly how much, in everything but Contains.
I was seeing slowdowns in my pathfinding benchmarks, so I've added tests to benchmark individual Priority Queue methods (pull from master to see them).
Enqueue
seems to be ~17% faster all around, but UpdatePriority
is ~6% slower on large datasets, and Dequeue
seems to be an order of magnitude slower (!?!). There is the possibility I screwed something up...
I would guess you screwed up in the CascadeDown rewrite somehow. That would not affect enqueue, would be partially balanced by CascadeUp improvement in update and also be relatively small because random nodes are likely near the bottom, and have a very large effect on dequeue with nothing to balance it and always a long path to swap through.
I was using your branch. I meant screwed up in the test.
I just tried to build your new benchmarks, and got two errors:
1>CSC : error CS0006: Metadata file '..\packages\Microsoft.CodeAnalysis.Analyzers.1.1.0\analyzers\dotnet\cs\Microsoft.CodeAnalysis.Analyzers.dll' could not be found
1>CSC : error CS0006: Metadata file '..\packages\Microsoft.CodeAnalysis.Analyzers.1.1.0\analyzers\dotnet\cs\Microsoft.CodeAnalysis.CSharp.Analyzers.dll' could not be found
Try uninstalling and reinstalling BenchmarkDotNet through NuGet
Got it running, and duplicated your result. I think it has something to do with the IterationSetup
bug you reported. I switched that method to be its own benchmark to give a baseline and then had all the others actually call it themselves, and dequeue became faster with my code.
Interestingly, the "just do the setup" benchmark is slower with my changes than without. I'm not entirely sure why. I thought it might be my change adding a little up front overhead that never gets compensated for because you're enqueueing in strict ascending priority order so it always bails out immediately, but reverting that didn't meaningfully help.
I might try applying one change at a time, as small as I can split them up, and running the benchmarks at each step, but honestly I'm really not that confident in these numbers in the first place. Something very strange is clearly going on because I don't see how my changes could possibly* affect the behavior of that bug, and it seems such a basic thing that they really shouldn't have missed it. So, this benchmarking tool spits out a lot of numbers and bits of analysis, but at this point I don't trust those numbers to be correct.
*On second thought, that bug might be sensitive to timing in some way. If so, it would explain how my changes can affect it, but that such a thing is even possible in this tool really does not help my confidence in its results.
Well that's annoying, I assumed that such a popular benchmarking library would actually work correctly. I'll try rewriting the benchmarks to combine enqueuing + dequeuing tomorrow.
Hmm, per the comments on your issue report, it's apparently a design issue with "Iteration" meaning something a bit higher level than one method invocation. I'll be doing some experimenting later.
I've tweaked your test cases a bit, because I think always doing things with Queue.First
and never an arbitrary node, and always using strictly in order priorities, probably skew the behavior patterns. I'm currently running with this:
namespace Priority_Queue_Benchmarks
{
public class Benchmarks
{
[Params(1, 100, 10000)]
public int QueueSize;
private FastPriorityQueue<FastPriorityQueueNode> Queue;
private FastPriorityQueueNode[] Nodes;
private int[] RandomPriorities;
[GlobalSetup]
public void GlobalSetup()
{
Queue = new FastPriorityQueue<FastPriorityQueueNode>(QueueSize * 2);
Nodes = new FastPriorityQueueNode[QueueSize];
Random rand = new Random();
RandomPriorities = new int[QueueSize * 2];
for (int i = 0; i < QueueSize * 2; i++)
{
RandomPriorities[i] = rand.Next(16777216); // constrain to range float can hold with no rounding
}
}
[Benchmark]
public void IterationSetup()
{
Queue.Clear();
for (int i = 0; i < QueueSize; i++)
{
Nodes[i] = new FastPriorityQueueNode();
Queue.Enqueue(Nodes[i], RandomPriorities[i + QueueSize]);
}
}
[Benchmark]
public void Dequeue()
{
IterationSetup();
for (int i = 0; i <= QueueSize / 2; i++)
{
Queue.Dequeue();
}
}
[Benchmark]
public void Enqueue()
{
IterationSetup();
for (int i = 0; i <= QueueSize / 2; i++)
{
Queue.Enqueue(new FastPriorityQueueNode(), RandomPriorities[i]);
}
}
[Benchmark]
public bool Contains()
{
IterationSetup();
bool ret = true;
for (int i = 0; i < QueueSize; i++)
{
ret &= Queue.Contains(Nodes[i]);
}
return ret;
}
[Benchmark]
public void Clear()
{
IterationSetup();
Queue.Clear();
}
[Benchmark]
public void UpdatePriority()
{
IterationSetup();
for (int i = 0; i < QueueSize; i++)
{
Queue.UpdatePriority(Nodes[i], RandomPriorities[i]);
}
}
}
}
It turns out you can workaround the bug I reported using RunStrategy = Monitoring
, but doing so causes the measurement-error to be too high, so I've stuck with measuring Enqueue
+ operations together.
In your code above, you should pass a seed to Random
, and move the allocations to [GlobalSetup]
so the tests aren't affected by allocator-jitter, but otherwise the solution I came up with is basically the same.
Running the new tests (which I've pushed), I see everything has sped up except for Enqueue, which is now slower. Are you able to reproduce this, or see any issue with the test?
I'm doing thorough change-by-change benchmarking now, it'll take a while to get through everything.
Took your benchmarks and added versions that use random priority numbers, plus a test for enumeration speed. You should probably add one for Remove too, but by the time I thought of that I really didn't want to go back to add it to the checks I'd already run.
I figured out what was slowing Enqueue down, it's fixed in Change 4 below. When CascadeUp is called with a node that's already where it should be, my refactor to avoid excess assignments in Swap ended up adding an operation of assigning the node to itself.
Benchmark results before any of my changes:
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 23.70ns | 0.0054ns | 0.0048ns |
EnqueueBackwards | 1 | 23.49ns | 0.0083ns | 0.0069ns |
EnqueueRandom | 1 | 24.57ns | 0.0395ns | 0.0350ns |
EnqueueDequeue | 1 | 23.82ns | 0.0041ns | 0.0038ns |
EnqueueBackwardsDequeue | 1 | 24.15ns | 0.0423ns | 0.0395ns |
EnqueueRandomDequeue | 1 | 23.98ns | 0.0183ns | 0.0163ns |
EnqueueUpdatePriority | 1 | 31.43ns | 0.0102ns | 0.0096ns |
EnqueueBackwardsUpdatePriority | 1 | 31.78ns | 0.0083ns | 0.0073ns |
EnqueueRandomUpdatePriority | 1 | 34.45ns | 0.0161ns | 0.0134ns |
EnqueueContains | 1 | 29.11ns | 0.0798ns | 0.0667ns |
Enqueue | 100 | 699.55ns | 0.0668ns | 0.0592ns |
EnqueueBackwards | 100 | 3,375.54ns | 16.4371ns | 14.5710ns |
EnqueueRandom | 100 | 1,596.11ns | 1.6911ns | 1.4122ns |
EnqueueDequeue | 100 | 4,412.84ns | 22.2034ns | 20.7691ns |
EnqueueBackwardsDequeue | 100 | 6,969.57ns | 10.8882ns | 9.6521ns |
EnqueueRandomDequeue | 100 | 5,277.03ns | 24.9260ns | 23.3158ns |
EnqueueUpdatePriority | 100 | 4,012.80ns | 9.5671ns | 8.9490ns |
EnqueueBackwardsUpdatePriority | 100 | 6,710.78ns | 15.7761ns | 14.7570ns |
EnqueueRandomUpdatePriority | 100 | 2,903.03ns | 8.9722ns | 8.3926ns |
EnqueueContains | 100 | 847.97ns | 1.6528ns | 1.5460ns |
Enqueue | 10000 | 68,439.80ns | 17.5800ns | 14.6801ns |
EnqueueBackwards | 10000 | 751,837.87ns | 294.1160ns | 275.1162ns |
EnqueueRandom | 10000 | 261,665.52ns | 109.0163ns | 96.6401ns |
EnqueueDequeue | 10000 | 1,063,948.43ns | 208.7239ns | 195.2404ns |
EnqueueBackwardsDequeue | 10000 | 1,665,493.35ns | 1,482.7118ns | 1,314.3852ns |
EnqueueRandomDequeue | 10000 | 1,582,293.56ns | 213.6763ns | 189.4185ns |
EnqueueUpdatePriority | 10000 | 930,500.69ns | 318.6172ns | 282.4459ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,590,628.87ns | 504.8412ns | 421.5652ns |
EnqueueRandomUpdatePriority | 10000 | 567,553.84ns | 81.6522ns | 63.7486ns |
EnqueueContains | 10000 | 84,466.75ns | 19.4158ns | 16.2131ns |
Change 1: Replace an array index lookup with an existing local variable in Enqueue
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 23.06ns | 0.0049ns | 0.0045ns |
EnqueueBackwards | 1 | 23.06ns | 0.0058ns | 0.0051ns |
EnqueueRandom | 1 | 23.46ns | 0.0840ns | 0.0785ns |
EnqueueDequeue | 1 | 23.97ns | 0.0552ns | 0.0516ns |
EnqueueBackwardsDequeue | 1 | 23.30ns | 0.0035ns | 0.0031ns |
EnqueueRandomDequeue | 1 | 23.58ns | 0.0054ns | 0.0050ns |
EnqueueUpdatePriority | 1 | 31.10ns | 0.0327ns | 0.0306ns |
EnqueueBackwardsUpdatePriority | 1 | 31.34ns | 0.0477ns | 0.0446ns |
EnqueueRandomUpdatePriority | 1 | 31.47ns | 0.0270ns | 0.0239ns |
EnqueueContains | 1 | 28.49ns | 0.0449ns | 0.0420ns |
Enqueue | 100 | 628.57ns | 0.5571ns | 0.5211ns |
EnqueueBackwards | 100 | 3,340.79ns | 20.0611ns | 17.7837ns |
EnqueueRandom | 100 | 1,479.08ns | 7.2961ns | 6.8247ns |
EnqueueDequeue | 100 | 4,377.86ns | 7.9220ns | 7.0226ns |
EnqueueBackwardsDequeue | 100 | 7,086.52ns | 40.7370ns | 38.1054ns |
EnqueueRandomDequeue | 100 | 4,844.13ns | 19.5057ns | 18.2456ns |
EnqueueUpdatePriority | 100 | 3,933.72ns | 5.0809ns | 4.7527ns |
EnqueueBackwardsUpdatePriority | 100 | 6,640.83ns | 12.3533ns | 11.5553ns |
EnqueueRandomUpdatePriority | 100 | 2,913.08ns | 18.9302ns | 17.7073ns |
EnqueueContains | 100 | 773.98ns | 0.0968ns | 0.0858ns |
Enqueue | 10000 | 61,133.03ns | 6.2108ns | 5.8096ns |
EnqueueBackwards | 10000 | 737,613.59ns | 129.3863ns | 101.0163ns |
EnqueueRandom | 10000 | 240,571.62ns | 72.9698ns | 64.6858ns |
EnqueueDequeue | 10000 | 1,063,322.46ns | 480.8526ns | 449.7898ns |
EnqueueBackwardsDequeue | 10000 | 1,662,042.66ns | 1,086.9245ns | 1,016.7098ns |
EnqueueRandomDequeue | 10000 | 1,559,694.85ns | 386.9005ns | 361.9069ns |
EnqueueUpdatePriority | 10000 | 933,441.43ns | 218.7358ns | 204.6056ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,594,834.15ns | 8,406.7363ns | 7,863.6656ns |
EnqueueRandomUpdatePriority | 10000 | 547,323.52ns | 488.0810ns | 456.5513ns |
EnqueueContains | 10000 | 77,311.92ns | 35.1052ns | 27.4078ns |
All Enqueue tests improved slightly, as expected. Oddly, backwards got double the benefit and random got triple, and only random showed it in the Enqueue+Dequeue benchmark. I'm guessing random variation for that.
Change 2: Replace division with bit shift in CascadeUp
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 23.04ns | 0.1229ns | 0.1150ns |
EnqueueBackwards | 1 | 22.94ns | 0.0045ns | 0.0040ns |
EnqueueRandom | 1 | 23.55ns | 0.1156ns | 0.1081ns |
EnqueueDequeue | 1 | 23.20ns | 0.0066ns | 0.0058ns |
EnqueueBackwardsDequeue | 1 | 24.81ns | 0.0041ns | 0.0039ns |
EnqueueRandomDequeue | 1 | 23.43ns | 0.1091ns | 0.1021ns |
EnqueueUpdatePriority | 1 | 30.91ns | 0.0419ns | 0.0350ns |
EnqueueBackwardsUpdatePriority | 1 | 30.90ns | 0.0314ns | 0.0294ns |
EnqueueRandomUpdatePriority | 1 | 31.49ns | 0.0118ns | 0.0105ns |
EnqueueContains | 1 | 28.50ns | 0.0031ns | 0.0027ns |
Enqueue | 100 | 613.19ns | 0.0935ns | 0.0829ns |
EnqueueBackwards | 100 | 3,305.31ns | 1.1133ns | 1.0413ns |
EnqueueRandom | 100 | 1,427.34ns | 6.2933ns | 5.8868ns |
EnqueueDequeue | 100 | 4,327.81ns | 32.8183ns | 30.6982ns |
EnqueueBackwardsDequeue | 100 | 6,992.03ns | 10.8474ns | 9.0581ns |
EnqueueRandomDequeue | 100 | 4,826.75ns | 10.3922ns | 9.2124ns |
EnqueueUpdatePriority | 100 | 3,900.85ns | 5.4591ns | 5.1065ns |
EnqueueBackwardsUpdatePriority | 100 | 6,609.58ns | 1.9522ns | 1.4116ns |
EnqueueRandomUpdatePriority | 100 | 2,895.60ns | 11.6646ns | 10.9111ns |
EnqueueContains | 100 | 764.71ns | 3.5469ns | 3.3178ns |
Enqueue | 10000 | 60,159.54ns | 9.4723ns | 7.9098ns |
EnqueueBackwards | 10000 | 732,523.89ns | 224.7203ns | 199.2086ns |
EnqueueRandom | 10000 | 235,189.39ns | 59.9786ns | 53.1695ns |
EnqueueDequeue | 10000 | 1,058,321.45ns | 198.0816ns | 185.2857ns |
EnqueueBackwardsDequeue | 10000 | 1,659,102.72ns | 1,704.4901ns | 1,423.3263ns |
EnqueueRandomDequeue | 10000 | 1,555,021.86ns | 397.7956ns | 372.0982ns |
EnqueueUpdatePriority | 10000 | 921,594.29ns | 5,164.6670ns | 4,831.0323ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,562,259.69ns | 283.4421ns | 251.2640ns |
EnqueueRandomUpdatePriority | 10000 | 540,237.70ns | 245.8919ns | 217.9768ns |
EnqueueContains | 10000 | 75,761.55ns | 15.1815ns | 13.4580ns |
Enqueue barely budges, backwards and random improve slightly more. Dequeue shows no significant change, update also improves slightly. Pretty much what I expected.
Change 3: Factor out the Swap call in CascadeUp
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 25.15ns | 0.0086ns | 0.0067ns |
EnqueueBackwards | 1 | 25.99ns | 0.0110ns | 0.0103ns |
EnqueueRandom | 1 | 25.53ns | 0.0093ns | 0.0082ns |
EnqueueDequeue | 1 | 25.15ns | 0.0901ns | 0.0843ns |
EnqueueBackwardsDequeue | 1 | 25.34ns | 0.0588ns | 0.0550ns |
EnqueueRandomDequeue | 1 | 25.53ns | 0.0082ns | 0.0073ns |
EnqueueUpdatePriority | 1 | 32.83ns | 0.0267ns | 0.0236ns |
EnqueueBackwardsUpdatePriority | 1 | 32.51ns | 0.0140ns | 0.0124ns |
EnqueueRandomUpdatePriority | 1 | 33.65ns | 0.0509ns | 0.0451ns |
EnqueueContains | 1 | 29.75ns | 0.0173ns | 0.0145ns |
Enqueue | 100 | 842.26ns | 0.2187ns | 0.2046ns |
EnqueueBackwards | 100 | 2,306.17ns | 8.5186ns | 7.9683ns |
EnqueueRandom | 100 | 1,356.36ns | 19.5113ns | 18.2508ns |
EnqueueDequeue | 100 | 4,590.50ns | 9.8110ns | 9.1772ns |
EnqueueBackwardsDequeue | 100 | 5,942.21ns | 27.5989ns | 25.8160ns |
EnqueueRandomDequeue | 100 | 4,721.65ns | 7.6805ns | 7.1843ns |
EnqueueUpdatePriority | 100 | 3,612.41ns | 11.1550ns | 10.4344ns |
EnqueueBackwardsUpdatePriority | 100 | 5,095.31ns | 10.1868ns | 9.5287ns |
EnqueueRandomUpdatePriority | 100 | 2,598.25ns | 10.4238ns | 9.7504ns |
EnqueueContains | 100 | 1,009.89ns | 0.2935ns | 0.2602ns |
Enqueue | 10000 | 82,203.35ns | 13.2552ns | 11.0687ns |
EnqueueBackwards | 10000 | 434,404.58ns | 54.5780ns | 51.0523ns |
EnqueueRandom | 10000 | 225,285.90ns | 355.0185ns | 296.4565ns |
EnqueueDequeue | 10000 | 1,098,426.81ns | 1,193.9076ns | 932.1244ns |
EnqueueBackwardsDequeue | 10000 | 1,345,915.35ns | 440.7042ns | 344.0728ns |
EnqueueRandomDequeue | 10000 | 1,542,973.77ns | 341.4763ns | 319.4172ns |
EnqueueUpdatePriority | 10000 | 790,467.03ns | 93.0147ns | 87.0060ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,107,735.43ns | 599.7814ns | 561.0359ns |
EnqueueRandomUpdatePriority | 10000 | 514,600.82ns | 385.4129ns | 321.8372ns |
EnqueueContains | 10000 | 100,519.61ns | 21.3884ns | 17.8602ns |
EnqueueRandom shows slight improvement, EnqueueBackwards makes a giant leap better, but Enqueue mysteriously gets slower. All Dequeue and Update variants are similarly affected, showing no effect or further improvements on top of the included Enqueue.
Change 4: I figured out what caused the decrease in Enqueue performance - in the no-op case of CascadeUp, which happens on every call of the base Enqueue benchmark, I had added an additional operation of assigning a variable to itself. To fix this, I "unrolled" the first iteration of the loop - if it reaches the break point on the very first node, it returns instead of break.
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 25.19ns | 0.1013ns | 0.0947ns |
EnqueueBackwards | 1 | 25.29ns | 0.0054ns | 0.0045ns |
EnqueueRandom | 1 | 25.73ns | 0.0853ns | 0.0712ns |
EnqueueDequeue | 1 | 25.30ns | 0.1186ns | 0.1110ns |
EnqueueBackwardsDequeue | 1 | 26.44ns | 0.0124ns | 0.0116ns |
EnqueueRandomDequeue | 1 | 25.76ns | 0.0168ns | 0.0158ns |
EnqueueUpdatePriority | 1 | 32.67ns | 0.0303ns | 0.0283ns |
EnqueueBackwardsUpdatePriority | 1 | 34.29ns | 0.1197ns | 0.1120ns |
EnqueueRandomUpdatePriority | 1 | 35.03ns | 0.0103ns | 0.0086ns |
EnqueueContains | 1 | 29.69ns | 0.0115ns | 0.0096ns |
Enqueue | 100 | 649.55ns | 0.2811ns | 0.2630ns |
EnqueueBackwards | 100 | 2,359.95ns | 1.5029ns | 1.4058ns |
EnqueueRandom | 100 | 1,282.79ns | 0.6518ns | 0.5089ns |
EnqueueDequeue | 100 | 4,050.00ns | 9.3506ns | 8.7466ns |
EnqueueBackwardsDequeue | 100 | 5,602.77ns | 19.3451ns | 18.0954ns |
EnqueueRandomDequeue | 100 | 4,975.84ns | 22.6907ns | 21.2249ns |
EnqueueUpdatePriority | 100 | 3,346.71ns | 1.7097ns | 1.5156ns |
EnqueueBackwardsUpdatePriority | 100 | 5,058.36ns | 7.1863ns | 6.7221ns |
EnqueueRandomUpdatePriority | 100 | 2,403.64ns | 1.3917ns | 1.1621ns |
EnqueueContains | 100 | 798.10ns | 0.2786ns | 0.2606ns |
Enqueue | 10000 | 63,449.59ns | 7.1874ns | 6.0018ns |
EnqueueBackwards | 10000 | 454,265.13ns | 804.9130ns | 752.9160ns |
EnqueueRandom | 10000 | 218,237.78ns | 1,292.7209ns | 1,209.2118ns |
EnqueueDequeue | 10000 | 1,029,155.22ns | 179.4745ns | 159.0994ns |
EnqueueBackwardsDequeue | 10000 | 1,303,476.08ns | 244.0623ns | 203.8030ns |
EnqueueRandomDequeue | 10000 | 1,554,887.79ns | 232.7699ns | 217.7330ns |
EnqueueUpdatePriority | 10000 | 770,142.02ns | 218.9391ns | 182.8241ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,121,204.62ns | 446.0621ns | 417.2467ns |
EnqueueRandomUpdatePriority | 10000 | 496,390.39ns | 2,082.4300ns | 1,846.0197ns |
EnqueueContains | 10000 | 79,149.01ns | 8.5873ns | 8.0326ns |
As expected, this drops Enqueue back down, though there's still about a 3 microsecond difference I can't explain. My guess is something related to code size, as far as I can tell the sequence of operations for the no-op case is now identical to before.
Change 5: Moved the first bit shift and added an early return, to optimize the case of calling CascadeUp on the head node. I later realized this only ever happens when enqueueing to an empty queue, but I don't think it hurt anything and should have a (tiny) improvement.
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 24.14ns | 0.0048ns | 0.0045ns |
EnqueueBackwards | 1 | 23.50ns | 0.0055ns | 0.0049ns |
EnqueueRandom | 1 | 24.63ns | 0.0145ns | 0.0113ns |
EnqueueDequeue | 1 | 23.47ns | 0.0092ns | 0.0072ns |
EnqueueBackwardsDequeue | 1 | 23.77ns | 0.0041ns | 0.0039ns |
EnqueueRandomDequeue | 1 | 23.57ns | 0.0036ns | 0.0030ns |
EnqueueUpdatePriority | 1 | 31.76ns | 0.0193ns | 0.0171ns |
EnqueueBackwardsUpdatePriority | 1 | 32.21ns | 0.0198ns | 0.0165ns |
EnqueueRandomUpdatePriority | 1 | 33.24ns | 0.0063ns | 0.0049ns |
EnqueueContains | 1 | 28.03ns | 0.0207ns | 0.0173ns |
Enqueue | 100 | 651.85ns | 0.3444ns | 0.3222ns |
EnqueueBackwards | 100 | 2,358.57ns | 9.8739ns | 9.2361ns |
EnqueueRandom | 100 | 1,315.91ns | 0.6649ns | 0.5894ns |
EnqueueDequeue | 100 | 4,401.37ns | 19.6102ns | 17.3839ns |
EnqueueBackwardsDequeue | 100 | 5,826.22ns | 4.1035ns | 3.6377ns |
EnqueueRandomDequeue | 100 | 4,829.20ns | 66.2628ns | 61.9822ns |
EnqueueUpdatePriority | 100 | 3,452.89ns | 8.9250ns | 8.3485ns |
EnqueueBackwardsUpdatePriority | 100 | 5,142.04ns | 8.7607ns | 8.1948ns |
EnqueueRandomUpdatePriority | 100 | 2,565.42ns | 2.3777ns | 2.2241ns |
EnqueueContains | 100 | 801.17ns | 0.1006ns | 0.0941ns |
Enqueue | 10000 | 63,719.13ns | 10.0423ns | 9.3936ns |
EnqueueBackwards | 10000 | 440,185.20ns | 57.2720ns | 47.8247ns |
EnqueueRandom | 10000 | 216,733.89ns | 201.5636ns | 188.5427ns |
EnqueueDequeue | 10000 | 1,077,364.05ns | 568.6074ns | 443.9312ns |
EnqueueBackwardsDequeue | 10000 | 1,357,187.92ns | 268.4208ns | 251.0810ns |
EnqueueRandomDequeue | 10000 | 1,537,493.54ns | 463.7648ns | 433.8058ns |
EnqueueUpdatePriority | 10000 | 771,984.83ns | 71.5662ns | 66.9430ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,097,932.13ns | 834.3672ns | 739.6447ns |
EnqueueRandomUpdatePriority | 10000 | 508,312.52ns | 99.5266ns | 88.2277ns |
EnqueueContains | 10000 | 79,903.73ns | 24.1708ns | 22.6094ns |
Some benchmarks are worse, some better, with a weak trend to worse. I tested reverting this specific change later and saw no improvement for the revert, so I'm inclined to chalk this up to randomness.
Change 6: Moved the bit shift in CascadeUp so that it will be skipped on the final iteration, and also made it stop when the swap would be of equal priority nodes.
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 23.83ns | 0.0038ns | 0.0034ns |
EnqueueBackwards | 1 | 24.26ns | 0.0042ns | 0.0038ns |
EnqueueRandom | 1 | 24.04ns | 0.0073ns | 0.0057ns |
EnqueueDequeue | 1 | 23.65ns | 0.0051ns | 0.0047ns |
EnqueueBackwardsDequeue | 1 | 23.82ns | 0.0125ns | 0.0104ns |
EnqueueRandomDequeue | 1 | 23.64ns | 0.0042ns | 0.0038ns |
EnqueueUpdatePriority | 1 | 32.02ns | 0.0516ns | 0.0482ns |
EnqueueBackwardsUpdatePriority | 1 | 31.42ns | 0.0789ns | 0.0738ns |
EnqueueRandomUpdatePriority | 1 | 32.03ns | 0.0081ns | 0.0075ns |
EnqueueContains | 1 | 28.32ns | 0.0129ns | 0.0114ns |
Enqueue | 100 | 651.09ns | 0.1539ns | 0.1285ns |
EnqueueBackwards | 100 | 2,474.32ns | 1.8081ns | 1.6913ns |
EnqueueRandom | 100 | 1,326.77ns | 1.4647ns | 1.3700ns |
EnqueueDequeue | 100 | 4,422.01ns | 12.5052ns | 11.0855ns |
EnqueueBackwardsDequeue | 100 | 5,979.95ns | 7.1830ns | 6.3676ns |
EnqueueRandomDequeue | 100 | 4,811.23ns | 26.2295ns | 24.5351ns |
EnqueueUpdatePriority | 100 | 3,460.94ns | 17.5167ns | 16.3851ns |
EnqueueBackwardsUpdatePriority | 100 | 5,294.51ns | 8.2645ns | 6.9012ns |
EnqueueRandomUpdatePriority | 100 | 2,689.72ns | 5.5826ns | 4.0366ns |
EnqueueContains | 100 | 799.59ns | 0.0954ns | 0.0845ns |
Enqueue | 10000 | 64,555.38ns | 15.4673ns | 12.9159ns |
EnqueueBackwards | 10000 | 466,651.35ns | 131.1901ns | 122.7153ns |
EnqueueRandom | 10000 | 217,306.92ns | 548.7084ns | 513.2621ns |
EnqueueDequeue | 10000 | 1,072,899.93ns | 101.6111ns | 90.0756ns |
EnqueueBackwardsDequeue | 10000 | 1,384,956.41ns | 168.1195ns | 157.2591ns |
EnqueueRandomDequeue | 10000 | 1,534,886.29ns | 359.0919ns | 318.3256ns |
EnqueueUpdatePriority | 10000 | 781,804.03ns | 93.7744ns | 73.2128ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,144,182.14ns | 432.8527ns | 337.9429ns |
EnqueueRandomUpdatePriority | 10000 | 518,266.60ns | 73.4251ns | 65.0895ns |
EnqueueContains | 10000 | 79,828.49ns | 13.9809ns | 11.6747ns |
I was expecting a small but noticeable improvement for this one, but it seems I got mostly worse times instead. Again, I tried reverting this later and saw no improvement, so I'm guessing randomness.
Change 7: Check the node index in CascadeUp before the first assignment to a local variable.
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 23.52ns | 0.0040ns | 0.0036ns |
EnqueueBackwards | 1 | 23.82ns | 0.0114ns | 0.0107ns |
EnqueueRandom | 1 | 24.43ns | 0.0062ns | 0.0058ns |
EnqueueDequeue | 1 | 23.49ns | 0.0057ns | 0.0053ns |
EnqueueBackwardsDequeue | 1 | 23.87ns | 0.0074ns | 0.0066ns |
EnqueueRandomDequeue | 1 | 23.57ns | 0.0051ns | 0.0045ns |
EnqueueUpdatePriority | 1 | 31.55ns | 0.0483ns | 0.0377ns |
EnqueueBackwardsUpdatePriority | 1 | 31.41ns | 0.0066ns | 0.0062ns |
EnqueueRandomUpdatePriority | 1 | 31.88ns | 0.1081ns | 0.1012ns |
EnqueueContains | 1 | 28.03ns | 0.0044ns | 0.0041ns |
Enqueue | 100 | 650.95ns | 0.1262ns | 0.1181ns |
EnqueueBackwards | 100 | 2,467.41ns | 1.1693ns | 1.0937ns |
EnqueueRandom | 100 | 1,312.95ns | 1.9043ns | 1.3770ns |
EnqueueDequeue | 100 | 4,379.01ns | 15.5859ns | 14.5790ns |
EnqueueBackwardsDequeue | 100 | 5,937.19ns | 20.3389ns | 19.0250ns |
EnqueueRandomDequeue | 100 | 4,853.26ns | 21.4105ns | 18.9798ns |
EnqueueUpdatePriority | 100 | 3,452.98ns | 5.5251ns | 5.1682ns |
EnqueueBackwardsUpdatePriority | 100 | 5,265.21ns | 3.1482ns | 2.7908ns |
EnqueueRandomUpdatePriority | 100 | 2,706.22ns | 16.4183ns | 15.3577ns |
EnqueueContains | 100 | 798.76ns | 0.1572ns | 0.1394ns |
Enqueue | 10000 | 63,815.43ns | 12.9826ns | 12.1439ns |
EnqueueBackwards | 10000 | 467,740.56ns | 176.5814ns | 137.8631ns |
EnqueueRandom | 10000 | 216,327.37ns | 212.2575ns | 198.5457ns |
EnqueueDequeue | 10000 | 1,072,142.36ns | 219.3942ns | 194.4872ns |
EnqueueBackwardsDequeue | 10000 | 1,383,692.69ns | 298.5832ns | 264.6862ns |
EnqueueRandomDequeue | 10000 | 1,540,627.22ns | 298.6285ns | 279.3373ns |
EnqueueUpdatePriority | 10000 | 781,679.48ns | 74.5785ns | 66.1119ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,136,366.00ns | 365.4693ns | 323.9790ns |
EnqueueRandomUpdatePriority | 10000 | 520,205.79ns | 236.1968ns | 197.2350ns |
EnqueueContains | 10000 | 79,563.30ns | 13.2946ns | 11.7853ns |
No meaningful change in anything. I think it should theoretically be better, or at least not worse, so I'm keeping this change.
Change 8: Add the AggressiveInlining attribute to CascadeUp.
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 22.24ns | 0.0077ns | 0.0068ns |
EnqueueBackwards | 1 | 22.02ns | 0.0045ns | 0.0038ns |
EnqueueRandom | 1 | 22.31ns | 0.0043ns | 0.0034ns |
EnqueueDequeue | 1 | 22.69ns | 0.0033ns | 0.0030ns |
EnqueueBackwardsDequeue | 1 | 22.16ns | 0.0050ns | 0.0042ns |
EnqueueRandomDequeue | 1 | 23.41ns | 0.0981ns | 0.0917ns |
EnqueueUpdatePriority | 1 | 30.24ns | 0.0487ns | 0.0432ns |
EnqueueBackwardsUpdatePriority | 1 | 30.14ns | 0.0109ns | 0.0091ns |
EnqueueRandomUpdatePriority | 1 | 30.33ns | 0.0329ns | 0.0275ns |
EnqueueContains | 1 | 28.48ns | 0.1631ns | 0.1526ns |
Enqueue | 100 | 453.02ns | 0.0961ns | 0.0852ns |
EnqueueBackwards | 100 | 2,122.68ns | 0.4891ns | 0.4336ns |
EnqueueRandom | 100 | 1,064.51ns | 1.5952ns | 1.2454ns |
EnqueueDequeue | 100 | 4,041.42ns | 10.6785ns | 9.9887ns |
EnqueueBackwardsDequeue | 100 | 5,446.79ns | 12.1101ns | 11.3278ns |
EnqueueRandomDequeue | 100 | 4,599.58ns | 9.2748ns | 8.6757ns |
EnqueueUpdatePriority | 100 | 3,074.87ns | 9.6215ns | 8.9999ns |
EnqueueBackwardsUpdatePriority | 100 | 4,744.44ns | 15.7375ns | 14.7209ns |
EnqueueRandomUpdatePriority | 100 | 2,198.05ns | 3.5201ns | 2.7483ns |
EnqueueContains | 100 | 621.58ns | 0.1698ns | 0.1505ns |
Enqueue | 10000 | 43,682.09ns | 4.1872ns | 3.7118ns |
EnqueueBackwards | 10000 | 416,682.10ns | 99.4232ns | 93.0005ns |
EnqueueRandom | 10000 | 190,063.52ns | 80.5496ns | 71.4051ns |
EnqueueDequeue | 10000 | 1,032,158.26ns | 369.6891ns | 244.5265ns |
EnqueueBackwardsDequeue | 10000 | 1,322,934.02ns | 1,218.6786ns | 1,017.6517ns |
EnqueueRandomDequeue | 10000 | 1,531,388.11ns | 1,827.9342ns | 1,709.8506ns |
EnqueueUpdatePriority | 10000 | 740,885.87ns | 217.5127ns | 203.4615ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,067,612.13ns | 139.7863ns | 116.7279ns |
EnqueueRandomUpdatePriority | 10000 | 466,245.07ns | 65.0019ns | 60.8028ns |
EnqueueContains | 10000 | 61,976.79ns | 43.2958ns | 40.4989ns |
All Enqueue times substantially improved, and other benchmarks added their own improvements on top. Whatever caused inlining here to decrease performance before is evidently gone now.
Change 9: Tried caching the node index in a local variable in CascadeUp.
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 22.74ns | 0.0202ns | 0.0169ns |
EnqueueBackwards | 1 | 22.95ns | 0.0275ns | 0.0258ns |
EnqueueRandom | 1 | 23.27ns | 0.1024ns | 0.0908ns |
EnqueueDequeue | 1 | 22.98ns | 0.0158ns | 0.0147ns |
EnqueueBackwardsDequeue | 1 | 22.80ns | 0.0021ns | 0.0015ns |
EnqueueRandomDequeue | 1 | 23.37ns | 0.0728ns | 0.0681ns |
EnqueueUpdatePriority | 1 | 30.01ns | 0.0083ns | 0.0078ns |
EnqueueBackwardsUpdatePriority | 1 | 30.37ns | 0.0251ns | 0.0235ns |
EnqueueRandomUpdatePriority | 1 | 30.88ns | 0.0328ns | 0.0274ns |
EnqueueContains | 1 | 27.29ns | 0.0216ns | 0.0180ns |
Enqueue | 100 | 474.31ns | 1.1535ns | 1.0790ns |
EnqueueBackwards | 100 | 2,084.20ns | 1.9779ns | 1.8501ns |
EnqueueRandom | 100 | 1,051.70ns | 1.7402ns | 1.6278ns |
EnqueueDequeue | 100 | 4,015.11ns | 6.2019ns | 5.4979ns |
EnqueueBackwardsDequeue | 100 | 5,469.15ns | 17.9352ns | 15.8991ns |
EnqueueRandomDequeue | 100 | 4,622.77ns | 26.4007ns | 23.4035ns |
EnqueueUpdatePriority | 100 | 3,057.60ns | 3.2832ns | 2.5633ns |
EnqueueBackwardsUpdatePriority | 100 | 4,710.85ns | 9.3391ns | 8.2789ns |
EnqueueRandomUpdatePriority | 100 | 2,248.42ns | 3.9133ns | 3.2678ns |
EnqueueContains | 100 | 644.04ns | 2.0702ns | 1.8351ns |
Enqueue | 10000 | 45,596.08ns | 77.7257ns | 72.7046ns |
EnqueueBackwards | 10000 | 412,514.14ns | 134.1962ns | 112.0599ns |
EnqueueRandom | 10000 | 189,944.90ns | 54.4808ns | 45.4939ns |
EnqueueDequeue | 10000 | 1,038,879.45ns | 113.5238ns | 88.6319ns |
EnqueueBackwardsDequeue | 10000 | 1,316,721.29ns | 1,886.6193ns | 1,764.7447ns |
EnqueueRandomDequeue | 10000 | 1,537,147.12ns | 1,202.0277ns | 1,003.7475ns |
EnqueueUpdatePriority | 10000 | 735,055.07ns | 87.5767ns | 77.6345ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,066,156.29ns | 574.5495ns | 537.4339ns |
EnqueueRandomUpdatePriority | 10000 | 470,931.10ns | 171.0114ns | 151.5972ns |
EnqueueContains | 10000 | 63,646.34ns | 62.7247ns | 58.6727ns |
No definite effect, and I think it slightly complicates the code. I reverted this change immediately and never tried local variable caching again.
Change 10: Copied in the rewrite of CascadeDown
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 22.20ns | 0.0071ns | 0.0066ns |
EnqueueBackwards | 1 | 22.29ns | 0.0057ns | 0.0053ns |
EnqueueRandom | 1 | 22.34ns | 0.0083ns | 0.0070ns |
EnqueueDequeue | 1 | 22.56ns | 0.0027ns | 0.0025ns |
EnqueueBackwardsDequeue | 1 | 22.09ns | 0.0048ns | 0.0045ns |
EnqueueRandomDequeue | 1 | 22.58ns | 0.0058ns | 0.0054ns |
EnqueueUpdatePriority | 1 | 29.73ns | 0.0357ns | 0.0334ns |
EnqueueBackwardsUpdatePriority | 1 | 29.56ns | 0.0457ns | 0.0427ns |
EnqueueRandomUpdatePriority | 1 | 30.26ns | 0.0043ns | 0.0036ns |
EnqueueContains | 1 | 26.93ns | 0.0066ns | 0.0058ns |
Enqueue | 100 | 453.04ns | 0.1152ns | 0.1077ns |
EnqueueBackwards | 100 | 2,138.29ns | 2.7901ns | 2.0174ns |
EnqueueRandom | 100 | 1,067.74ns | 1.3937ns | 1.2355ns |
EnqueueDequeue | 100 | 4,150.14ns | 24.2443ns | 22.6781ns |
EnqueueBackwardsDequeue | 100 | 5,540.79ns | 8.4947ns | 7.9459ns |
EnqueueRandomDequeue | 100 | 4,952.87ns | 30.5417ns | 25.5037ns |
EnqueueUpdatePriority | 100 | 3,112.38ns | 2.0788ns | 1.9445ns |
EnqueueBackwardsUpdatePriority | 100 | 4,847.79ns | 4.9923ns | 4.4255ns |
EnqueueRandomUpdatePriority | 100 | 2,247.65ns | 1.7452ns | 1.6325ns |
EnqueueContains | 100 | 621.47ns | 0.1432ns | 0.1339ns |
Enqueue | 10000 | 43,712.94ns | 4.5534ns | 3.8023ns |
EnqueueBackwards | 10000 | 416,561.57ns | 60.7293ns | 56.8063ns |
EnqueueRandom | 10000 | 190,124.79ns | 80.3340ns | 75.1445ns |
EnqueueDequeue | 10000 | 1,044,064.23ns | 232.7742ns | 217.7371ns |
EnqueueBackwardsDequeue | 10000 | 1,328,398.46ns | 556.0566ns | 464.3324ns |
EnqueueRandomDequeue | 10000 | 1,505,869.37ns | 364.7999ns | 341.2341ns |
EnqueueUpdatePriority | 10000 | 734,316.95ns | 88.3724ns | 82.6636ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,087,657.92ns | 310.4105ns | 275.1708ns |
EnqueueRandomUpdatePriority | 10000 | 470,536.66ns | 87.7242ns | 77.7652ns |
EnqueueContains | 10000 | 61,959.12ns | 2.8381ns | 2.3699ns |
In my original testing this is where the biggest improvements came from. Here, it seems to have no effect. I have since learned that my original tests had the debugger running - I hadn't known that choosing Release mode wasn't enough to prevent that - and presumably that affected it somehow. Oh well, this rewrite doesn't hurt.
Change 11: As unrolling the first iteration in CascadeUp prevented some no-ops, I figured doing the same in CascadeDown should help too.
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 22.21ns | 0.0127ns | 0.0106ns |
EnqueueBackwards | 1 | 22.74ns | 0.0200ns | 0.0187ns |
EnqueueRandom | 1 | 22.60ns | 0.0092ns | 0.0081ns |
EnqueueDequeue | 1 | 22.73ns | 0.0850ns | 0.0795ns |
EnqueueBackwardsDequeue | 1 | 22.50ns | 0.0048ns | 0.0040ns |
EnqueueRandomDequeue | 1 | 23.36ns | 0.0050ns | 0.0042ns |
EnqueueUpdatePriority | 1 | 27.88ns | 0.0074ns | 0.0058ns |
EnqueueBackwardsUpdatePriority | 1 | 29.51ns | 0.0231ns | 0.0216ns |
EnqueueRandomUpdatePriority | 1 | 28.34ns | 0.0213ns | 0.0189ns |
EnqueueContains | 1 | 26.72ns | 0.0065ns | 0.0058ns |
Enqueue | 100 | 453.45ns | 1.3017ns | 1.2176ns |
EnqueueBackwards | 100 | 2,134.51ns | 8.4962ns | 7.5317ns |
EnqueueRandom | 100 | 1,052.76ns | 1.4396ns | 1.2762ns |
EnqueueDequeue | 100 | 4,245.47ns | 39.5863ns | 37.0290ns |
EnqueueBackwardsDequeue | 100 | 5,695.61ns | 31.1627ns | 29.1496ns |
EnqueueRandomDequeue | 100 | 4,775.98ns | 18.3707ns | 16.2852ns |
EnqueueUpdatePriority | 100 | 3,154.44ns | 7.7462ns | 6.8668ns |
EnqueueBackwardsUpdatePriority | 100 | 4,896.65ns | 6.6384ns | 6.2096ns |
EnqueueRandomUpdatePriority | 100 | 2,171.64ns | 3.2514ns | 2.7151ns |
EnqueueContains | 100 | 623.18ns | 0.3062ns | 0.2864ns |
Enqueue | 10000 | 43,641.93ns | 7.7952ns | 6.9102ns |
EnqueueBackwards | 10000 | 416,571.06ns | 89.7573ns | 83.9590ns |
EnqueueRandom | 10000 | 190,372.01ns | 148.7788ns | 139.1678ns |
EnqueueDequeue | 10000 | 1,056,882.89ns | 206.9015ns | 172.7721ns |
EnqueueBackwardsDequeue | 10000 | 1,338,313.66ns | 3,425.7467ns | 3,204.4453ns |
EnqueueRandomDequeue | 10000 | 1,400,901.57ns | 761.5125ns | 712.3192ns |
EnqueueUpdatePriority | 10000 | 732,245.29ns | 1,280.4280ns | 1,197.7130ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,099,620.03ns | 1,381.6871ns | 1,292.4308ns |
EnqueueRandomUpdatePriority | 10000 | 465,333.08ns | 135.2460ns | 119.8921ns |
EnqueueContains | 10000 | 61,698.93ns | 6.9668ns | 6.1759ns |
Most benchmarks show no significant change, but EnqueueRandomDequeue specifically - just that one, not the other two dequeue benchmarks - chopped 100 microseconds off its 1500 total. I don't know why this change would affect that specific case only, but I repeated the benchmark run a few times to be sure and it's consistent.
Change 12: Added inline attribute to Dequeue
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 22.19ns | 0.0511ns | 0.0478ns |
EnqueueBackwards | 1 | 22.58ns | 0.0804ns | 0.0752ns |
EnqueueRandom | 1 | 22.34ns | 0.0017ns | 0.0014ns |
EnqueueDequeue | 1 | 22.83ns | 0.0045ns | 0.0042ns |
EnqueueBackwardsDequeue | 1 | 22.69ns | 0.0039ns | 0.0032ns |
EnqueueRandomDequeue | 1 | 22.61ns | 0.0204ns | 0.0181ns |
EnqueueUpdatePriority | 1 | 27.56ns | 0.0194ns | 0.0172ns |
EnqueueBackwardsUpdatePriority | 1 | 27.61ns | 0.0601ns | 0.0562ns |
EnqueueRandomUpdatePriority | 1 | 28.84ns | 0.0279ns | 0.0247ns |
EnqueueContains | 1 | 27.02ns | 0.1429ns | 0.1336ns |
Enqueue | 100 | 454.76ns | 0.0709ns | 0.0592ns |
EnqueueBackwards | 100 | 2,122.60ns | 0.3536ns | 0.3135ns |
EnqueueRandom | 100 | 1,056.99ns | 0.9723ns | 0.9095ns |
EnqueueDequeue | 100 | 4,258.42ns | 5.1065ns | 4.2642ns |
EnqueueBackwardsDequeue | 100 | 5,698.16ns | 21.6832ns | 20.2825ns |
EnqueueRandomDequeue | 100 | 4,786.78ns | 31.3661ns | 29.3398ns |
EnqueueUpdatePriority | 100 | 3,197.65ns | 4.8257ns | 4.0297ns |
EnqueueBackwardsUpdatePriority | 100 | 4,940.45ns | 6.5178ns | 6.0967ns |
EnqueueRandomUpdatePriority | 100 | 2,158.86ns | 1.3417ns | 1.1894ns |
EnqueueContains | 100 | 621.63ns | 0.1419ns | 0.1258ns |
Enqueue | 10000 | 44,076.62ns | 211.6200ns | 197.9495ns |
EnqueueBackwards | 10000 | 416,918.17ns | 934.0414ns | 779.9668ns |
EnqueueRandom | 10000 | 189,972.81ns | 122.5352ns | 114.6195ns |
EnqueueDequeue | 10000 | 1,052,608.52ns | 306.3106ns | 286.5231ns |
EnqueueBackwardsDequeue | 10000 | 1,333,331.83ns | 268.2575ns | 250.9282ns |
EnqueueRandomDequeue | 10000 | 1,397,409.25ns | 534.1147ns | 499.6111ns |
EnqueueUpdatePriority | 10000 | 737,059.19ns | 699.8656ns | 506.0487ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,096,362.36ns | 1,414.3376ns | 1,322.9721ns |
EnqueueRandomUpdatePriority | 10000 | 467,614.67ns | 298.2784ns | 264.4160ns |
EnqueueContains | 10000 | 61,756.56ns | 4.6587ns | 4.3578ns |
No meaningful effect on anything. I'm guessing this is because Dequeue was small enough to be inlined anyway. I reverted at this point, but revisited the idea later.
Change 13: Factor out the use of Swap in Remove
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 22.41ns | 0.0039ns | 0.0030ns |
EnqueueBackwards | 1 | 22.38ns | 0.0075ns | 0.0067ns |
EnqueueRandom | 1 | 23.33ns | 0.0209ns | 0.0175ns |
EnqueueDequeue | 1 | 22.77ns | 0.0026ns | 0.0023ns |
EnqueueBackwardsDequeue | 1 | 23.01ns | 0.0124ns | 0.0116ns |
EnqueueRandomDequeue | 1 | 22.99ns | 0.1034ns | 0.0916ns |
EnqueueUpdatePriority | 1 | 28.33ns | 0.0090ns | 0.0080ns |
EnqueueBackwardsUpdatePriority | 1 | 27.68ns | 0.0383ns | 0.0339ns |
EnqueueRandomUpdatePriority | 1 | 28.38ns | 0.0094ns | 0.0088ns |
EnqueueContains | 1 | 26.55ns | 0.0119ns | 0.0112ns |
Enqueue | 100 | 452.81ns | 0.0986ns | 0.0874ns |
EnqueueBackwards | 100 | 2,122.19ns | 0.2572ns | 0.2280ns |
EnqueueRandom | 100 | 1,063.52ns | 5.2930ns | 4.9510ns |
EnqueueDequeue | 100 | 3,869.74ns | 22.1228ns | 20.6937ns |
EnqueueBackwardsDequeue | 100 | 5,246.61ns | 10.3509ns | 9.6823ns |
EnqueueRandomDequeue | 100 | 4,529.24ns | 23.6071ns | 22.0821ns |
EnqueueUpdatePriority | 100 | 3,159.93ns | 4.5902ns | 4.2937ns |
EnqueueBackwardsUpdatePriority | 100 | 4,916.10ns | 19.3294ns | 18.0807ns |
EnqueueRandomUpdatePriority | 100 | 2,163.47ns | 1.6592ns | 1.4709ns |
EnqueueContains | 100 | 642.07ns | 0.4308ns | 0.3363ns |
Enqueue | 10000 | 43,712.68ns | 4.9400ns | 3.5719ns |
EnqueueBackwards | 10000 | 416,549.02ns | 99.3137ns | 88.0390ns |
EnqueueRandom | 10000 | 190,394.21ns | 61.8104ns | 48.2575ns |
EnqueueDequeue | 10000 | 972,646.87ns | 176.5460ns | 165.1412ns |
EnqueueBackwardsDequeue | 10000 | 1,272,651.01ns | 5,125.3539ns | 4,794.2588ns |
EnqueueRandomDequeue | 10000 | 1,444,087.75ns | 249.3100ns | 233.2047ns |
EnqueueUpdatePriority | 10000 | 731,074.45ns | 967.0588ns | 857.2724ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,099,364.99ns | 4,494.7224ns | 4,204.3657ns |
EnqueueRandomUpdatePriority | 10000 | 465,234.02ns | 593.4637ns | 555.1262ns |
EnqueueContains | 10000 | 61,614.84ns | 19.3570ns | 16.1640ns |
Two of the Dequeue cases improved a fair bit, the other got slower about the same amount.
Change 14: Added inline attribute to Remove
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 22.20ns | 0.0064ns | 0.0059ns |
EnqueueBackwards | 1 | 22.27ns | 0.0425ns | 0.0397ns |
EnqueueRandom | 1 | 22.31ns | 0.0167ns | 0.0148ns |
EnqueueDequeue | 1 | 23.86ns | 0.1307ns | 0.1222ns |
EnqueueBackwardsDequeue | 1 | 23.42ns | 0.0123ns | 0.0116ns |
EnqueueRandomDequeue | 1 | 23.50ns | 0.0278ns | 0.0260ns |
EnqueueUpdatePriority | 1 | 28.35ns | 0.0953ns | 0.0744ns |
EnqueueBackwardsUpdatePriority | 1 | 27.86ns | 0.1183ns | 0.1107ns |
EnqueueRandomUpdatePriority | 1 | 28.31ns | 0.0570ns | 0.0505ns |
EnqueueContains | 1 | 26.43ns | 0.0115ns | 0.0102ns |
Enqueue | 100 | 452.71ns | 0.2499ns | 0.2087ns |
EnqueueBackwards | 100 | 2,133.58ns | 7.2715ns | 6.8018ns |
EnqueueRandom | 100 | 1,076.99ns | 12.0644ns | 10.6948ns |
EnqueueDequeue | 100 | 3,597.61ns | 7.2484ns | 6.0527ns |
EnqueueBackwardsDequeue | 100 | 5,075.53ns | 9.8576ns | 9.2208ns |
EnqueueRandomDequeue | 100 | 4,539.90ns | 18.0326ns | 16.8677ns |
EnqueueUpdatePriority | 100 | 3,165.54ns | 5.7062ns | 5.3376ns |
EnqueueBackwardsUpdatePriority | 100 | 4,892.66ns | 3.6112ns | 3.2013ns |
EnqueueRandomUpdatePriority | 100 | 2,150.24ns | 1.4414ns | 1.3483ns |
EnqueueContains | 100 | 638.57ns | 0.7877ns | 0.7368ns |
Enqueue | 10000 | 43,705.85ns | 8.0526ns | 6.7242ns |
EnqueueBackwards | 10000 | 416,548.27ns | 98.8718ns | 92.4847ns |
EnqueueRandom | 10000 | 190,127.46ns | 138.7274ns | 122.9783ns |
EnqueueDequeue | 10000 | 951,685.61ns | 478.2353ns | 447.3416ns |
EnqueueBackwardsDequeue | 10000 | 1,232,614.45ns | 1,005.2223ns | 940.2854ns |
EnqueueRandomDequeue | 10000 | 1,428,960.32ns | 243.0029ns | 202.9184ns |
EnqueueUpdatePriority | 10000 | 736,639.75ns | 840.5913ns | 745.1622ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,099,487.39ns | 773.1950ns | 603.6598ns |
EnqueueRandomUpdatePriority | 10000 | 465,429.96ns | 328.5549ns | 307.3304ns |
EnqueueContains | 10000 | 61,765.65ns | 11.0914ns | 9.8323ns |
All Dequeue cases improved just a little bit.
Change 15: Copy implementation of Remove into Dequeue and edit for the special case of the removed node always being the head
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 22.58ns | 0.0145ns | 0.0121ns |
EnqueueBackwards | 1 | 22.64ns | 0.0070ns | 0.0058ns |
EnqueueRandom | 1 | 22.34ns | 0.0037ns | 0.0033ns |
EnqueueDequeue | 1 | 23.19ns | 0.0067ns | 0.0063ns |
EnqueueBackwardsDequeue | 1 | 23.40ns | 0.0302ns | 0.0283ns |
EnqueueRandomDequeue | 1 | 23.43ns | 0.0311ns | 0.0291ns |
EnqueueUpdatePriority | 1 | 27.64ns | 0.0079ns | 0.0070ns |
EnqueueBackwardsUpdatePriority | 1 | 27.40ns | 0.0056ns | 0.0053ns |
EnqueueRandomUpdatePriority | 1 | 28.92ns | 0.0147ns | 0.0137ns |
EnqueueContains | 1 | 26.80ns | 0.0081ns | 0.0071ns |
Enqueue | 100 | 452.84ns | 0.1059ns | 0.0884ns |
EnqueueBackwards | 100 | 2,122.45ns | 0.4275ns | 0.3790ns |
EnqueueRandom | 100 | 1,067.15ns | 0.6758ns | 0.5644ns |
EnqueueDequeue | 100 | 3,768.56ns | 15.8371ns | 14.8141ns |
EnqueueBackwardsDequeue | 100 | 5,158.14ns | 8.1800ns | 6.3864ns |
EnqueueRandomDequeue | 100 | 4,100.04ns | 29.4456ns | 27.5434ns |
EnqueueUpdatePriority | 100 | 3,164.30ns | 7.6815ns | 7.1852ns |
EnqueueBackwardsUpdatePriority | 100 | 4,901.39ns | 1.9473ns | 1.7262ns |
EnqueueRandomUpdatePriority | 100 | 2,185.92ns | 7.0964ns | 6.2907ns |
EnqueueContains | 100 | 627.28ns | 0.1283ns | 0.1071ns |
Enqueue | 10000 | 43,796.43ns | 98.0872ns | 91.7508ns |
EnqueueBackwards | 10000 | 416,528.04ns | 107.6042ns | 100.6530ns |
EnqueueRandom | 10000 | 190,775.81ns | 801.9965ns | 750.1879ns |
EnqueueDequeue | 10000 | 1,015,905.68ns | 258.4723ns | 215.8361ns |
EnqueueBackwardsDequeue | 10000 | 1,306,963.43ns | 2,702.7546ns | 2,528.1581ns |
EnqueueRandomDequeue | 10000 | 1,404,379.56ns | 824.1601ns | 770.9198ns |
EnqueueUpdatePriority | 10000 | 734,222.67ns | 763.8830ns | 677.1623ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,094,132.17ns | 262.4373ns | 245.4840ns |
EnqueueRandomUpdatePriority | 10000 | 467,447.01ns | 2,503.1644ns | 2,341.4613ns |
EnqueueContains | 10000 | 61,729.44ns | 10.0395ns | 8.8998ns |
Two Dequeue cases got worse, one got better. Next few changes investigate a bit more.
Change 16: Add AggressiveInlining to Dequeue again
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 22.20ns | 0.0754ns | 0.0705ns |
EnqueueBackwards | 1 | 22.21ns | 0.0891ns | 0.0833ns |
EnqueueRandom | 1 | 22.72ns | 0.0065ns | 0.0061ns |
EnqueueDequeue | 1 | 22.70ns | 0.0079ns | 0.0066ns |
EnqueueBackwardsDequeue | 1 | 22.41ns | 0.0059ns | 0.0052ns |
EnqueueRandomDequeue | 1 | 23.49ns | 0.0358ns | 0.0335ns |
EnqueueUpdatePriority | 1 | 28.12ns | 0.0078ns | 0.0073ns |
EnqueueBackwardsUpdatePriority | 1 | 27.42ns | 0.0078ns | 0.0065ns |
EnqueueRandomUpdatePriority | 1 | 29.04ns | 0.0238ns | 0.0211ns |
EnqueueContains | 1 | 26.94ns | 0.0086ns | 0.0071ns |
Enqueue | 100 | 455.12ns | 0.0539ns | 0.0390ns |
EnqueueBackwards | 100 | 2,124.73ns | 0.3888ns | 0.3447ns |
EnqueueRandom | 100 | 1,065.69ns | 0.5999ns | 0.5318ns |
EnqueueDequeue | 100 | 3,394.18ns | 6.1180ns | 5.7228ns |
EnqueueBackwardsDequeue | 100 | 4,860.40ns | 16.5571ns | 15.4875ns |
EnqueueRandomDequeue | 100 | 4,015.70ns | 46.3825ns | 43.3862ns |
EnqueueUpdatePriority | 100 | 3,160.21ns | 7.9729ns | 7.4578ns |
EnqueueBackwardsUpdatePriority | 100 | 4,907.65ns | 18.7129ns | 17.5041ns |
EnqueueRandomUpdatePriority | 100 | 2,160.94ns | 3.4017ns | 3.0155ns |
EnqueueContains | 100 | 624.85ns | 0.1165ns | 0.1089ns |
Enqueue | 10000 | 44,093.14ns | 228.7990ns | 214.0187ns |
EnqueueBackwards | 10000 | 416,573.89ns | 223.5962ns | 186.7130ns |
EnqueueRandom | 10000 | 190,456.26ns | 278.5402ns | 246.9186ns |
EnqueueDequeue | 10000 | 943,300.79ns | 158.0329ns | 140.0920ns |
EnqueueBackwardsDequeue | 10000 | 1,233,626.37ns | 83.6337ns | 74.1391ns |
EnqueueRandomDequeue | 10000 | 1,362,721.59ns | 622.2356ns | 519.5948ns |
EnqueueUpdatePriority | 10000 | 733,974.11ns | 603.8905ns | 564.8795ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,101,374.95ns | 1,922.2804ns | 1,605.1911ns |
EnqueueRandomUpdatePriority | 10000 | 465,139.68ns | 601.7778ns | 502.5117ns |
EnqueueContains | 10000 | 61,766.98ns | 9.3360ns | 7.7960ns |
All Dequeue cases improved significantly, all three to a point better than at Change 14.
Change 17: Tried reverting Change 15
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 22.55ns | 0.0073ns | 0.0065ns |
EnqueueBackwards | 1 | 22.25ns | 0.0053ns | 0.0047ns |
EnqueueRandom | 1 | 22.34ns | 0.0042ns | 0.0038ns |
EnqueueDequeue | 1 | 23.81ns | 0.0598ns | 0.0560ns |
EnqueueBackwardsDequeue | 1 | 23.43ns | 0.0137ns | 0.0114ns |
EnqueueRandomDequeue | 1 | 23.75ns | 0.0982ns | 0.0918ns |
EnqueueUpdatePriority | 1 | 27.91ns | 0.0053ns | 0.0045ns |
EnqueueBackwardsUpdatePriority | 1 | 27.53ns | 0.0096ns | 0.0090ns |
EnqueueRandomUpdatePriority | 1 | 28.20ns | 0.0099ns | 0.0093ns |
EnqueueContains | 1 | 27.04ns | 0.0110ns | 0.0103ns |
Enqueue | 100 | 453.03ns | 0.1215ns | 0.0949ns |
EnqueueBackwards | 100 | 2,131.89ns | 7.4771ns | 6.9941ns |
EnqueueRandom | 100 | 1,063.06ns | 11.4048ns | 10.6681ns |
EnqueueDequeue | 100 | 3,665.13ns | 11.9238ns | 10.5701ns |
EnqueueBackwardsDequeue | 100 | 5,100.35ns | 11.2050ns | 10.4812ns |
EnqueueRandomDequeue | 100 | 4,456.91ns | 36.2898ns | 33.9455ns |
EnqueueUpdatePriority | 100 | 3,172.86ns | 11.5497ns | 10.8036ns |
EnqueueBackwardsUpdatePriority | 100 | 4,910.67ns | 12.0718ns | 10.7013ns |
EnqueueRandomUpdatePriority | 100 | 2,174.75ns | 9.1068ns | 8.5185ns |
EnqueueContains | 100 | 621.31ns | 0.0842ns | 0.0703ns |
Enqueue | 10000 | 43,682.69ns | 15.6044ns | 13.8329ns |
EnqueueBackwards | 10000 | 416,523.28ns | 76.2750ns | 55.1518ns |
EnqueueRandom | 10000 | 189,990.37ns | 141.6339ns | 118.2707ns |
EnqueueDequeue | 10000 | 951,306.17ns | 294.1230ns | 260.7323ns |
EnqueueBackwardsDequeue | 10000 | 1,232,584.59ns | 1,224.8384ns | 1,145.7145ns |
EnqueueRandomDequeue | 10000 | 1,435,426.26ns | 278.4670ns | 260.4782ns |
EnqueueUpdatePriority | 10000 | 730,930.38ns | 309.6355ns | 289.6332ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,097,982.42ns | 561.8409ns | 525.5463ns |
EnqueueRandomUpdatePriority | 10000 | 465,169.88ns | 468.5706ns | 438.3012ns |
EnqueueContains | 10000 | 61,781.12ns | 7.3693ns | 6.5327ns |
EnqueueRandomDequeue got significantly worse, everything else stayed about the same. Seems inlining all the way is the winner on this one, so I went back to Change 16.
Change 18: Replace division with bit shift on OnNodeUpdated
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 25.08ns | 0.0619ns | 0.0579ns |
EnqueueBackwards | 1 | 22.15ns | 0.0033ns | 0.0030ns |
EnqueueRandom | 1 | 22.42ns | 0.0427ns | 0.0399ns |
EnqueueDequeue | 1 | 22.76ns | 0.0095ns | 0.0084ns |
EnqueueBackwardsDequeue | 1 | 22.61ns | 0.0202ns | 0.0169ns |
EnqueueRandomDequeue | 1 | 23.58ns | 0.0666ns | 0.0623ns |
EnqueueUpdatePriority | 1 | 27.10ns | 0.0060ns | 0.0050ns |
EnqueueBackwardsUpdatePriority | 1 | 27.63ns | 0.0074ns | 0.0062ns |
EnqueueRandomUpdatePriority | 1 | 27.41ns | 0.0085ns | 0.0075ns |
EnqueueContains | 1 | 27.27ns | 0.0561ns | 0.0525ns |
Enqueue | 100 | 452.98ns | 0.0494ns | 0.0438ns |
EnqueueBackwards | 100 | 2,122.56ns | 0.4487ns | 0.3977ns |
EnqueueRandom | 100 | 1,069.15ns | 0.9815ns | 0.7663ns |
EnqueueDequeue | 100 | 3,392.39ns | 11.4523ns | 10.7125ns |
EnqueueBackwardsDequeue | 100 | 4,875.53ns | 9.2865ns | 8.2323ns |
EnqueueRandomDequeue | 100 | 3,978.40ns | 41.7402ns | 39.0438ns |
EnqueueUpdatePriority | 100 | 3,119.82ns | 16.0411ns | 15.0048ns |
EnqueueBackwardsUpdatePriority | 100 | 4,878.08ns | 15.0893ns | 14.1145ns |
EnqueueRandomUpdatePriority | 100 | 2,053.08ns | 6.6323ns | 5.8794ns |
EnqueueContains | 100 | 622.32ns | 0.1860ns | 0.1739ns |
Enqueue | 10000 | 43,632.70ns | 3.6509ns | 3.2364ns |
EnqueueBackwards | 10000 | 417,286.99ns | 435.7954ns | 407.6432ns |
EnqueueRandom | 10000 | 190,206.68ns | 102.9975ns | 91.3046ns |
EnqueueDequeue | 10000 | 945,175.22ns | 283.3797ns | 204.9021ns |
EnqueueBackwardsDequeue | 10000 | 1,233,667.01ns | 1,110.0706ns | 984.0486ns |
EnqueueRandomDequeue | 10000 | 1,357,614.41ns | 261.9136ns | 244.9942ns |
EnqueueUpdatePriority | 10000 | 720,677.74ns | 245.6859ns | 217.7941ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,086,414.26ns | 297.1541ns | 277.9581ns |
EnqueueRandomUpdatePriority | 10000 | 450,523.78ns | 602.1603ns | 563.2611ns |
EnqueueContains | 10000 | 61,791.09ns | 9.5761ns | 8.4889ns |
EnqueueRandomDequeue got a noticeable amount better, everything else stayed about the same.
Change 19: Add AggressiveInlining to OnNodeUpdated. Incidentally also added enumeration benchmark to prepare for next change.
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 22.64ns | 0.0162ns | 0.0143ns |
EnqueueBackwards | 1 | 22.09ns | 0.0032ns | 0.0028ns |
EnqueueRandom | 1 | 22.29ns | 0.0060ns | 0.0056ns |
EnqueueDequeue | 1 | 22.61ns | 0.0084ns | 0.0079ns |
EnqueueBackwardsDequeue | 1 | 23.02ns | 0.0179ns | 0.0159ns |
EnqueueRandomDequeue | 1 | 22.87ns | 0.0053ns | 0.0044ns |
EnqueueUpdatePriority | 1 | 28.13ns | 0.0072ns | 0.0064ns |
EnqueueBackwardsUpdatePriority | 1 | 29.07ns | 0.0392ns | 0.0327ns |
EnqueueRandomUpdatePriority | 1 | 29.23ns | 0.0364ns | 0.0341ns |
EnqueueContains | 1 | 27.43ns | 0.0361ns | 0.0320ns |
EnqueueEnumerator | 1 | 42.12ns | 0.0810ns | 0.0718ns |
Enqueue | 100 | 452.42ns | 0.1000ns | 0.0887ns |
EnqueueBackwards | 100 | 2,134.12ns | 6.2371ns | 5.2083ns |
EnqueueRandom | 100 | 1,089.93ns | 0.4355ns | 0.3400ns |
EnqueueDequeue | 100 | 3,378.44ns | 11.9751ns | 11.2015ns |
EnqueueBackwardsDequeue | 100 | 4,865.16ns | 17.1292ns | 16.0226ns |
EnqueueRandomDequeue | 100 | 3,996.34ns | 20.3571ns | 16.9991ns |
EnqueueUpdatePriority | 100 | 2,988.75ns | 4.5433ns | 4.2498ns |
EnqueueBackwardsUpdatePriority | 100 | 4,653.77ns | 19.5419ns | 18.2795ns |
EnqueueRandomUpdatePriority | 100 | 1,962.13ns | 11.9164ns | 11.1466ns |
EnqueueContains | 100 | 639.35ns | 0.3472ns | 0.2711ns |
EnqueueEnumerator | 100 | 1,208.26ns | 0.3471ns | 0.2898ns |
Enqueue | 10000 | 43,884.01ns | 175.5262ns | 164.1873ns |
EnqueueBackwards | 10000 | 416,491.99ns | 123.6652ns | 103.2661ns |
EnqueueRandom | 10000 | 190,841.06ns | 779.2336ns | 690.7702ns |
EnqueueDequeue | 10000 | 944,904.45ns | 360.7183ns | 301.2161ns |
EnqueueBackwardsDequeue | 10000 | 1,236,235.25ns | 446.4808ns | 372.8317ns |
EnqueueRandomDequeue | 10000 | 1,360,890.48ns | 266.0516ns | 222.1651ns |
EnqueueUpdatePriority | 10000 | 709,574.40ns | 138.3995ns | 129.4590ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,070,299.20ns | 658.2688ns | 513.9329ns |
EnqueueRandomUpdatePriority | 10000 | 431,503.22ns | 68.7147ns | 60.9138ns |
EnqueueContains | 10000 | 61,852.94ns | 13.0878ns | 11.6020ns |
EnqueueEnumerator | 10000 | 116,744.77ns | 103.1914ns | 96.5253ns |
All Update benchmarks improved slightly.
Change 20: Changed GetEnumerator to use ArraySegment instead of yield.
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 22.23ns | 0.0110ns | 0.0103ns |
EnqueueBackwards | 1 | 23.06ns | 0.0070ns | 0.0051ns |
EnqueueRandom | 1 | 22.45ns | 0.0321ns | 0.0251ns |
EnqueueDequeue | 1 | 22.54ns | 0.0072ns | 0.0056ns |
EnqueueBackwardsDequeue | 1 | 22.32ns | 0.0178ns | 0.0167ns |
EnqueueRandomDequeue | 1 | 23.51ns | 0.0280ns | 0.0234ns |
EnqueueUpdatePriority | 1 | 28.01ns | 0.0098ns | 0.0092ns |
EnqueueBackwardsUpdatePriority | 1 | 29.57ns | 0.0138ns | 0.0129ns |
EnqueueRandomUpdatePriority | 1 | 29.06ns | 0.0229ns | 0.0214ns |
EnqueueContains | 1 | 26.54ns | 0.0089ns | 0.0079ns |
EnqueueEnumerator | 1 | 64.02ns | 0.4459ns | 0.4171ns |
Enqueue | 100 | 453.81ns | 1.1911ns | 0.9946ns |
EnqueueBackwards | 100 | 2,131.42ns | 0.4016ns | 0.3756ns |
EnqueueRandom | 100 | 1,065.43ns | 0.9738ns | 0.7603ns |
EnqueueDequeue | 100 | 3,385.99ns | 7.8772ns | 7.3683ns |
EnqueueBackwardsDequeue | 100 | 4,860.17ns | 10.4015ns | 9.2206ns |
EnqueueRandomDequeue | 100 | 4,001.53ns | 39.6598ns | 37.0978ns |
EnqueueUpdatePriority | 100 | 2,990.32ns | 5.1090ns | 4.7789ns |
EnqueueBackwardsUpdatePriority | 100 | 4,653.15ns | 2.1216ns | 1.7716ns |
EnqueueRandomUpdatePriority | 100 | 1,952.45ns | 3.6957ns | 3.4569ns |
EnqueueContains | 100 | 639.73ns | 1.9278ns | 1.8033ns |
EnqueueEnumerator | 100 | 1,059.31ns | 0.2522ns | 0.2359ns |
Enqueue | 10000 | 43,798.96ns | 90.1664ns | 84.3417ns |
EnqueueBackwards | 10000 | 416,606.77ns | 89.6748ns | 83.8819ns |
EnqueueRandom | 10000 | 191,248.68ns | 1,050.4011ns | 982.5457ns |
EnqueueDequeue | 10000 | 945,363.62ns | 610.2522ns | 570.8302ns |
EnqueueBackwardsDequeue | 10000 | 1,235,957.01ns | 641.8760ns | 535.9955ns |
EnqueueRandomDequeue | 10000 | 1,361,838.79ns | 905.5018ns | 802.7037ns |
EnqueueUpdatePriority | 10000 | 708,129.59ns | 312.1136ns | 260.6290ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,070,550.73ns | 869.4400ns | 770.7358ns |
EnqueueRandomUpdatePriority | 10000 | 434,412.61ns | 259.0009ns | 216.2774ns |
EnqueueContains | 10000 | 61,815.88ns | 26.7781ns | 25.0482ns |
EnqueueEnumerator | 10000 | 99,696.08ns | 10.0313ns | 7.8317ns |
Enumeration improved significantly.
Change 21: Experimented with reverting Change 5.
Method | QueueSize | Mean | Error | StdDev |
---|---|---|---|---|
Enqueue | 1 | 23.75ns | 0.5055ns | 1.1095ns |
EnqueueBackwards | 1 | 37.97ns | 0.4551ns | 0.4257ns |
EnqueueRandom | 1 | 23.60ns | 0.0128ns | 0.0120ns |
EnqueueDequeue | 1 | 23.13ns | 0.0868ns | 0.0812ns |
EnqueueBackwardsDequeue | 1 | 23.36ns | 0.1488ns | 0.1319ns |
EnqueueRandomDequeue | 1 | 23.54ns | 0.0049ns | 0.0041ns |
EnqueueUpdatePriority | 1 | 28.93ns | 0.1532ns | 0.1279ns |
EnqueueBackwardsUpdatePriority | 1 | 28.85ns | 0.0724ns | 0.0605ns |
EnqueueRandomUpdatePriority | 1 | 29.48ns | 0.0309ns | 0.0274ns |
EnqueueContains | 1 | 38.01ns | 0.3712ns | 0.3100ns |
EnqueueEnumerator | 1 | 64.81ns | 0.4121ns | 0.3854ns |
Enqueue | 100 | 552.78ns | 41.4881ns | 119.0371ns |
EnqueueBackwards | 100 | 2,214.81ns | 6.4132ns | 5.3553ns |
EnqueueRandom | 100 | 1,157.01ns | 3.8655ns | 3.6158ns |
EnqueueDequeue | 100 | 3,714.41ns | 59.0365ns | 55.2228ns |
EnqueueBackwardsDequeue | 100 | 4,847.67ns | 29.8156ns | 27.8895ns |
EnqueueRandomDequeue | 100 | 4,022.34ns | 16.0133ns | 14.9789ns |
EnqueueUpdatePriority | 100 | 3,450.18ns | 66.5819ns | 132.9712ns |
EnqueueBackwardsUpdatePriority | 100 | 4,649.55ns | 10.9245ns | 9.6843ns |
EnqueueRandomUpdatePriority | 100 | 2,062.60ns | 3.0710ns | 2.8726ns |
EnqueueContains | 100 | 1,367.62ns | 25.3361ns | 23.6994ns |
EnqueueEnumerator | 100 | 1,134.59ns | 13.2214ns | 12.3673ns |
Enqueue | 10000 | 45,575.17ns | 18.8716ns | 16.7292ns |
EnqueueBackwards | 10000 | 438,975.82ns | 118.8016ns | 111.1271ns |
EnqueueRandom | 10000 | 204,282.78ns | 535.1115ns | 500.5436ns |
EnqueueDequeue | 10000 | 944,927.39ns | 233.6584ns | 218.5642ns |
EnqueueBackwardsDequeue | 10000 | 1,231,355.00ns | 249.8413ns | 221.4778ns |
EnqueueRandomDequeue | 10000 | 1,378,809.44ns | 595.8720ns | 497.5801ns |
EnqueueUpdatePriority | 10000 | 683,503.96ns | 158.7658ns | 132.5766ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,074,625.59ns | 2,134.8339ns | 1,892.4745ns |
EnqueueRandomUpdatePriority | 10000 | 444,313.13ns | 283.2088ns | 221.1108ns |
EnqueueContains | 10000 | 62,308.63ns | 40.6640ns | 31.7478ns |
EnqueueEnumerator | 10000 | 99,805.43ns | 21.3896ns | 18.9613ns |
Nothing really conclusive, and I think Change 5 should be an improvement, so I put it back.
Final overall chart, with numbers side by side:
Method | QueueSize | Mean | Mean | Error | Error | StdDev | StdDev |
---|---|---|---|---|---|---|---|
Enqueue | 1 | 23.70ns | 22.23ns | 0.0054ns | 0.0110ns | 0.0048ns | 0.0103ns |
EnqueueBackwards | 1 | 23.49ns | 23.06ns | 0.0083ns | 0.0070ns | 0.0069ns | 0.0051ns |
EnqueueRandom | 1 | 24.57ns | 22.45ns | 0.0395ns | 0.0321ns | 0.0350ns | 0.0251ns |
EnqueueDequeue | 1 | 23.82ns | 22.54ns | 0.0041ns | 0.0072ns | 0.0038ns | 0.0056ns |
EnqueueBackwardsDequeue | 1 | 24.15ns | 22.32ns | 0.0423ns | 0.0178ns | 0.0395ns | 0.0167ns |
EnqueueRandomDequeue | 1 | 23.98ns | 23.51ns | 0.0183ns | 0.0280ns | 0.0163ns | 0.0234ns |
EnqueueUpdatePriority | 1 | 31.43ns | 28.01ns | 0.0102ns | 0.0098ns | 0.0096ns | 0.0092ns |
EnqueueBackwardsUpdatePriority | 1 | 31.78ns | 29.57ns | 0.0083ns | 0.0138ns | 0.0073ns | 0.0129ns |
EnqueueRandomUpdatePriority | 1 | 34.45ns | 29.06ns | 0.0161ns | 0.0229ns | 0.0134ns | 0.0214ns |
EnqueueContains | 1 | 29.11ns | 26.54ns | 0.0798ns | 0.0089ns | 0.0667ns | 0.0079ns |
Enqueue | 100 | 699.55ns | 453.81ns | 0.0668ns | 1.1911ns | 0.0592ns | 0.9946ns |
EnqueueBackwards | 100 | 3,375.54ns | 2,131.42ns | 16.4371ns | 0.4016ns | 14.5710ns | 0.3756ns |
EnqueueRandom | 100 | 1,596.11ns | 1,065.43ns | 1.6911ns | 0.9738ns | 1.4122ns | 0.7603ns |
EnqueueDequeue | 100 | 4,412.84ns | 3,385.99ns | 22.2034ns | 7.8772ns | 20.7691ns | 7.3683ns |
EnqueueBackwardsDequeue | 100 | 6,969.57ns | 4,860.17ns | 10.8882ns | 10.4015ns | 9.6521ns | 9.2206ns |
EnqueueRandomDequeue | 100 | 5,277.03ns | 4,001.53ns | 24.9260ns | 39.6598ns | 23.3158ns | 37.0978ns |
EnqueueUpdatePriority | 100 | 4,012.80ns | 2,990.32ns | 9.5671ns | 5.1090ns | 8.9490ns | 4.7789ns |
EnqueueBackwardsUpdatePriority | 100 | 6,710.78ns | 4,653.15ns | 15.7761ns | 2.1216ns | 14.7570ns | 1.7716ns |
EnqueueRandomUpdatePriority | 100 | 2,903.03ns | 1,952.45ns | 8.9722ns | 3.6957ns | 8.3926ns | 3.4569ns |
EnqueueContains | 100 | 847.97ns | 639.73ns | 1.6528ns | 1.9278ns | 1.5460ns | 1.8033ns |
Enqueue | 10000 | 68,439.80ns | 43,798.96ns | 17.5800ns | 90.1664ns | 14.6801ns | 84.3417ns |
EnqueueBackwards | 10000 | 751,837.87ns | 416,606.77ns | 294.1160ns | 89.6748ns | 275.1162ns | 83.8819ns |
EnqueueRandom | 10000 | 261,665.52ns | 191,248.68ns | 109.0163ns | 1,050.4011ns | 96.6401ns | 982.5457ns |
EnqueueDequeue | 10000 | 1,063,948.43ns | 945,363.62ns | 208.7239ns | 610.2522ns | 195.2404ns | 570.8302ns |
EnqueueBackwardsDequeue | 10000 | 1,665,493.35ns | 1,235,957.01ns | 1,482.7118ns | 641.8760ns | 1,314.3852ns | 535.9955ns |
EnqueueRandomDequeue | 10000 | 1,582,293.56ns | 1,361,838.79ns | 213.6763ns | 905.5018ns | 189.4185ns | 802.7037ns |
EnqueueUpdatePriority | 10000 | 930,500.69ns | 708,129.59ns | 318.6172ns | 312.1136ns | 282.4459ns | 260.6290ns |
EnqueueBackwardsUpdatePriority | 10000 | 1,590,628.87ns | 1,070,550.73ns | 504.8412ns | 869.4400ns | 421.5652ns | 770.7358ns |
EnqueueRandomUpdatePriority | 10000 | 567,553.84ns | 434,412.61ns | 81.6522ns | 259.0009ns | 63.7486ns | 216.2774ns |
EnqueueContains | 10000 | 84,466.75ns | 61,815.88ns | 19.4158ns | 26.7781ns | 16.2131ns | 25.0482ns |
That is absolutely amazing! I will take a closer look at this soon (probably this weekend)
I have since learned that my original tests had the debugger running - I hadn't known that choosing Release mode wasn't enough to prevent that
Yes, you need to run all benchmarks outside of VS. I should have mentioned that, whoops.
Added AggressiveInlining to Dequeue, Remove, OnNodeUpdated
I will probably remove these. You may see slight increases in benchmarks, but in the real-world these could drastically increase the program size in memory, hurting performance. In general you should avoid forcing inlining for non-trivial public methods.
I'll make the above changes when I find time, but for now I'm going to merge this so that if anyone else wants to open a PR, they don't make changes that conflict with this. It will be a while before I push a new version, as I want to complete the other tasks under 'issues' first.
Nice work!
These changes remove many repetitions of low level operations in frequently executed parts of the code, for significant performance gains. Measured performance gains include 10% update speed, 30% dequeue speed, and 28% remove speed on the fast queue, and 8.5% update speed, 25% contains speed, 9% dequeue speed, 25% remove speed, and smaller improvement in enqueue speed on the simple queue. Fast queue optimizations are also copied into the stable and generic queues for similar gains. Also, resize is nearly 4 times as fast by using a built-in array copy instead of a manual loop.
I didn't see any performance tests in the project, so I made my own for local testing as follows: Fast queue, set size to 1000000, and create and store exactly that many basic nodes with just a string value.
Simple queue, create and store 1000000
new object()
data itemsI made no attempt to account for the overhead cost of the test loops and random number generation.
Result times in milliseconds: Fast queue:
Simple queue:
The decrease in fast queue enqueue speed is within random variation for the many times I ran these tests while making these changes. Fast queue contains is unaffected, all other operations are significantly faster.
By far the largest single improvement in the fast queue, besides the trivial resize change, was the rewrite of CascadeDown. Reducing the number of operations in Remove (and in the copy of it I put in Dequeue so I could remove another comparison or two) was a strong runner up, and several other changes had small but noticeable effects.
The simple queue benefited from the same changes through its wrapped GenericPriorityQueue, and also was significantly improved by various rewrites that reduce the number of times an object's hash is calculated and looked up in a dictionary, and dropping the calculation of pointless intermediate results in several places.
A further few percent improvement in most fast queue operations could be achieved by changing the priority type from
float
toint
, or adding a copy of the fast queue with that difference - the generic queue won't do for this because the overhead of using a Comparer is too large even if the stability is removed - but I restricted myself to changes not visible to users.