BlueRaja / High-Speed-Priority-Queue-for-C-Sharp

A C# priority queue optimized for pathfinding applications
MIT License
1.15k stars 169 forks source link

Many various optimizations #30

Closed dougmill closed 7 years ago

dougmill commented 7 years ago

These changes remove many repetitions of low level operations in frequently executed parts of the code, for significant performance gains. Measured performance gains include 10% update speed, 30% dequeue speed, and 28% remove speed on the fast queue, and 8.5% update speed, 25% contains speed, 9% dequeue speed, 25% remove speed, and smaller improvement in enqueue speed on the simple queue. Fast queue optimizations are also copied into the stable and generic queues for similar gains. Also, resize is nearly 4 times as fast by using a built-in array copy instead of a manual loop.

I didn't see any performance tests in the project, so I made my own for local testing as follows: Fast queue, set size to 1000000, and create and store exactly that many basic nodes with just a string value.

Simple queue, create and store 1000000 new object() data items

I made no attempt to account for the overhead cost of the test loops and random number generation.

Result times in milliseconds: Fast queue:

Operation Time Before Time After
Enqueue 50 51
Update 96 87
Contains 6 6
Resize 409 106
Dequeue 992 762
Remove 147 115

Simple queue:

Operation Time Before Time After
Enqueue 1080 1063
Update 419 386
Contains 216 172
Dequeue 1575 1442
EnqueueWithoutDuplicates 1143 1064
Remove 597 478

The decrease in fast queue enqueue speed is within random variation for the many times I ran these tests while making these changes. Fast queue contains is unaffected, all other operations are significantly faster.

By far the largest single improvement in the fast queue, besides the trivial resize change, was the rewrite of CascadeDown. Reducing the number of operations in Remove (and in the copy of it I put in Dequeue so I could remove another comparison or two) was a strong runner up, and several other changes had small but noticeable effects.

The simple queue benefited from the same changes through its wrapped GenericPriorityQueue, and also was significantly improved by various rewrites that reduce the number of times an object's hash is calculated and looked up in a dictionary, and dropping the calculation of pointless intermediate results in several places.

A further few percent improvement in most fast queue operations could be achieved by changing the priority type from float to int, or adding a copy of the fast queue with that difference - the generic queue won't do for this because the overhead of using a Comparer is too large even if the stability is removed - but I restricted myself to changes not visible to users.

BlueRaja commented 7 years ago

Awesome, great work!

It will take me a while to review this because I will need to see which changes actually make a difference in the JIT output. For example, I highly doubt nodeIndex / 2 to nodeIndex >> 1 actually changes anything.

How did you get a 25% improvement to Contains? I can't imagine doing anything simpler, and it seems like you haven't changed it at all.

dougmill commented 7 years ago

I was recently surprised in another project to learn that the return value can differ between someInt / 2 and someInt >> 1 - they round negative numbers differently. So, I don't know for certain that it will make a difference, but it seems plausible.

On contains, that improvement is only in the simple queue, I didn't see any possible improvement in the fast queue version.

For simple queue contains, my change drops the following operations: When item is null, picking _nullNodesCache[0] or null and then comparing it to null. In other cases, fetching the list of nodes (rather than just checking one exists), checking its count, picking nodes[0] or null based on the count, and comparing the result to null.

Skipping the count check is safe because any time a list is reduced to empty it is also removed from the dictionary, so there will never be an empty list in there. And apparently I neglected to apply that in GetExistingNode, so the final line of that should be changed to just return nodes[0]; in addition to the changes already here.

BlueRaja commented 7 years ago

Oh I see, I should have looked at the results closer. Yes I didn't spend a lot of time optimizing the SimpleQueue. I'll gladly merge these changes once I've vetted them. Please give me a week or two, I'm extremely busy recently. Thanks!

dougmill commented 7 years ago

While you're at it, please add a version of fast queue with int type priority - the project I'm using this for produces int priority values naturally, and integer comparisons are faster than floating point. I tried it out without committing the change, and measured a low to mid single digit percent improvement, I don't remember exactly how much, in everything but Contains.

BlueRaja commented 7 years ago

I was seeing slowdowns in my pathfinding benchmarks, so I've added tests to benchmark individual Priority Queue methods (pull from master to see them).

Enqueue seems to be ~17% faster all around, but UpdatePriority is ~6% slower on large datasets, and Dequeue seems to be an order of magnitude slower (!?!). There is the possibility I screwed something up...

dougmill commented 7 years ago

I would guess you screwed up in the CascadeDown rewrite somehow. That would not affect enqueue, would be partially balanced by CascadeUp improvement in update and also be relatively small because random nodes are likely near the bottom, and have a very large effect on dequeue with nothing to balance it and always a long path to swap through.

BlueRaja commented 7 years ago

I was using your branch. I meant screwed up in the test.

dougmill commented 7 years ago

I just tried to build your new benchmarks, and got two errors:

1>CSC : error CS0006: Metadata file '..\packages\Microsoft.CodeAnalysis.Analyzers.1.1.0\analyzers\dotnet\cs\Microsoft.CodeAnalysis.Analyzers.dll' could not be found
1>CSC : error CS0006: Metadata file '..\packages\Microsoft.CodeAnalysis.Analyzers.1.1.0\analyzers\dotnet\cs\Microsoft.CodeAnalysis.CSharp.Analyzers.dll' could not be found
BlueRaja commented 7 years ago

Try uninstalling and reinstalling BenchmarkDotNet through NuGet

dougmill commented 7 years ago

Got it running, and duplicated your result. I think it has something to do with the IterationSetup bug you reported. I switched that method to be its own benchmark to give a baseline and then had all the others actually call it themselves, and dequeue became faster with my code.

Interestingly, the "just do the setup" benchmark is slower with my changes than without. I'm not entirely sure why. I thought it might be my change adding a little up front overhead that never gets compensated for because you're enqueueing in strict ascending priority order so it always bails out immediately, but reverting that didn't meaningfully help.

I might try applying one change at a time, as small as I can split them up, and running the benchmarks at each step, but honestly I'm really not that confident in these numbers in the first place. Something very strange is clearly going on because I don't see how my changes could possibly* affect the behavior of that bug, and it seems such a basic thing that they really shouldn't have missed it. So, this benchmarking tool spits out a lot of numbers and bits of analysis, but at this point I don't trust those numbers to be correct.

*On second thought, that bug might be sensitive to timing in some way. If so, it would explain how my changes can affect it, but that such a thing is even possible in this tool really does not help my confidence in its results.

BlueRaja commented 7 years ago

Well that's annoying, I assumed that such a popular benchmarking library would actually work correctly. I'll try rewriting the benchmarks to combine enqueuing + dequeuing tomorrow.

dougmill commented 7 years ago

Hmm, per the comments on your issue report, it's apparently a design issue with "Iteration" meaning something a bit higher level than one method invocation. I'll be doing some experimenting later.

I've tweaked your test cases a bit, because I think always doing things with Queue.First and never an arbitrary node, and always using strictly in order priorities, probably skew the behavior patterns. I'm currently running with this:

namespace Priority_Queue_Benchmarks
{
    public class Benchmarks
    {
        [Params(1, 100, 10000)]
        public int QueueSize;

        private FastPriorityQueue<FastPriorityQueueNode> Queue;
        private FastPriorityQueueNode[] Nodes;
        private int[] RandomPriorities;

        [GlobalSetup]
        public void GlobalSetup()
        {
            Queue = new FastPriorityQueue<FastPriorityQueueNode>(QueueSize * 2);
            Nodes = new FastPriorityQueueNode[QueueSize];
            Random rand = new Random();
            RandomPriorities = new int[QueueSize * 2];
            for (int i = 0; i < QueueSize * 2; i++)
            {
                RandomPriorities[i] = rand.Next(16777216); // constrain to range float can hold with no rounding
            }
        }

        [Benchmark]
        public void IterationSetup()
        {
            Queue.Clear();
            for (int i = 0; i < QueueSize; i++)
            {
                Nodes[i] = new FastPriorityQueueNode();
                Queue.Enqueue(Nodes[i], RandomPriorities[i + QueueSize]);
            }
        }

        [Benchmark]
        public void Dequeue()
        {
            IterationSetup();
            for (int i = 0; i <= QueueSize / 2; i++)
            {
                Queue.Dequeue();
            }
        }

        [Benchmark]
        public void Enqueue()
        {
            IterationSetup();
            for (int i = 0; i <= QueueSize / 2; i++)
            {
                Queue.Enqueue(new FastPriorityQueueNode(), RandomPriorities[i]);
            }
        }

        [Benchmark]
        public bool Contains()
        {
            IterationSetup();
            bool ret = true;
            for (int i = 0; i < QueueSize; i++)
            {
                ret &= Queue.Contains(Nodes[i]);
            }
            return ret;
        }

        [Benchmark]
        public void Clear()
        {
            IterationSetup();
            Queue.Clear();
        }

        [Benchmark]
        public void UpdatePriority()
        {
            IterationSetup();
            for (int i = 0; i < QueueSize; i++)
            {
                Queue.UpdatePriority(Nodes[i], RandomPriorities[i]);
            }
        }
    }
}
BlueRaja commented 7 years ago

It turns out you can workaround the bug I reported using RunStrategy = Monitoring, but doing so causes the measurement-error to be too high, so I've stuck with measuring Enqueue + operations together.

In your code above, you should pass a seed to Random, and move the allocations to [GlobalSetup] so the tests aren't affected by allocator-jitter, but otherwise the solution I came up with is basically the same.

Running the new tests (which I've pushed), I see everything has sped up except for Enqueue, which is now slower. Are you able to reproduce this, or see any issue with the test?

dougmill commented 7 years ago

I'm doing thorough change-by-change benchmarking now, it'll take a while to get through everything.

dougmill commented 7 years ago

Took your benchmarks and added versions that use random priority numbers, plus a test for enumeration speed. You should probably add one for Remove too, but by the time I thought of that I really didn't want to go back to add it to the checks I'd already run.

I figured out what was slowing Enqueue down, it's fixed in Change 4 below. When CascadeUp is called with a node that's already where it should be, my refactor to avoid excess assignments in Swap ended up adding an operation of assigning the node to itself.

Benchmark results before any of my changes:

Method QueueSize Mean Error StdDev
Enqueue 1 23.70ns 0.0054ns 0.0048ns
EnqueueBackwards 1 23.49ns 0.0083ns 0.0069ns
EnqueueRandom 1 24.57ns 0.0395ns 0.0350ns
EnqueueDequeue 1 23.82ns 0.0041ns 0.0038ns
EnqueueBackwardsDequeue 1 24.15ns 0.0423ns 0.0395ns
EnqueueRandomDequeue 1 23.98ns 0.0183ns 0.0163ns
EnqueueUpdatePriority 1 31.43ns 0.0102ns 0.0096ns
EnqueueBackwardsUpdatePriority 1 31.78ns 0.0083ns 0.0073ns
EnqueueRandomUpdatePriority 1 34.45ns 0.0161ns 0.0134ns
EnqueueContains 1 29.11ns 0.0798ns 0.0667ns
Enqueue 100 699.55ns 0.0668ns 0.0592ns
EnqueueBackwards 100 3,375.54ns 16.4371ns 14.5710ns
EnqueueRandom 100 1,596.11ns 1.6911ns 1.4122ns
EnqueueDequeue 100 4,412.84ns 22.2034ns 20.7691ns
EnqueueBackwardsDequeue 100 6,969.57ns 10.8882ns 9.6521ns
EnqueueRandomDequeue 100 5,277.03ns 24.9260ns 23.3158ns
EnqueueUpdatePriority 100 4,012.80ns 9.5671ns 8.9490ns
EnqueueBackwardsUpdatePriority 100 6,710.78ns 15.7761ns 14.7570ns
EnqueueRandomUpdatePriority 100 2,903.03ns 8.9722ns 8.3926ns
EnqueueContains 100 847.97ns 1.6528ns 1.5460ns
Enqueue 10000 68,439.80ns 17.5800ns 14.6801ns
EnqueueBackwards 10000 751,837.87ns 294.1160ns 275.1162ns
EnqueueRandom 10000 261,665.52ns 109.0163ns 96.6401ns
EnqueueDequeue 10000 1,063,948.43ns 208.7239ns 195.2404ns
EnqueueBackwardsDequeue 10000 1,665,493.35ns 1,482.7118ns 1,314.3852ns
EnqueueRandomDequeue 10000 1,582,293.56ns 213.6763ns 189.4185ns
EnqueueUpdatePriority 10000 930,500.69ns 318.6172ns 282.4459ns
EnqueueBackwardsUpdatePriority 10000 1,590,628.87ns 504.8412ns 421.5652ns
EnqueueRandomUpdatePriority 10000 567,553.84ns 81.6522ns 63.7486ns
EnqueueContains 10000 84,466.75ns 19.4158ns 16.2131ns

Change 1: Replace an array index lookup with an existing local variable in Enqueue

Method QueueSize Mean Error StdDev
Enqueue 1 23.06ns 0.0049ns 0.0045ns
EnqueueBackwards 1 23.06ns 0.0058ns 0.0051ns
EnqueueRandom 1 23.46ns 0.0840ns 0.0785ns
EnqueueDequeue 1 23.97ns 0.0552ns 0.0516ns
EnqueueBackwardsDequeue 1 23.30ns 0.0035ns 0.0031ns
EnqueueRandomDequeue 1 23.58ns 0.0054ns 0.0050ns
EnqueueUpdatePriority 1 31.10ns 0.0327ns 0.0306ns
EnqueueBackwardsUpdatePriority 1 31.34ns 0.0477ns 0.0446ns
EnqueueRandomUpdatePriority 1 31.47ns 0.0270ns 0.0239ns
EnqueueContains 1 28.49ns 0.0449ns 0.0420ns
Enqueue 100 628.57ns 0.5571ns 0.5211ns
EnqueueBackwards 100 3,340.79ns 20.0611ns 17.7837ns
EnqueueRandom 100 1,479.08ns 7.2961ns 6.8247ns
EnqueueDequeue 100 4,377.86ns 7.9220ns 7.0226ns
EnqueueBackwardsDequeue 100 7,086.52ns 40.7370ns 38.1054ns
EnqueueRandomDequeue 100 4,844.13ns 19.5057ns 18.2456ns
EnqueueUpdatePriority 100 3,933.72ns 5.0809ns 4.7527ns
EnqueueBackwardsUpdatePriority 100 6,640.83ns 12.3533ns 11.5553ns
EnqueueRandomUpdatePriority 100 2,913.08ns 18.9302ns 17.7073ns
EnqueueContains 100 773.98ns 0.0968ns 0.0858ns
Enqueue 10000 61,133.03ns 6.2108ns 5.8096ns
EnqueueBackwards 10000 737,613.59ns 129.3863ns 101.0163ns
EnqueueRandom 10000 240,571.62ns 72.9698ns 64.6858ns
EnqueueDequeue 10000 1,063,322.46ns 480.8526ns 449.7898ns
EnqueueBackwardsDequeue 10000 1,662,042.66ns 1,086.9245ns 1,016.7098ns
EnqueueRandomDequeue 10000 1,559,694.85ns 386.9005ns 361.9069ns
EnqueueUpdatePriority 10000 933,441.43ns 218.7358ns 204.6056ns
EnqueueBackwardsUpdatePriority 10000 1,594,834.15ns 8,406.7363ns 7,863.6656ns
EnqueueRandomUpdatePriority 10000 547,323.52ns 488.0810ns 456.5513ns
EnqueueContains 10000 77,311.92ns 35.1052ns 27.4078ns

All Enqueue tests improved slightly, as expected. Oddly, backwards got double the benefit and random got triple, and only random showed it in the Enqueue+Dequeue benchmark. I'm guessing random variation for that.


Change 2: Replace division with bit shift in CascadeUp

Method QueueSize Mean Error StdDev
Enqueue 1 23.04ns 0.1229ns 0.1150ns
EnqueueBackwards 1 22.94ns 0.0045ns 0.0040ns
EnqueueRandom 1 23.55ns 0.1156ns 0.1081ns
EnqueueDequeue 1 23.20ns 0.0066ns 0.0058ns
EnqueueBackwardsDequeue 1 24.81ns 0.0041ns 0.0039ns
EnqueueRandomDequeue 1 23.43ns 0.1091ns 0.1021ns
EnqueueUpdatePriority 1 30.91ns 0.0419ns 0.0350ns
EnqueueBackwardsUpdatePriority 1 30.90ns 0.0314ns 0.0294ns
EnqueueRandomUpdatePriority 1 31.49ns 0.0118ns 0.0105ns
EnqueueContains 1 28.50ns 0.0031ns 0.0027ns
Enqueue 100 613.19ns 0.0935ns 0.0829ns
EnqueueBackwards 100 3,305.31ns 1.1133ns 1.0413ns
EnqueueRandom 100 1,427.34ns 6.2933ns 5.8868ns
EnqueueDequeue 100 4,327.81ns 32.8183ns 30.6982ns
EnqueueBackwardsDequeue 100 6,992.03ns 10.8474ns 9.0581ns
EnqueueRandomDequeue 100 4,826.75ns 10.3922ns 9.2124ns
EnqueueUpdatePriority 100 3,900.85ns 5.4591ns 5.1065ns
EnqueueBackwardsUpdatePriority 100 6,609.58ns 1.9522ns 1.4116ns
EnqueueRandomUpdatePriority 100 2,895.60ns 11.6646ns 10.9111ns
EnqueueContains 100 764.71ns 3.5469ns 3.3178ns
Enqueue 10000 60,159.54ns 9.4723ns 7.9098ns
EnqueueBackwards 10000 732,523.89ns 224.7203ns 199.2086ns
EnqueueRandom 10000 235,189.39ns 59.9786ns 53.1695ns
EnqueueDequeue 10000 1,058,321.45ns 198.0816ns 185.2857ns
EnqueueBackwardsDequeue 10000 1,659,102.72ns 1,704.4901ns 1,423.3263ns
EnqueueRandomDequeue 10000 1,555,021.86ns 397.7956ns 372.0982ns
EnqueueUpdatePriority 10000 921,594.29ns 5,164.6670ns 4,831.0323ns
EnqueueBackwardsUpdatePriority 10000 1,562,259.69ns 283.4421ns 251.2640ns
EnqueueRandomUpdatePriority 10000 540,237.70ns 245.8919ns 217.9768ns
EnqueueContains 10000 75,761.55ns 15.1815ns 13.4580ns

Enqueue barely budges, backwards and random improve slightly more. Dequeue shows no significant change, update also improves slightly. Pretty much what I expected.


Change 3: Factor out the Swap call in CascadeUp

Method QueueSize Mean Error StdDev
Enqueue 1 25.15ns 0.0086ns 0.0067ns
EnqueueBackwards 1 25.99ns 0.0110ns 0.0103ns
EnqueueRandom 1 25.53ns 0.0093ns 0.0082ns
EnqueueDequeue 1 25.15ns 0.0901ns 0.0843ns
EnqueueBackwardsDequeue 1 25.34ns 0.0588ns 0.0550ns
EnqueueRandomDequeue 1 25.53ns 0.0082ns 0.0073ns
EnqueueUpdatePriority 1 32.83ns 0.0267ns 0.0236ns
EnqueueBackwardsUpdatePriority 1 32.51ns 0.0140ns 0.0124ns
EnqueueRandomUpdatePriority 1 33.65ns 0.0509ns 0.0451ns
EnqueueContains 1 29.75ns 0.0173ns 0.0145ns
Enqueue 100 842.26ns 0.2187ns 0.2046ns
EnqueueBackwards 100 2,306.17ns 8.5186ns 7.9683ns
EnqueueRandom 100 1,356.36ns 19.5113ns 18.2508ns
EnqueueDequeue 100 4,590.50ns 9.8110ns 9.1772ns
EnqueueBackwardsDequeue 100 5,942.21ns 27.5989ns 25.8160ns
EnqueueRandomDequeue 100 4,721.65ns 7.6805ns 7.1843ns
EnqueueUpdatePriority 100 3,612.41ns 11.1550ns 10.4344ns
EnqueueBackwardsUpdatePriority 100 5,095.31ns 10.1868ns 9.5287ns
EnqueueRandomUpdatePriority 100 2,598.25ns 10.4238ns 9.7504ns
EnqueueContains 100 1,009.89ns 0.2935ns 0.2602ns
Enqueue 10000 82,203.35ns 13.2552ns 11.0687ns
EnqueueBackwards 10000 434,404.58ns 54.5780ns 51.0523ns
EnqueueRandom 10000 225,285.90ns 355.0185ns 296.4565ns
EnqueueDequeue 10000 1,098,426.81ns 1,193.9076ns 932.1244ns
EnqueueBackwardsDequeue 10000 1,345,915.35ns 440.7042ns 344.0728ns
EnqueueRandomDequeue 10000 1,542,973.77ns 341.4763ns 319.4172ns
EnqueueUpdatePriority 10000 790,467.03ns 93.0147ns 87.0060ns
EnqueueBackwardsUpdatePriority 10000 1,107,735.43ns 599.7814ns 561.0359ns
EnqueueRandomUpdatePriority 10000 514,600.82ns 385.4129ns 321.8372ns
EnqueueContains 10000 100,519.61ns 21.3884ns 17.8602ns

EnqueueRandom shows slight improvement, EnqueueBackwards makes a giant leap better, but Enqueue mysteriously gets slower. All Dequeue and Update variants are similarly affected, showing no effect or further improvements on top of the included Enqueue.


Change 4: I figured out what caused the decrease in Enqueue performance - in the no-op case of CascadeUp, which happens on every call of the base Enqueue benchmark, I had added an additional operation of assigning a variable to itself. To fix this, I "unrolled" the first iteration of the loop - if it reaches the break point on the very first node, it returns instead of break.

Method QueueSize Mean Error StdDev
Enqueue 1 25.19ns 0.1013ns 0.0947ns
EnqueueBackwards 1 25.29ns 0.0054ns 0.0045ns
EnqueueRandom 1 25.73ns 0.0853ns 0.0712ns
EnqueueDequeue 1 25.30ns 0.1186ns 0.1110ns
EnqueueBackwardsDequeue 1 26.44ns 0.0124ns 0.0116ns
EnqueueRandomDequeue 1 25.76ns 0.0168ns 0.0158ns
EnqueueUpdatePriority 1 32.67ns 0.0303ns 0.0283ns
EnqueueBackwardsUpdatePriority 1 34.29ns 0.1197ns 0.1120ns
EnqueueRandomUpdatePriority 1 35.03ns 0.0103ns 0.0086ns
EnqueueContains 1 29.69ns 0.0115ns 0.0096ns
Enqueue 100 649.55ns 0.2811ns 0.2630ns
EnqueueBackwards 100 2,359.95ns 1.5029ns 1.4058ns
EnqueueRandom 100 1,282.79ns 0.6518ns 0.5089ns
EnqueueDequeue 100 4,050.00ns 9.3506ns 8.7466ns
EnqueueBackwardsDequeue 100 5,602.77ns 19.3451ns 18.0954ns
EnqueueRandomDequeue 100 4,975.84ns 22.6907ns 21.2249ns
EnqueueUpdatePriority 100 3,346.71ns 1.7097ns 1.5156ns
EnqueueBackwardsUpdatePriority 100 5,058.36ns 7.1863ns 6.7221ns
EnqueueRandomUpdatePriority 100 2,403.64ns 1.3917ns 1.1621ns
EnqueueContains 100 798.10ns 0.2786ns 0.2606ns
Enqueue 10000 63,449.59ns 7.1874ns 6.0018ns
EnqueueBackwards 10000 454,265.13ns 804.9130ns 752.9160ns
EnqueueRandom 10000 218,237.78ns 1,292.7209ns 1,209.2118ns
EnqueueDequeue 10000 1,029,155.22ns 179.4745ns 159.0994ns
EnqueueBackwardsDequeue 10000 1,303,476.08ns 244.0623ns 203.8030ns
EnqueueRandomDequeue 10000 1,554,887.79ns 232.7699ns 217.7330ns
EnqueueUpdatePriority 10000 770,142.02ns 218.9391ns 182.8241ns
EnqueueBackwardsUpdatePriority 10000 1,121,204.62ns 446.0621ns 417.2467ns
EnqueueRandomUpdatePriority 10000 496,390.39ns 2,082.4300ns 1,846.0197ns
EnqueueContains 10000 79,149.01ns 8.5873ns 8.0326ns

As expected, this drops Enqueue back down, though there's still about a 3 microsecond difference I can't explain. My guess is something related to code size, as far as I can tell the sequence of operations for the no-op case is now identical to before.


Change 5: Moved the first bit shift and added an early return, to optimize the case of calling CascadeUp on the head node. I later realized this only ever happens when enqueueing to an empty queue, but I don't think it hurt anything and should have a (tiny) improvement.

Method QueueSize Mean Error StdDev
Enqueue 1 24.14ns 0.0048ns 0.0045ns
EnqueueBackwards 1 23.50ns 0.0055ns 0.0049ns
EnqueueRandom 1 24.63ns 0.0145ns 0.0113ns
EnqueueDequeue 1 23.47ns 0.0092ns 0.0072ns
EnqueueBackwardsDequeue 1 23.77ns 0.0041ns 0.0039ns
EnqueueRandomDequeue 1 23.57ns 0.0036ns 0.0030ns
EnqueueUpdatePriority 1 31.76ns 0.0193ns 0.0171ns
EnqueueBackwardsUpdatePriority 1 32.21ns 0.0198ns 0.0165ns
EnqueueRandomUpdatePriority 1 33.24ns 0.0063ns 0.0049ns
EnqueueContains 1 28.03ns 0.0207ns 0.0173ns
Enqueue 100 651.85ns 0.3444ns 0.3222ns
EnqueueBackwards 100 2,358.57ns 9.8739ns 9.2361ns
EnqueueRandom 100 1,315.91ns 0.6649ns 0.5894ns
EnqueueDequeue 100 4,401.37ns 19.6102ns 17.3839ns
EnqueueBackwardsDequeue 100 5,826.22ns 4.1035ns 3.6377ns
EnqueueRandomDequeue 100 4,829.20ns 66.2628ns 61.9822ns
EnqueueUpdatePriority 100 3,452.89ns 8.9250ns 8.3485ns
EnqueueBackwardsUpdatePriority 100 5,142.04ns 8.7607ns 8.1948ns
EnqueueRandomUpdatePriority 100 2,565.42ns 2.3777ns 2.2241ns
EnqueueContains 100 801.17ns 0.1006ns 0.0941ns
Enqueue 10000 63,719.13ns 10.0423ns 9.3936ns
EnqueueBackwards 10000 440,185.20ns 57.2720ns 47.8247ns
EnqueueRandom 10000 216,733.89ns 201.5636ns 188.5427ns
EnqueueDequeue 10000 1,077,364.05ns 568.6074ns 443.9312ns
EnqueueBackwardsDequeue 10000 1,357,187.92ns 268.4208ns 251.0810ns
EnqueueRandomDequeue 10000 1,537,493.54ns 463.7648ns 433.8058ns
EnqueueUpdatePriority 10000 771,984.83ns 71.5662ns 66.9430ns
EnqueueBackwardsUpdatePriority 10000 1,097,932.13ns 834.3672ns 739.6447ns
EnqueueRandomUpdatePriority 10000 508,312.52ns 99.5266ns 88.2277ns
EnqueueContains 10000 79,903.73ns 24.1708ns 22.6094ns

Some benchmarks are worse, some better, with a weak trend to worse. I tested reverting this specific change later and saw no improvement for the revert, so I'm inclined to chalk this up to randomness.


Change 6: Moved the bit shift in CascadeUp so that it will be skipped on the final iteration, and also made it stop when the swap would be of equal priority nodes.

Method QueueSize Mean Error StdDev
Enqueue 1 23.83ns 0.0038ns 0.0034ns
EnqueueBackwards 1 24.26ns 0.0042ns 0.0038ns
EnqueueRandom 1 24.04ns 0.0073ns 0.0057ns
EnqueueDequeue 1 23.65ns 0.0051ns 0.0047ns
EnqueueBackwardsDequeue 1 23.82ns 0.0125ns 0.0104ns
EnqueueRandomDequeue 1 23.64ns 0.0042ns 0.0038ns
EnqueueUpdatePriority 1 32.02ns 0.0516ns 0.0482ns
EnqueueBackwardsUpdatePriority 1 31.42ns 0.0789ns 0.0738ns
EnqueueRandomUpdatePriority 1 32.03ns 0.0081ns 0.0075ns
EnqueueContains 1 28.32ns 0.0129ns 0.0114ns
Enqueue 100 651.09ns 0.1539ns 0.1285ns
EnqueueBackwards 100 2,474.32ns 1.8081ns 1.6913ns
EnqueueRandom 100 1,326.77ns 1.4647ns 1.3700ns
EnqueueDequeue 100 4,422.01ns 12.5052ns 11.0855ns
EnqueueBackwardsDequeue 100 5,979.95ns 7.1830ns 6.3676ns
EnqueueRandomDequeue 100 4,811.23ns 26.2295ns 24.5351ns
EnqueueUpdatePriority 100 3,460.94ns 17.5167ns 16.3851ns
EnqueueBackwardsUpdatePriority 100 5,294.51ns 8.2645ns 6.9012ns
EnqueueRandomUpdatePriority 100 2,689.72ns 5.5826ns 4.0366ns
EnqueueContains 100 799.59ns 0.0954ns 0.0845ns
Enqueue 10000 64,555.38ns 15.4673ns 12.9159ns
EnqueueBackwards 10000 466,651.35ns 131.1901ns 122.7153ns
EnqueueRandom 10000 217,306.92ns 548.7084ns 513.2621ns
EnqueueDequeue 10000 1,072,899.93ns 101.6111ns 90.0756ns
EnqueueBackwardsDequeue 10000 1,384,956.41ns 168.1195ns 157.2591ns
EnqueueRandomDequeue 10000 1,534,886.29ns 359.0919ns 318.3256ns
EnqueueUpdatePriority 10000 781,804.03ns 93.7744ns 73.2128ns
EnqueueBackwardsUpdatePriority 10000 1,144,182.14ns 432.8527ns 337.9429ns
EnqueueRandomUpdatePriority 10000 518,266.60ns 73.4251ns 65.0895ns
EnqueueContains 10000 79,828.49ns 13.9809ns 11.6747ns

I was expecting a small but noticeable improvement for this one, but it seems I got mostly worse times instead. Again, I tried reverting this later and saw no improvement, so I'm guessing randomness.


Change 7: Check the node index in CascadeUp before the first assignment to a local variable.

Method QueueSize Mean Error StdDev
Enqueue 1 23.52ns 0.0040ns 0.0036ns
EnqueueBackwards 1 23.82ns 0.0114ns 0.0107ns
EnqueueRandom 1 24.43ns 0.0062ns 0.0058ns
EnqueueDequeue 1 23.49ns 0.0057ns 0.0053ns
EnqueueBackwardsDequeue 1 23.87ns 0.0074ns 0.0066ns
EnqueueRandomDequeue 1 23.57ns 0.0051ns 0.0045ns
EnqueueUpdatePriority 1 31.55ns 0.0483ns 0.0377ns
EnqueueBackwardsUpdatePriority 1 31.41ns 0.0066ns 0.0062ns
EnqueueRandomUpdatePriority 1 31.88ns 0.1081ns 0.1012ns
EnqueueContains 1 28.03ns 0.0044ns 0.0041ns
Enqueue 100 650.95ns 0.1262ns 0.1181ns
EnqueueBackwards 100 2,467.41ns 1.1693ns 1.0937ns
EnqueueRandom 100 1,312.95ns 1.9043ns 1.3770ns
EnqueueDequeue 100 4,379.01ns 15.5859ns 14.5790ns
EnqueueBackwardsDequeue 100 5,937.19ns 20.3389ns 19.0250ns
EnqueueRandomDequeue 100 4,853.26ns 21.4105ns 18.9798ns
EnqueueUpdatePriority 100 3,452.98ns 5.5251ns 5.1682ns
EnqueueBackwardsUpdatePriority 100 5,265.21ns 3.1482ns 2.7908ns
EnqueueRandomUpdatePriority 100 2,706.22ns 16.4183ns 15.3577ns
EnqueueContains 100 798.76ns 0.1572ns 0.1394ns
Enqueue 10000 63,815.43ns 12.9826ns 12.1439ns
EnqueueBackwards 10000 467,740.56ns 176.5814ns 137.8631ns
EnqueueRandom 10000 216,327.37ns 212.2575ns 198.5457ns
EnqueueDequeue 10000 1,072,142.36ns 219.3942ns 194.4872ns
EnqueueBackwardsDequeue 10000 1,383,692.69ns 298.5832ns 264.6862ns
EnqueueRandomDequeue 10000 1,540,627.22ns 298.6285ns 279.3373ns
EnqueueUpdatePriority 10000 781,679.48ns 74.5785ns 66.1119ns
EnqueueBackwardsUpdatePriority 10000 1,136,366.00ns 365.4693ns 323.9790ns
EnqueueRandomUpdatePriority 10000 520,205.79ns 236.1968ns 197.2350ns
EnqueueContains 10000 79,563.30ns 13.2946ns 11.7853ns

No meaningful change in anything. I think it should theoretically be better, or at least not worse, so I'm keeping this change.


Change 8: Add the AggressiveInlining attribute to CascadeUp.

Method QueueSize Mean Error StdDev
Enqueue 1 22.24ns 0.0077ns 0.0068ns
EnqueueBackwards 1 22.02ns 0.0045ns 0.0038ns
EnqueueRandom 1 22.31ns 0.0043ns 0.0034ns
EnqueueDequeue 1 22.69ns 0.0033ns 0.0030ns
EnqueueBackwardsDequeue 1 22.16ns 0.0050ns 0.0042ns
EnqueueRandomDequeue 1 23.41ns 0.0981ns 0.0917ns
EnqueueUpdatePriority 1 30.24ns 0.0487ns 0.0432ns
EnqueueBackwardsUpdatePriority 1 30.14ns 0.0109ns 0.0091ns
EnqueueRandomUpdatePriority 1 30.33ns 0.0329ns 0.0275ns
EnqueueContains 1 28.48ns 0.1631ns 0.1526ns
Enqueue 100 453.02ns 0.0961ns 0.0852ns
EnqueueBackwards 100 2,122.68ns 0.4891ns 0.4336ns
EnqueueRandom 100 1,064.51ns 1.5952ns 1.2454ns
EnqueueDequeue 100 4,041.42ns 10.6785ns 9.9887ns
EnqueueBackwardsDequeue 100 5,446.79ns 12.1101ns 11.3278ns
EnqueueRandomDequeue 100 4,599.58ns 9.2748ns 8.6757ns
EnqueueUpdatePriority 100 3,074.87ns 9.6215ns 8.9999ns
EnqueueBackwardsUpdatePriority 100 4,744.44ns 15.7375ns 14.7209ns
EnqueueRandomUpdatePriority 100 2,198.05ns 3.5201ns 2.7483ns
EnqueueContains 100 621.58ns 0.1698ns 0.1505ns
Enqueue 10000 43,682.09ns 4.1872ns 3.7118ns
EnqueueBackwards 10000 416,682.10ns 99.4232ns 93.0005ns
EnqueueRandom 10000 190,063.52ns 80.5496ns 71.4051ns
EnqueueDequeue 10000 1,032,158.26ns 369.6891ns 244.5265ns
EnqueueBackwardsDequeue 10000 1,322,934.02ns 1,218.6786ns 1,017.6517ns
EnqueueRandomDequeue 10000 1,531,388.11ns 1,827.9342ns 1,709.8506ns
EnqueueUpdatePriority 10000 740,885.87ns 217.5127ns 203.4615ns
EnqueueBackwardsUpdatePriority 10000 1,067,612.13ns 139.7863ns 116.7279ns
EnqueueRandomUpdatePriority 10000 466,245.07ns 65.0019ns 60.8028ns
EnqueueContains 10000 61,976.79ns 43.2958ns 40.4989ns

All Enqueue times substantially improved, and other benchmarks added their own improvements on top. Whatever caused inlining here to decrease performance before is evidently gone now.


Change 9: Tried caching the node index in a local variable in CascadeUp.

Method QueueSize Mean Error StdDev
Enqueue 1 22.74ns 0.0202ns 0.0169ns
EnqueueBackwards 1 22.95ns 0.0275ns 0.0258ns
EnqueueRandom 1 23.27ns 0.1024ns 0.0908ns
EnqueueDequeue 1 22.98ns 0.0158ns 0.0147ns
EnqueueBackwardsDequeue 1 22.80ns 0.0021ns 0.0015ns
EnqueueRandomDequeue 1 23.37ns 0.0728ns 0.0681ns
EnqueueUpdatePriority 1 30.01ns 0.0083ns 0.0078ns
EnqueueBackwardsUpdatePriority 1 30.37ns 0.0251ns 0.0235ns
EnqueueRandomUpdatePriority 1 30.88ns 0.0328ns 0.0274ns
EnqueueContains 1 27.29ns 0.0216ns 0.0180ns
Enqueue 100 474.31ns 1.1535ns 1.0790ns
EnqueueBackwards 100 2,084.20ns 1.9779ns 1.8501ns
EnqueueRandom 100 1,051.70ns 1.7402ns 1.6278ns
EnqueueDequeue 100 4,015.11ns 6.2019ns 5.4979ns
EnqueueBackwardsDequeue 100 5,469.15ns 17.9352ns 15.8991ns
EnqueueRandomDequeue 100 4,622.77ns 26.4007ns 23.4035ns
EnqueueUpdatePriority 100 3,057.60ns 3.2832ns 2.5633ns
EnqueueBackwardsUpdatePriority 100 4,710.85ns 9.3391ns 8.2789ns
EnqueueRandomUpdatePriority 100 2,248.42ns 3.9133ns 3.2678ns
EnqueueContains 100 644.04ns 2.0702ns 1.8351ns
Enqueue 10000 45,596.08ns 77.7257ns 72.7046ns
EnqueueBackwards 10000 412,514.14ns 134.1962ns 112.0599ns
EnqueueRandom 10000 189,944.90ns 54.4808ns 45.4939ns
EnqueueDequeue 10000 1,038,879.45ns 113.5238ns 88.6319ns
EnqueueBackwardsDequeue 10000 1,316,721.29ns 1,886.6193ns 1,764.7447ns
EnqueueRandomDequeue 10000 1,537,147.12ns 1,202.0277ns 1,003.7475ns
EnqueueUpdatePriority 10000 735,055.07ns 87.5767ns 77.6345ns
EnqueueBackwardsUpdatePriority 10000 1,066,156.29ns 574.5495ns 537.4339ns
EnqueueRandomUpdatePriority 10000 470,931.10ns 171.0114ns 151.5972ns
EnqueueContains 10000 63,646.34ns 62.7247ns 58.6727ns

No definite effect, and I think it slightly complicates the code. I reverted this change immediately and never tried local variable caching again.


Change 10: Copied in the rewrite of CascadeDown

Method QueueSize Mean Error StdDev
Enqueue 1 22.20ns 0.0071ns 0.0066ns
EnqueueBackwards 1 22.29ns 0.0057ns 0.0053ns
EnqueueRandom 1 22.34ns 0.0083ns 0.0070ns
EnqueueDequeue 1 22.56ns 0.0027ns 0.0025ns
EnqueueBackwardsDequeue 1 22.09ns 0.0048ns 0.0045ns
EnqueueRandomDequeue 1 22.58ns 0.0058ns 0.0054ns
EnqueueUpdatePriority 1 29.73ns 0.0357ns 0.0334ns
EnqueueBackwardsUpdatePriority 1 29.56ns 0.0457ns 0.0427ns
EnqueueRandomUpdatePriority 1 30.26ns 0.0043ns 0.0036ns
EnqueueContains 1 26.93ns 0.0066ns 0.0058ns
Enqueue 100 453.04ns 0.1152ns 0.1077ns
EnqueueBackwards 100 2,138.29ns 2.7901ns 2.0174ns
EnqueueRandom 100 1,067.74ns 1.3937ns 1.2355ns
EnqueueDequeue 100 4,150.14ns 24.2443ns 22.6781ns
EnqueueBackwardsDequeue 100 5,540.79ns 8.4947ns 7.9459ns
EnqueueRandomDequeue 100 4,952.87ns 30.5417ns 25.5037ns
EnqueueUpdatePriority 100 3,112.38ns 2.0788ns 1.9445ns
EnqueueBackwardsUpdatePriority 100 4,847.79ns 4.9923ns 4.4255ns
EnqueueRandomUpdatePriority 100 2,247.65ns 1.7452ns 1.6325ns
EnqueueContains 100 621.47ns 0.1432ns 0.1339ns
Enqueue 10000 43,712.94ns 4.5534ns 3.8023ns
EnqueueBackwards 10000 416,561.57ns 60.7293ns 56.8063ns
EnqueueRandom 10000 190,124.79ns 80.3340ns 75.1445ns
EnqueueDequeue 10000 1,044,064.23ns 232.7742ns 217.7371ns
EnqueueBackwardsDequeue 10000 1,328,398.46ns 556.0566ns 464.3324ns
EnqueueRandomDequeue 10000 1,505,869.37ns 364.7999ns 341.2341ns
EnqueueUpdatePriority 10000 734,316.95ns 88.3724ns 82.6636ns
EnqueueBackwardsUpdatePriority 10000 1,087,657.92ns 310.4105ns 275.1708ns
EnqueueRandomUpdatePriority 10000 470,536.66ns 87.7242ns 77.7652ns
EnqueueContains 10000 61,959.12ns 2.8381ns 2.3699ns

In my original testing this is where the biggest improvements came from. Here, it seems to have no effect. I have since learned that my original tests had the debugger running - I hadn't known that choosing Release mode wasn't enough to prevent that - and presumably that affected it somehow. Oh well, this rewrite doesn't hurt.


Change 11: As unrolling the first iteration in CascadeUp prevented some no-ops, I figured doing the same in CascadeDown should help too.

Method QueueSize Mean Error StdDev
Enqueue 1 22.21ns 0.0127ns 0.0106ns
EnqueueBackwards 1 22.74ns 0.0200ns 0.0187ns
EnqueueRandom 1 22.60ns 0.0092ns 0.0081ns
EnqueueDequeue 1 22.73ns 0.0850ns 0.0795ns
EnqueueBackwardsDequeue 1 22.50ns 0.0048ns 0.0040ns
EnqueueRandomDequeue 1 23.36ns 0.0050ns 0.0042ns
EnqueueUpdatePriority 1 27.88ns 0.0074ns 0.0058ns
EnqueueBackwardsUpdatePriority 1 29.51ns 0.0231ns 0.0216ns
EnqueueRandomUpdatePriority 1 28.34ns 0.0213ns 0.0189ns
EnqueueContains 1 26.72ns 0.0065ns 0.0058ns
Enqueue 100 453.45ns 1.3017ns 1.2176ns
EnqueueBackwards 100 2,134.51ns 8.4962ns 7.5317ns
EnqueueRandom 100 1,052.76ns 1.4396ns 1.2762ns
EnqueueDequeue 100 4,245.47ns 39.5863ns 37.0290ns
EnqueueBackwardsDequeue 100 5,695.61ns 31.1627ns 29.1496ns
EnqueueRandomDequeue 100 4,775.98ns 18.3707ns 16.2852ns
EnqueueUpdatePriority 100 3,154.44ns 7.7462ns 6.8668ns
EnqueueBackwardsUpdatePriority 100 4,896.65ns 6.6384ns 6.2096ns
EnqueueRandomUpdatePriority 100 2,171.64ns 3.2514ns 2.7151ns
EnqueueContains 100 623.18ns 0.3062ns 0.2864ns
Enqueue 10000 43,641.93ns 7.7952ns 6.9102ns
EnqueueBackwards 10000 416,571.06ns 89.7573ns 83.9590ns
EnqueueRandom 10000 190,372.01ns 148.7788ns 139.1678ns
EnqueueDequeue 10000 1,056,882.89ns 206.9015ns 172.7721ns
EnqueueBackwardsDequeue 10000 1,338,313.66ns 3,425.7467ns 3,204.4453ns
EnqueueRandomDequeue 10000 1,400,901.57ns 761.5125ns 712.3192ns
EnqueueUpdatePriority 10000 732,245.29ns 1,280.4280ns 1,197.7130ns
EnqueueBackwardsUpdatePriority 10000 1,099,620.03ns 1,381.6871ns 1,292.4308ns
EnqueueRandomUpdatePriority 10000 465,333.08ns 135.2460ns 119.8921ns
EnqueueContains 10000 61,698.93ns 6.9668ns 6.1759ns

Most benchmarks show no significant change, but EnqueueRandomDequeue specifically - just that one, not the other two dequeue benchmarks - chopped 100 microseconds off its 1500 total. I don't know why this change would affect that specific case only, but I repeated the benchmark run a few times to be sure and it's consistent.


Change 12: Added inline attribute to Dequeue

Method QueueSize Mean Error StdDev
Enqueue 1 22.19ns 0.0511ns 0.0478ns
EnqueueBackwards 1 22.58ns 0.0804ns 0.0752ns
EnqueueRandom 1 22.34ns 0.0017ns 0.0014ns
EnqueueDequeue 1 22.83ns 0.0045ns 0.0042ns
EnqueueBackwardsDequeue 1 22.69ns 0.0039ns 0.0032ns
EnqueueRandomDequeue 1 22.61ns 0.0204ns 0.0181ns
EnqueueUpdatePriority 1 27.56ns 0.0194ns 0.0172ns
EnqueueBackwardsUpdatePriority 1 27.61ns 0.0601ns 0.0562ns
EnqueueRandomUpdatePriority 1 28.84ns 0.0279ns 0.0247ns
EnqueueContains 1 27.02ns 0.1429ns 0.1336ns
Enqueue 100 454.76ns 0.0709ns 0.0592ns
EnqueueBackwards 100 2,122.60ns 0.3536ns 0.3135ns
EnqueueRandom 100 1,056.99ns 0.9723ns 0.9095ns
EnqueueDequeue 100 4,258.42ns 5.1065ns 4.2642ns
EnqueueBackwardsDequeue 100 5,698.16ns 21.6832ns 20.2825ns
EnqueueRandomDequeue 100 4,786.78ns 31.3661ns 29.3398ns
EnqueueUpdatePriority 100 3,197.65ns 4.8257ns 4.0297ns
EnqueueBackwardsUpdatePriority 100 4,940.45ns 6.5178ns 6.0967ns
EnqueueRandomUpdatePriority 100 2,158.86ns 1.3417ns 1.1894ns
EnqueueContains 100 621.63ns 0.1419ns 0.1258ns
Enqueue 10000 44,076.62ns 211.6200ns 197.9495ns
EnqueueBackwards 10000 416,918.17ns 934.0414ns 779.9668ns
EnqueueRandom 10000 189,972.81ns 122.5352ns 114.6195ns
EnqueueDequeue 10000 1,052,608.52ns 306.3106ns 286.5231ns
EnqueueBackwardsDequeue 10000 1,333,331.83ns 268.2575ns 250.9282ns
EnqueueRandomDequeue 10000 1,397,409.25ns 534.1147ns 499.6111ns
EnqueueUpdatePriority 10000 737,059.19ns 699.8656ns 506.0487ns
EnqueueBackwardsUpdatePriority 10000 1,096,362.36ns 1,414.3376ns 1,322.9721ns
EnqueueRandomUpdatePriority 10000 467,614.67ns 298.2784ns 264.4160ns
EnqueueContains 10000 61,756.56ns 4.6587ns 4.3578ns

No meaningful effect on anything. I'm guessing this is because Dequeue was small enough to be inlined anyway. I reverted at this point, but revisited the idea later.


Change 13: Factor out the use of Swap in Remove

Method QueueSize Mean Error StdDev
Enqueue 1 22.41ns 0.0039ns 0.0030ns
EnqueueBackwards 1 22.38ns 0.0075ns 0.0067ns
EnqueueRandom 1 23.33ns 0.0209ns 0.0175ns
EnqueueDequeue 1 22.77ns 0.0026ns 0.0023ns
EnqueueBackwardsDequeue 1 23.01ns 0.0124ns 0.0116ns
EnqueueRandomDequeue 1 22.99ns 0.1034ns 0.0916ns
EnqueueUpdatePriority 1 28.33ns 0.0090ns 0.0080ns
EnqueueBackwardsUpdatePriority 1 27.68ns 0.0383ns 0.0339ns
EnqueueRandomUpdatePriority 1 28.38ns 0.0094ns 0.0088ns
EnqueueContains 1 26.55ns 0.0119ns 0.0112ns
Enqueue 100 452.81ns 0.0986ns 0.0874ns
EnqueueBackwards 100 2,122.19ns 0.2572ns 0.2280ns
EnqueueRandom 100 1,063.52ns 5.2930ns 4.9510ns
EnqueueDequeue 100 3,869.74ns 22.1228ns 20.6937ns
EnqueueBackwardsDequeue 100 5,246.61ns 10.3509ns 9.6823ns
EnqueueRandomDequeue 100 4,529.24ns 23.6071ns 22.0821ns
EnqueueUpdatePriority 100 3,159.93ns 4.5902ns 4.2937ns
EnqueueBackwardsUpdatePriority 100 4,916.10ns 19.3294ns 18.0807ns
EnqueueRandomUpdatePriority 100 2,163.47ns 1.6592ns 1.4709ns
EnqueueContains 100 642.07ns 0.4308ns 0.3363ns
Enqueue 10000 43,712.68ns 4.9400ns 3.5719ns
EnqueueBackwards 10000 416,549.02ns 99.3137ns 88.0390ns
EnqueueRandom 10000 190,394.21ns 61.8104ns 48.2575ns
EnqueueDequeue 10000 972,646.87ns 176.5460ns 165.1412ns
EnqueueBackwardsDequeue 10000 1,272,651.01ns 5,125.3539ns 4,794.2588ns
EnqueueRandomDequeue 10000 1,444,087.75ns 249.3100ns 233.2047ns
EnqueueUpdatePriority 10000 731,074.45ns 967.0588ns 857.2724ns
EnqueueBackwardsUpdatePriority 10000 1,099,364.99ns 4,494.7224ns 4,204.3657ns
EnqueueRandomUpdatePriority 10000 465,234.02ns 593.4637ns 555.1262ns
EnqueueContains 10000 61,614.84ns 19.3570ns 16.1640ns

Two of the Dequeue cases improved a fair bit, the other got slower about the same amount.


Change 14: Added inline attribute to Remove

Method QueueSize Mean Error StdDev
Enqueue 1 22.20ns 0.0064ns 0.0059ns
EnqueueBackwards 1 22.27ns 0.0425ns 0.0397ns
EnqueueRandom 1 22.31ns 0.0167ns 0.0148ns
EnqueueDequeue 1 23.86ns 0.1307ns 0.1222ns
EnqueueBackwardsDequeue 1 23.42ns 0.0123ns 0.0116ns
EnqueueRandomDequeue 1 23.50ns 0.0278ns 0.0260ns
EnqueueUpdatePriority 1 28.35ns 0.0953ns 0.0744ns
EnqueueBackwardsUpdatePriority 1 27.86ns 0.1183ns 0.1107ns
EnqueueRandomUpdatePriority 1 28.31ns 0.0570ns 0.0505ns
EnqueueContains 1 26.43ns 0.0115ns 0.0102ns
Enqueue 100 452.71ns 0.2499ns 0.2087ns
EnqueueBackwards 100 2,133.58ns 7.2715ns 6.8018ns
EnqueueRandom 100 1,076.99ns 12.0644ns 10.6948ns
EnqueueDequeue 100 3,597.61ns 7.2484ns 6.0527ns
EnqueueBackwardsDequeue 100 5,075.53ns 9.8576ns 9.2208ns
EnqueueRandomDequeue 100 4,539.90ns 18.0326ns 16.8677ns
EnqueueUpdatePriority 100 3,165.54ns 5.7062ns 5.3376ns
EnqueueBackwardsUpdatePriority 100 4,892.66ns 3.6112ns 3.2013ns
EnqueueRandomUpdatePriority 100 2,150.24ns 1.4414ns 1.3483ns
EnqueueContains 100 638.57ns 0.7877ns 0.7368ns
Enqueue 10000 43,705.85ns 8.0526ns 6.7242ns
EnqueueBackwards 10000 416,548.27ns 98.8718ns 92.4847ns
EnqueueRandom 10000 190,127.46ns 138.7274ns 122.9783ns
EnqueueDequeue 10000 951,685.61ns 478.2353ns 447.3416ns
EnqueueBackwardsDequeue 10000 1,232,614.45ns 1,005.2223ns 940.2854ns
EnqueueRandomDequeue 10000 1,428,960.32ns 243.0029ns 202.9184ns
EnqueueUpdatePriority 10000 736,639.75ns 840.5913ns 745.1622ns
EnqueueBackwardsUpdatePriority 10000 1,099,487.39ns 773.1950ns 603.6598ns
EnqueueRandomUpdatePriority 10000 465,429.96ns 328.5549ns 307.3304ns
EnqueueContains 10000 61,765.65ns 11.0914ns 9.8323ns

All Dequeue cases improved just a little bit.


Change 15: Copy implementation of Remove into Dequeue and edit for the special case of the removed node always being the head

Method QueueSize Mean Error StdDev
Enqueue 1 22.58ns 0.0145ns 0.0121ns
EnqueueBackwards 1 22.64ns 0.0070ns 0.0058ns
EnqueueRandom 1 22.34ns 0.0037ns 0.0033ns
EnqueueDequeue 1 23.19ns 0.0067ns 0.0063ns
EnqueueBackwardsDequeue 1 23.40ns 0.0302ns 0.0283ns
EnqueueRandomDequeue 1 23.43ns 0.0311ns 0.0291ns
EnqueueUpdatePriority 1 27.64ns 0.0079ns 0.0070ns
EnqueueBackwardsUpdatePriority 1 27.40ns 0.0056ns 0.0053ns
EnqueueRandomUpdatePriority 1 28.92ns 0.0147ns 0.0137ns
EnqueueContains 1 26.80ns 0.0081ns 0.0071ns
Enqueue 100 452.84ns 0.1059ns 0.0884ns
EnqueueBackwards 100 2,122.45ns 0.4275ns 0.3790ns
EnqueueRandom 100 1,067.15ns 0.6758ns 0.5644ns
EnqueueDequeue 100 3,768.56ns 15.8371ns 14.8141ns
EnqueueBackwardsDequeue 100 5,158.14ns 8.1800ns 6.3864ns
EnqueueRandomDequeue 100 4,100.04ns 29.4456ns 27.5434ns
EnqueueUpdatePriority 100 3,164.30ns 7.6815ns 7.1852ns
EnqueueBackwardsUpdatePriority 100 4,901.39ns 1.9473ns 1.7262ns
EnqueueRandomUpdatePriority 100 2,185.92ns 7.0964ns 6.2907ns
EnqueueContains 100 627.28ns 0.1283ns 0.1071ns
Enqueue 10000 43,796.43ns 98.0872ns 91.7508ns
EnqueueBackwards 10000 416,528.04ns 107.6042ns 100.6530ns
EnqueueRandom 10000 190,775.81ns 801.9965ns 750.1879ns
EnqueueDequeue 10000 1,015,905.68ns 258.4723ns 215.8361ns
EnqueueBackwardsDequeue 10000 1,306,963.43ns 2,702.7546ns 2,528.1581ns
EnqueueRandomDequeue 10000 1,404,379.56ns 824.1601ns 770.9198ns
EnqueueUpdatePriority 10000 734,222.67ns 763.8830ns 677.1623ns
EnqueueBackwardsUpdatePriority 10000 1,094,132.17ns 262.4373ns 245.4840ns
EnqueueRandomUpdatePriority 10000 467,447.01ns 2,503.1644ns 2,341.4613ns
EnqueueContains 10000 61,729.44ns 10.0395ns 8.8998ns

Two Dequeue cases got worse, one got better. Next few changes investigate a bit more.


Change 16: Add AggressiveInlining to Dequeue again

Method QueueSize Mean Error StdDev
Enqueue 1 22.20ns 0.0754ns 0.0705ns
EnqueueBackwards 1 22.21ns 0.0891ns 0.0833ns
EnqueueRandom 1 22.72ns 0.0065ns 0.0061ns
EnqueueDequeue 1 22.70ns 0.0079ns 0.0066ns
EnqueueBackwardsDequeue 1 22.41ns 0.0059ns 0.0052ns
EnqueueRandomDequeue 1 23.49ns 0.0358ns 0.0335ns
EnqueueUpdatePriority 1 28.12ns 0.0078ns 0.0073ns
EnqueueBackwardsUpdatePriority 1 27.42ns 0.0078ns 0.0065ns
EnqueueRandomUpdatePriority 1 29.04ns 0.0238ns 0.0211ns
EnqueueContains 1 26.94ns 0.0086ns 0.0071ns
Enqueue 100 455.12ns 0.0539ns 0.0390ns
EnqueueBackwards 100 2,124.73ns 0.3888ns 0.3447ns
EnqueueRandom 100 1,065.69ns 0.5999ns 0.5318ns
EnqueueDequeue 100 3,394.18ns 6.1180ns 5.7228ns
EnqueueBackwardsDequeue 100 4,860.40ns 16.5571ns 15.4875ns
EnqueueRandomDequeue 100 4,015.70ns 46.3825ns 43.3862ns
EnqueueUpdatePriority 100 3,160.21ns 7.9729ns 7.4578ns
EnqueueBackwardsUpdatePriority 100 4,907.65ns 18.7129ns 17.5041ns
EnqueueRandomUpdatePriority 100 2,160.94ns 3.4017ns 3.0155ns
EnqueueContains 100 624.85ns 0.1165ns 0.1089ns
Enqueue 10000 44,093.14ns 228.7990ns 214.0187ns
EnqueueBackwards 10000 416,573.89ns 223.5962ns 186.7130ns
EnqueueRandom 10000 190,456.26ns 278.5402ns 246.9186ns
EnqueueDequeue 10000 943,300.79ns 158.0329ns 140.0920ns
EnqueueBackwardsDequeue 10000 1,233,626.37ns 83.6337ns 74.1391ns
EnqueueRandomDequeue 10000 1,362,721.59ns 622.2356ns 519.5948ns
EnqueueUpdatePriority 10000 733,974.11ns 603.8905ns 564.8795ns
EnqueueBackwardsUpdatePriority 10000 1,101,374.95ns 1,922.2804ns 1,605.1911ns
EnqueueRandomUpdatePriority 10000 465,139.68ns 601.7778ns 502.5117ns
EnqueueContains 10000 61,766.98ns 9.3360ns 7.7960ns

All Dequeue cases improved significantly, all three to a point better than at Change 14.


Change 17: Tried reverting Change 15

Method QueueSize Mean Error StdDev
Enqueue 1 22.55ns 0.0073ns 0.0065ns
EnqueueBackwards 1 22.25ns 0.0053ns 0.0047ns
EnqueueRandom 1 22.34ns 0.0042ns 0.0038ns
EnqueueDequeue 1 23.81ns 0.0598ns 0.0560ns
EnqueueBackwardsDequeue 1 23.43ns 0.0137ns 0.0114ns
EnqueueRandomDequeue 1 23.75ns 0.0982ns 0.0918ns
EnqueueUpdatePriority 1 27.91ns 0.0053ns 0.0045ns
EnqueueBackwardsUpdatePriority 1 27.53ns 0.0096ns 0.0090ns
EnqueueRandomUpdatePriority 1 28.20ns 0.0099ns 0.0093ns
EnqueueContains 1 27.04ns 0.0110ns 0.0103ns
Enqueue 100 453.03ns 0.1215ns 0.0949ns
EnqueueBackwards 100 2,131.89ns 7.4771ns 6.9941ns
EnqueueRandom 100 1,063.06ns 11.4048ns 10.6681ns
EnqueueDequeue 100 3,665.13ns 11.9238ns 10.5701ns
EnqueueBackwardsDequeue 100 5,100.35ns 11.2050ns 10.4812ns
EnqueueRandomDequeue 100 4,456.91ns 36.2898ns 33.9455ns
EnqueueUpdatePriority 100 3,172.86ns 11.5497ns 10.8036ns
EnqueueBackwardsUpdatePriority 100 4,910.67ns 12.0718ns 10.7013ns
EnqueueRandomUpdatePriority 100 2,174.75ns 9.1068ns 8.5185ns
EnqueueContains 100 621.31ns 0.0842ns 0.0703ns
Enqueue 10000 43,682.69ns 15.6044ns 13.8329ns
EnqueueBackwards 10000 416,523.28ns 76.2750ns 55.1518ns
EnqueueRandom 10000 189,990.37ns 141.6339ns 118.2707ns
EnqueueDequeue 10000 951,306.17ns 294.1230ns 260.7323ns
EnqueueBackwardsDequeue 10000 1,232,584.59ns 1,224.8384ns 1,145.7145ns
EnqueueRandomDequeue 10000 1,435,426.26ns 278.4670ns 260.4782ns
EnqueueUpdatePriority 10000 730,930.38ns 309.6355ns 289.6332ns
EnqueueBackwardsUpdatePriority 10000 1,097,982.42ns 561.8409ns 525.5463ns
EnqueueRandomUpdatePriority 10000 465,169.88ns 468.5706ns 438.3012ns
EnqueueContains 10000 61,781.12ns 7.3693ns 6.5327ns

EnqueueRandomDequeue got significantly worse, everything else stayed about the same. Seems inlining all the way is the winner on this one, so I went back to Change 16.


Change 18: Replace division with bit shift on OnNodeUpdated

Method QueueSize Mean Error StdDev
Enqueue 1 25.08ns 0.0619ns 0.0579ns
EnqueueBackwards 1 22.15ns 0.0033ns 0.0030ns
EnqueueRandom 1 22.42ns 0.0427ns 0.0399ns
EnqueueDequeue 1 22.76ns 0.0095ns 0.0084ns
EnqueueBackwardsDequeue 1 22.61ns 0.0202ns 0.0169ns
EnqueueRandomDequeue 1 23.58ns 0.0666ns 0.0623ns
EnqueueUpdatePriority 1 27.10ns 0.0060ns 0.0050ns
EnqueueBackwardsUpdatePriority 1 27.63ns 0.0074ns 0.0062ns
EnqueueRandomUpdatePriority 1 27.41ns 0.0085ns 0.0075ns
EnqueueContains 1 27.27ns 0.0561ns 0.0525ns
Enqueue 100 452.98ns 0.0494ns 0.0438ns
EnqueueBackwards 100 2,122.56ns 0.4487ns 0.3977ns
EnqueueRandom 100 1,069.15ns 0.9815ns 0.7663ns
EnqueueDequeue 100 3,392.39ns 11.4523ns 10.7125ns
EnqueueBackwardsDequeue 100 4,875.53ns 9.2865ns 8.2323ns
EnqueueRandomDequeue 100 3,978.40ns 41.7402ns 39.0438ns
EnqueueUpdatePriority 100 3,119.82ns 16.0411ns 15.0048ns
EnqueueBackwardsUpdatePriority 100 4,878.08ns 15.0893ns 14.1145ns
EnqueueRandomUpdatePriority 100 2,053.08ns 6.6323ns 5.8794ns
EnqueueContains 100 622.32ns 0.1860ns 0.1739ns
Enqueue 10000 43,632.70ns 3.6509ns 3.2364ns
EnqueueBackwards 10000 417,286.99ns 435.7954ns 407.6432ns
EnqueueRandom 10000 190,206.68ns 102.9975ns 91.3046ns
EnqueueDequeue 10000 945,175.22ns 283.3797ns 204.9021ns
EnqueueBackwardsDequeue 10000 1,233,667.01ns 1,110.0706ns 984.0486ns
EnqueueRandomDequeue 10000 1,357,614.41ns 261.9136ns 244.9942ns
EnqueueUpdatePriority 10000 720,677.74ns 245.6859ns 217.7941ns
EnqueueBackwardsUpdatePriority 10000 1,086,414.26ns 297.1541ns 277.9581ns
EnqueueRandomUpdatePriority 10000 450,523.78ns 602.1603ns 563.2611ns
EnqueueContains 10000 61,791.09ns 9.5761ns 8.4889ns

EnqueueRandomDequeue got a noticeable amount better, everything else stayed about the same.


Change 19: Add AggressiveInlining to OnNodeUpdated. Incidentally also added enumeration benchmark to prepare for next change.

Method QueueSize Mean Error StdDev
Enqueue 1 22.64ns 0.0162ns 0.0143ns
EnqueueBackwards 1 22.09ns 0.0032ns 0.0028ns
EnqueueRandom 1 22.29ns 0.0060ns 0.0056ns
EnqueueDequeue 1 22.61ns 0.0084ns 0.0079ns
EnqueueBackwardsDequeue 1 23.02ns 0.0179ns 0.0159ns
EnqueueRandomDequeue 1 22.87ns 0.0053ns 0.0044ns
EnqueueUpdatePriority 1 28.13ns 0.0072ns 0.0064ns
EnqueueBackwardsUpdatePriority 1 29.07ns 0.0392ns 0.0327ns
EnqueueRandomUpdatePriority 1 29.23ns 0.0364ns 0.0341ns
EnqueueContains 1 27.43ns 0.0361ns 0.0320ns
EnqueueEnumerator 1 42.12ns 0.0810ns 0.0718ns
Enqueue 100 452.42ns 0.1000ns 0.0887ns
EnqueueBackwards 100 2,134.12ns 6.2371ns 5.2083ns
EnqueueRandom 100 1,089.93ns 0.4355ns 0.3400ns
EnqueueDequeue 100 3,378.44ns 11.9751ns 11.2015ns
EnqueueBackwardsDequeue 100 4,865.16ns 17.1292ns 16.0226ns
EnqueueRandomDequeue 100 3,996.34ns 20.3571ns 16.9991ns
EnqueueUpdatePriority 100 2,988.75ns 4.5433ns 4.2498ns
EnqueueBackwardsUpdatePriority 100 4,653.77ns 19.5419ns 18.2795ns
EnqueueRandomUpdatePriority 100 1,962.13ns 11.9164ns 11.1466ns
EnqueueContains 100 639.35ns 0.3472ns 0.2711ns
EnqueueEnumerator 100 1,208.26ns 0.3471ns 0.2898ns
Enqueue 10000 43,884.01ns 175.5262ns 164.1873ns
EnqueueBackwards 10000 416,491.99ns 123.6652ns 103.2661ns
EnqueueRandom 10000 190,841.06ns 779.2336ns 690.7702ns
EnqueueDequeue 10000 944,904.45ns 360.7183ns 301.2161ns
EnqueueBackwardsDequeue 10000 1,236,235.25ns 446.4808ns 372.8317ns
EnqueueRandomDequeue 10000 1,360,890.48ns 266.0516ns 222.1651ns
EnqueueUpdatePriority 10000 709,574.40ns 138.3995ns 129.4590ns
EnqueueBackwardsUpdatePriority 10000 1,070,299.20ns 658.2688ns 513.9329ns
EnqueueRandomUpdatePriority 10000 431,503.22ns 68.7147ns 60.9138ns
EnqueueContains 10000 61,852.94ns 13.0878ns 11.6020ns
EnqueueEnumerator 10000 116,744.77ns 103.1914ns 96.5253ns

All Update benchmarks improved slightly.


Change 20: Changed GetEnumerator to use ArraySegment instead of yield.

Method QueueSize Mean Error StdDev
Enqueue 1 22.23ns 0.0110ns 0.0103ns
EnqueueBackwards 1 23.06ns 0.0070ns 0.0051ns
EnqueueRandom 1 22.45ns 0.0321ns 0.0251ns
EnqueueDequeue 1 22.54ns 0.0072ns 0.0056ns
EnqueueBackwardsDequeue 1 22.32ns 0.0178ns 0.0167ns
EnqueueRandomDequeue 1 23.51ns 0.0280ns 0.0234ns
EnqueueUpdatePriority 1 28.01ns 0.0098ns 0.0092ns
EnqueueBackwardsUpdatePriority 1 29.57ns 0.0138ns 0.0129ns
EnqueueRandomUpdatePriority 1 29.06ns 0.0229ns 0.0214ns
EnqueueContains 1 26.54ns 0.0089ns 0.0079ns
EnqueueEnumerator 1 64.02ns 0.4459ns 0.4171ns
Enqueue 100 453.81ns 1.1911ns 0.9946ns
EnqueueBackwards 100 2,131.42ns 0.4016ns 0.3756ns
EnqueueRandom 100 1,065.43ns 0.9738ns 0.7603ns
EnqueueDequeue 100 3,385.99ns 7.8772ns 7.3683ns
EnqueueBackwardsDequeue 100 4,860.17ns 10.4015ns 9.2206ns
EnqueueRandomDequeue 100 4,001.53ns 39.6598ns 37.0978ns
EnqueueUpdatePriority 100 2,990.32ns 5.1090ns 4.7789ns
EnqueueBackwardsUpdatePriority 100 4,653.15ns 2.1216ns 1.7716ns
EnqueueRandomUpdatePriority 100 1,952.45ns 3.6957ns 3.4569ns
EnqueueContains 100 639.73ns 1.9278ns 1.8033ns
EnqueueEnumerator 100 1,059.31ns 0.2522ns 0.2359ns
Enqueue 10000 43,798.96ns 90.1664ns 84.3417ns
EnqueueBackwards 10000 416,606.77ns 89.6748ns 83.8819ns
EnqueueRandom 10000 191,248.68ns 1,050.4011ns 982.5457ns
EnqueueDequeue 10000 945,363.62ns 610.2522ns 570.8302ns
EnqueueBackwardsDequeue 10000 1,235,957.01ns 641.8760ns 535.9955ns
EnqueueRandomDequeue 10000 1,361,838.79ns 905.5018ns 802.7037ns
EnqueueUpdatePriority 10000 708,129.59ns 312.1136ns 260.6290ns
EnqueueBackwardsUpdatePriority 10000 1,070,550.73ns 869.4400ns 770.7358ns
EnqueueRandomUpdatePriority 10000 434,412.61ns 259.0009ns 216.2774ns
EnqueueContains 10000 61,815.88ns 26.7781ns 25.0482ns
EnqueueEnumerator 10000 99,696.08ns 10.0313ns 7.8317ns

Enumeration improved significantly.


Change 21: Experimented with reverting Change 5.

Method QueueSize Mean Error StdDev
Enqueue 1 23.75ns 0.5055ns 1.1095ns
EnqueueBackwards 1 37.97ns 0.4551ns 0.4257ns
EnqueueRandom 1 23.60ns 0.0128ns 0.0120ns
EnqueueDequeue 1 23.13ns 0.0868ns 0.0812ns
EnqueueBackwardsDequeue 1 23.36ns 0.1488ns 0.1319ns
EnqueueRandomDequeue 1 23.54ns 0.0049ns 0.0041ns
EnqueueUpdatePriority 1 28.93ns 0.1532ns 0.1279ns
EnqueueBackwardsUpdatePriority 1 28.85ns 0.0724ns 0.0605ns
EnqueueRandomUpdatePriority 1 29.48ns 0.0309ns 0.0274ns
EnqueueContains 1 38.01ns 0.3712ns 0.3100ns
EnqueueEnumerator 1 64.81ns 0.4121ns 0.3854ns
Enqueue 100 552.78ns 41.4881ns 119.0371ns
EnqueueBackwards 100 2,214.81ns 6.4132ns 5.3553ns
EnqueueRandom 100 1,157.01ns 3.8655ns 3.6158ns
EnqueueDequeue 100 3,714.41ns 59.0365ns 55.2228ns
EnqueueBackwardsDequeue 100 4,847.67ns 29.8156ns 27.8895ns
EnqueueRandomDequeue 100 4,022.34ns 16.0133ns 14.9789ns
EnqueueUpdatePriority 100 3,450.18ns 66.5819ns 132.9712ns
EnqueueBackwardsUpdatePriority 100 4,649.55ns 10.9245ns 9.6843ns
EnqueueRandomUpdatePriority 100 2,062.60ns 3.0710ns 2.8726ns
EnqueueContains 100 1,367.62ns 25.3361ns 23.6994ns
EnqueueEnumerator 100 1,134.59ns 13.2214ns 12.3673ns
Enqueue 10000 45,575.17ns 18.8716ns 16.7292ns
EnqueueBackwards 10000 438,975.82ns 118.8016ns 111.1271ns
EnqueueRandom 10000 204,282.78ns 535.1115ns 500.5436ns
EnqueueDequeue 10000 944,927.39ns 233.6584ns 218.5642ns
EnqueueBackwardsDequeue 10000 1,231,355.00ns 249.8413ns 221.4778ns
EnqueueRandomDequeue 10000 1,378,809.44ns 595.8720ns 497.5801ns
EnqueueUpdatePriority 10000 683,503.96ns 158.7658ns 132.5766ns
EnqueueBackwardsUpdatePriority 10000 1,074,625.59ns 2,134.8339ns 1,892.4745ns
EnqueueRandomUpdatePriority 10000 444,313.13ns 283.2088ns 221.1108ns
EnqueueContains 10000 62,308.63ns 40.6640ns 31.7478ns
EnqueueEnumerator 10000 99,805.43ns 21.3896ns 18.9613ns

Nothing really conclusive, and I think Change 5 should be an improvement, so I put it back.


Final overall chart, with numbers side by side:

Method QueueSize Mean Mean Error Error StdDev StdDev
Enqueue 1 23.70ns 22.23ns 0.0054ns 0.0110ns 0.0048ns 0.0103ns
EnqueueBackwards 1 23.49ns 23.06ns 0.0083ns 0.0070ns 0.0069ns 0.0051ns
EnqueueRandom 1 24.57ns 22.45ns 0.0395ns 0.0321ns 0.0350ns 0.0251ns
EnqueueDequeue 1 23.82ns 22.54ns 0.0041ns 0.0072ns 0.0038ns 0.0056ns
EnqueueBackwardsDequeue 1 24.15ns 22.32ns 0.0423ns 0.0178ns 0.0395ns 0.0167ns
EnqueueRandomDequeue 1 23.98ns 23.51ns 0.0183ns 0.0280ns 0.0163ns 0.0234ns
EnqueueUpdatePriority 1 31.43ns 28.01ns 0.0102ns 0.0098ns 0.0096ns 0.0092ns
EnqueueBackwardsUpdatePriority 1 31.78ns 29.57ns 0.0083ns 0.0138ns 0.0073ns 0.0129ns
EnqueueRandomUpdatePriority 1 34.45ns 29.06ns 0.0161ns 0.0229ns 0.0134ns 0.0214ns
EnqueueContains 1 29.11ns 26.54ns 0.0798ns 0.0089ns 0.0667ns 0.0079ns
Enqueue 100 699.55ns 453.81ns 0.0668ns 1.1911ns 0.0592ns 0.9946ns
EnqueueBackwards 100 3,375.54ns 2,131.42ns 16.4371ns 0.4016ns 14.5710ns 0.3756ns
EnqueueRandom 100 1,596.11ns 1,065.43ns 1.6911ns 0.9738ns 1.4122ns 0.7603ns
EnqueueDequeue 100 4,412.84ns 3,385.99ns 22.2034ns 7.8772ns 20.7691ns 7.3683ns
EnqueueBackwardsDequeue 100 6,969.57ns 4,860.17ns 10.8882ns 10.4015ns 9.6521ns 9.2206ns
EnqueueRandomDequeue 100 5,277.03ns 4,001.53ns 24.9260ns 39.6598ns 23.3158ns 37.0978ns
EnqueueUpdatePriority 100 4,012.80ns 2,990.32ns 9.5671ns 5.1090ns 8.9490ns 4.7789ns
EnqueueBackwardsUpdatePriority 100 6,710.78ns 4,653.15ns 15.7761ns 2.1216ns 14.7570ns 1.7716ns
EnqueueRandomUpdatePriority 100 2,903.03ns 1,952.45ns 8.9722ns 3.6957ns 8.3926ns 3.4569ns
EnqueueContains 100 847.97ns 639.73ns 1.6528ns 1.9278ns 1.5460ns 1.8033ns
Enqueue 10000 68,439.80ns 43,798.96ns 17.5800ns 90.1664ns 14.6801ns 84.3417ns
EnqueueBackwards 10000 751,837.87ns 416,606.77ns 294.1160ns 89.6748ns 275.1162ns 83.8819ns
EnqueueRandom 10000 261,665.52ns 191,248.68ns 109.0163ns 1,050.4011ns 96.6401ns 982.5457ns
EnqueueDequeue 10000 1,063,948.43ns 945,363.62ns 208.7239ns 610.2522ns 195.2404ns 570.8302ns
EnqueueBackwardsDequeue 10000 1,665,493.35ns 1,235,957.01ns 1,482.7118ns 641.8760ns 1,314.3852ns 535.9955ns
EnqueueRandomDequeue 10000 1,582,293.56ns 1,361,838.79ns 213.6763ns 905.5018ns 189.4185ns 802.7037ns
EnqueueUpdatePriority 10000 930,500.69ns 708,129.59ns 318.6172ns 312.1136ns 282.4459ns 260.6290ns
EnqueueBackwardsUpdatePriority 10000 1,590,628.87ns 1,070,550.73ns 504.8412ns 869.4400ns 421.5652ns 770.7358ns
EnqueueRandomUpdatePriority 10000 567,553.84ns 434,412.61ns 81.6522ns 259.0009ns 63.7486ns 216.2774ns
EnqueueContains 10000 84,466.75ns 61,815.88ns 19.4158ns 26.7781ns 16.2131ns 25.0482ns
BlueRaja commented 7 years ago

That is absolutely amazing! I will take a closer look at this soon (probably this weekend)

BlueRaja commented 7 years ago

I have since learned that my original tests had the debugger running - I hadn't known that choosing Release mode wasn't enough to prevent that

Yes, you need to run all benchmarks outside of VS. I should have mentioned that, whoops.

Added AggressiveInlining to Dequeue, Remove, OnNodeUpdated

I will probably remove these. You may see slight increases in benchmarks, but in the real-world these could drastically increase the program size in memory, hurting performance. In general you should avoid forcing inlining for non-trivial public methods.

BlueRaja commented 7 years ago

I'll make the above changes when I find time, but for now I'm going to merge this so that if anyone else wants to open a PR, they don't make changes that conflict with this. It will be a while before I push a new version, as I want to complete the other tasks under 'issues' first.

Nice work!