Array Encoding/Decoding performance improvements

eerhardt commented 3 years ago

This is a work in progress to show how array encoding/decoding of primitive types can be improved. The basic idea is to eliminate boxing and unboxing of primitive types, and to eliminate some unnecessary allocations when using Multiple<T>.
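To illustrate the cost being removed, here is a minimal sketch; it is not the code in this PR. azure-amqp is written in C#, and the sketch uses Java only because boxing behaves similarly there. The class and method names (`BoxingSketch`, `encodeBoxed`, `encodeTyped`) are hypothetical. The boxed path receives elements as objects and must unbox each one, while the typed path writes the primitive array straight into the buffer.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BoxingSketch {
    // Generic path (hypothetical): elements arrive as Object, so every int
    // was boxed by the caller and is unboxed here before it can be written.
    static void encodeBoxed(Object[] items, ByteBuffer buffer) {
        for (Object item : items) {
            buffer.putInt((Integer) item); // cast plus unboxing per element
        }
    }

    // Type-specific path (hypothetical): the primitive array is written
    // directly, with no per-element object involved.
    static void encodeTyped(int[] items, ByteBuffer buffer) {
        for (int item : items) {
            buffer.putInt(item);
        }
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 1000};

        // Build the boxed view the generic path needs.
        Object[] boxed = new Object[values.length];
        for (int i = 0; i < values.length; i++) {
            boxed[i] = values[i]; // autoboxing: int -> Integer
        }

        ByteBuffer viaBoxed = ByteBuffer.allocate(values.length * Integer.BYTES);
        ByteBuffer viaTyped = ByteBuffer.allocate(values.length * Integer.BYTES);
        encodeBoxed(boxed, viaBoxed);
        encodeTyped(values, viaTyped);

        // Both paths write the same bytes; they differ only in the work
        // and allocations needed to get there.
        System.out.println(Arrays.equals(viaBoxed.array(), viaTyped.array()));
    }
}
```

The output bytes are identical either way, which is why a change like this can be judged on time and allocation alone.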

If this approach looks acceptable, I will complete the other primitive type encodings and mark the PR as "ready for review".

Using the benchmark test located at: https://gist.github.com/eerhardt/db06d97c93faaae9b9aec79811c94559, I am seeing the following micro-benchmark results:

Master branch:

Method	Mean	Error	StdDev	Median	Gen 0	Gen 1	Gen 2	Allocated
ArrayAmqpSymbolDecode_1M_MAA	960.52 us	17.759 us	15.743 us	960.93 us	126.9531	52.7344	-	647112 B
ArrayAmqpSymbolDecode_1K_MAA	82.93 us	1.631 us	3.369 us	81.41 us	15.5029	-	-	65192 B
ArrayAmqpSymbolEncode_100K_MAA	1,889.03 us	27.619 us	23.063 us	1,891.84 us	195.3125	-	-	819299 B
ArrayAmqpSymbolEncode_1K_MAA	190.24 us	2.058 us	1.925 us	189.96 us	19.5313	-	-	82144 B
Bytes_Encode_MAA	34.47 us	0.653 us	0.699 us	34.30 us	-	-	-	-
Bytes_Decode_MAA	684.08 us	12.820 us	12.591 us	687.00 us	7.8125	7.8125	7.8125	1048600 B
ArrayInt32Encode_MAA_1M	134,211.00 us	2,334.898 us	2,184.065 us	134,368.90 us	12000.0000	-	-	50332094 B
ArrayInt32Encode_MAA_1K	129.48 us	1.753 us	1.639 us	129.66 us	11.7188	-	-	49264 B
ArrayInt32Decode_1M_MAA	103,514.40 us	1,373.994 us	1,285.235 us	103,510.78 us	6000.0000	-	-	29360299 B
ArrayInt32Decode_1K_MAA	99.03 us	1.256 us	1.113 us	99.29 us	6.8359	-	-	28696 B

PR changes:

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
ArrayAmqpSymbolDecode_1M_MAA	738.915 us	9.8911 us	8.2595 us	79.1016	32.2266	-	401359 B
ArrayAmqpSymbolDecode_1K_MAA	67.218 us	0.7661 us	0.7166 us	9.6436	0.2441	-	40616 B
ArrayAmqpSymbolEncode_100K_MAA	528.316 us	7.6527 us	7.1584 us	-	-	-	169 B
ArrayAmqpSymbolEncode_1K_MAA	51.165 us	0.7409 us	0.6187 us	-	-	-	136 B
Bytes_Encode_MAA	34.367 us	0.6847 us	0.7031 us	-	-	-	-
Bytes_Decode_MAA	723.398 us	8.8652 us	8.2926 us	7.8125	7.8125	7.8125	1048600 B
ArrayInt32Encode_MAA_1M	2,208.596 us	26.4736 us	23.4681 us	-	-	-	-
ArrayInt32Encode_MAA_1K	2.227 us	0.0441 us	0.0977 us	2.206 us	-	-	-	-
ArrayInt32Decode_1M_MAA	3,425.374 us	52.6210 us	41.0830 us	31.2500	31.2500	31.2500	4194333 B
ArrayInt32Decode_1K_MAA	1.518 us	0.0253 us	0.0224 us	0.9842	-	-	4120 B

cc @xinchen10 - let me know what you think about the approach, and if I should continue this work.

xinchen10 commented 3 years ago

I like the idea and the results look good. The current code tries to reuse the other encodings for array items but has to pay the price of boxing/unboxing for primitive types. Looking at the int32 encoding, I guess we will need to create other primitive encoding types to handle array encoding. If we have to have type-specific code, maybe we can keep all of it in one place, e.g. the ArrayEncoding class. Let me look at this area a bit more. The same issue exists in list encoding as well, I believe, so it would be great if we had an approach that also works for lists.
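One way to picture "type-specific code in one place" is a single entry point that checks the array's runtime type once and then runs a typed loop. The sketch below is an assumption for illustration, not the library's ArrayEncoding class: it is written in Java rather than C#, the name `ArrayEncodingSketch` is hypothetical, it omits the AMQP array header entirely, and the one-byte boolean layout is just a choice made for this example.

```java
import java.nio.ByteBuffer;

public class ArrayEncodingSketch {
    // Hypothetical single entry point: one type check per array, then a
    // typed loop, so no element is boxed along the way.
    static void encode(Object array, ByteBuffer buffer) {
        if (array instanceof int[]) {
            for (int v : (int[]) array) {
                buffer.putInt(v);
            }
        } else if (array instanceof long[]) {
            for (long v : (long[]) array) {
                buffer.putLong(v);
            }
        } else if (array instanceof boolean[]) {
            for (boolean v : (boolean[]) array) {
                buffer.put((byte) (v ? 1 : 0)); // one byte per element in this sketch
            }
        } else {
            // A fuller version would fall back to a per-item encoder here.
            throw new UnsupportedOperationException(
                "no typed encoder for " + array.getClass().getName());
        }
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(64);
        encode(new int[] {1, 2}, buffer);
        encode(new boolean[] {true, false}, buffer);
        System.out.println("bytes written: " + buffer.position());
    }
}
```

Keeping the dispatch in one method means adding another primitive type is one more branch, at the price of a class that knows about every supported element type.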

xinchen10 commented 3 years ago

@eerhardt Hi, the approach looks good to me. We can continue with the other primitive types. The list encoding only matters when the items are of the same type, so we don't have to worry about that for now. Thanks.

danielmarbach commented 3 years ago

I took the benchmark and baked it into https://github.com/danielmarbach/azure-amqp-benchmarks

I also started working a bit on the other encodings and removed a bit of reflection in there as a start. The results are pretty impressive so far.

Before

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 9 3950X, 1 CPU, 32 logical and 16 physical cores
.NET Core SDK=5.0.202
  [Host]   : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT
  ShortRun : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

Job=ShortRun  Runtime=.NET Core 5.0  IterationCount=3  
LaunchCount=1  WarmupCount=3

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated	Code Size
ArrayAmqpSymbolDecode_1M_MAA	514.10 μs	133.981 μs	7.344 μs	76.1719	26.3672	-	647112 B	276 B
ArrayAmqpSymbolDecode_1K_MAA	49.16 μs	1.496 μs	0.082 μs	7.7515	0.8545	-	65192 B	276 B
ArrayAmqpSymbolEncode_100K_MAA	1,499.99 μs	791.986 μs	43.411 μs	97.6563	-	-	819296 B	226 B
ArrayAmqpSymbolEncode_1K_MAA	146.41 μs	18.566 μs	1.018 μs	9.7656	-	-	82144 B	226 B
Bytes_Encode_MAA	24.22 μs	1.339 μs	0.073 μs	-	-	-	-	326 B
Bytes_Decode_MAA	326.62 μs	495.364 μs	27.153 μs	7.8125	7.8125	7.8125	1048600 B	555 B
ArrayInt32Encode_MAA_1M	103,733.01 μs	4,321.646 μs	236.884 μs	6000.0000	-	-	50331760 B	320 B
ArrayInt32Encode_MAA_1K	101.44 μs	21.383 μs	1.172 μs	5.8594	-	-	49264 B	147 B
ArrayInt32Decode_1M_MAA	51,799.29 μs	14,176.034 μs	777.036 μs	3000.0000	-	-	29361028 B	276 B
ArrayInt32Decode_1K_MAA	49.57 μs	15.782 μs	0.865 μs	3.4180	-	-	28696 B	276 B
ArrayBoolEncode_MAA_1M	104,410.79 μs	2,945.729 μs	161.465 μs	6000.0000	-	-	50331760 B	320 B
ArrayBoolEncode_MAA_1K	104.35 μs	21.079 μs	1.155 μs	5.8594	-	-	49264 B	147 B
ArrayBoolDecode_1M_MAA	45,216.44 μs	21,211.800 μs	1,162.690 μs	3000.0000	-	-	26214424 B	276 B
ArrayBoolDecode_1K_MAA	42.41 μs	3.713 μs	0.204 μs	3.0518	-	-	25624 B	276 B

After

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated	Code Size
ArrayAmqpSymbolDecode_1M_MAA	418,058.3 ns	64,368.85 ns	3,528.27 ns	47.3633	15.6250	-	401352 B	276 B
ArrayAmqpSymbolDecode_1K_MAA	41,635.2 ns	8,611.60 ns	472.03 ns	4.8218	0.4883	-	40616 B	276 B
ArrayAmqpSymbolEncode_100K_MAA	449,605.4 ns	34,733.50 ns	1,903.86 ns	-	-	-	168 B	233 B
ArrayAmqpSymbolEncode_1K_MAA	43,055.6 ns	1,191.42 ns	65.31 ns	-	-	-	136 B	233 B
Bytes_Encode_MAA	24,550.4 ns	8,202.06 ns	449.58 ns	-	-	-	-	326 B
Bytes_Decode_MAA	328,451.3 ns	384,147.85 ns	21,056.44 ns	7.8125	7.8125	7.8125	1048606 B	555 B
ArrayInt32Encode_MAA_1M	847,356.3 ns	44,919.85 ns	2,462.21 ns	-	-	-	-	147 B
ArrayInt32Encode_MAA_1K	956.3 ns	189.64 ns	10.39 ns	-	-	-	-	147 B
ArrayInt32Decode_1M_MAA	2,039,561.7 ns	1,260,039.76 ns	69,067.03 ns	27.3438	27.3438	27.3438	4194328 B	276 B
ArrayInt32Decode_1K_MAA	985.5 ns	166.02 ns	9.10 ns	0.4921	-	-	4120 B	276 B
ArrayBoolEncode_MAA_1M	993,341.8 ns	123,940.13 ns	6,793.58 ns	-	-	-	-	147 B
ArrayBoolEncode_MAA_1K	1,058.1 ns	84.72 ns	4.64 ns	-	-	-	-	147 B
ArrayBoolDecode_1M_MAA	1,235,420.1 ns	193,650.74 ns	10,614.65 ns	7.8125	7.8125	7.8125	1048600 B	276 B
ArrayBoolDecode_1K_MAA	1,051.9 ns	41.55 ns	2.28 ns	0.1240	-	-	1048 B	276 B

https://github.com/danielmarbach/azure-amqp/tree/encoding

danielmarbach commented 3 years ago

Opened https://github.com/Azure/azure-amqp/pull/185 for now. I'm progressing really slowly since I only have very limited time on my hands right now, but if anyone wants to help, I'm happy to hand it over or make someone a collaborator on this PR.

eerhardt commented 3 years ago

Thanks @danielmarbach for taking over this work! I'm going to close this PR in favor of #185.

danielmarbach commented 3 years ago

I've updated the other PR with the latest numbers.

Azure / azure-amqp

Array Encoding/Decoding performance improvements #170
