Azure / azure-amqp

AMQP C# library

Array Encoding/Decoding performance improvements #170

Closed eerhardt closed 3 years ago

eerhardt commented 3 years ago

This is a work in progress to show how array encoding/decoding of primitive types can be improved. The basic idea is to eliminate boxing and unboxing of primitive types, and to eliminate some unnecessary allocations when using Multiple<T>.
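To make the boxing cost concrete, here is a minimal sketch that contrasts an object-based element encoder with a typed one. All names (`BoxingSketch`, `EncodeObject`, `EncodeBoxed`, `EncodeTyped`) are hypothetical and are not taken from the azure-amqp code base. Only the element payload is written; the AMQP array header is left out.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical names throughout; this is an illustration, not the azure-amqp API.
static class BoxingSketch
{
    // Object-based item encoder, standing in for an encoding that is shared
    // with non-array code paths and therefore accepts any value as object.
    static void EncodeObject(object item, Span<byte> destination)
    {
        // The cast unboxes the value again before it is written.
        BinaryPrimitives.WriteInt32BigEndian(destination, (int)item);
    }

    // Boxing path: each int is converted to object at the call site,
    // which typically costs one heap allocation per element.
    public static int EncodeBoxed(int[] values, Span<byte> destination)
    {
        int offset = 0;
        foreach (int value in values)
        {
            EncodeObject(value, destination.Slice(offset)); // implicit boxing
            offset += sizeof(int);
        }
        return offset;
    }

    // Typed path: elements stay as int from the source array to the buffer,
    // so no per-element allocation is needed.
    public static int EncodeTyped(ReadOnlySpan<int> values, Span<byte> destination)
    {
        int offset = 0;
        foreach (int value in values)
        {
            // AMQP uses network byte order (big-endian) for fixed-width integers.
            BinaryPrimitives.WriteInt32BigEndian(destination.Slice(offset), value);
            offset += sizeof(int);
        }
        return offset;
    }
}
```

Both methods produce the same bytes; the difference is only in how many temporary objects are created along the way.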

If this approach looks acceptable, I will complete the other primitive type encodings and mark the PR as "ready for review".

Using the benchmark test located at: https://gist.github.com/eerhardt/db06d97c93faaae9b9aec79811c94559, I am seeing the following micro-benchmark results:

Master branch:

| Method | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| ArrayAmqpSymbolDecode_1M_MAA | 960.52 us | 17.759 us | 15.743 us | 960.93 us | 126.9531 | 52.7344 | - | 647112 B |
| ArrayAmqpSymbolDecode_1K_MAA | 82.93 us | 1.631 us | 3.369 us | 81.41 us | 15.5029 | - | - | 65192 B |
| ArrayAmqpSymbolEncode_100K_MAA | 1,889.03 us | 27.619 us | 23.063 us | 1,891.84 us | 195.3125 | - | - | 819299 B |
| ArrayAmqpSymbolEncode_1K_MAA | 190.24 us | 2.058 us | 1.925 us | 189.96 us | 19.5313 | - | - | 82144 B |
| Bytes_Encode_MAA | 34.47 us | 0.653 us | 0.699 us | 34.30 us | - | - | - | - |
| Bytes_Decode_MAA | 684.08 us | 12.820 us | 12.591 us | 687.00 us | 7.8125 | 7.8125 | 7.8125 | 1048600 B |
| ArrayInt32Encode_MAA_1M | 134,211.00 us | 2,334.898 us | 2,184.065 us | 134,368.90 us | 12000.0000 | - | - | 50332094 B |
| ArrayInt32Encode_MAA_1K | 129.48 us | 1.753 us | 1.639 us | 129.66 us | 11.7188 | - | - | 49264 B |
| ArrayInt32Decode_1M_MAA | 103,514.40 us | 1,373.994 us | 1,285.235 us | 103,510.78 us | 6000.0000 | - | - | 29360299 B |
| ArrayInt32Decode_1K_MAA | 99.03 us | 1.256 us | 1.113 us | 99.29 us | 6.8359 | - | - | 28696 B |

PR changes:

| Method | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| ArrayAmqpSymbolDecode_1M_MAA | 738.915 us | 9.8911 us | 8.2595 us | | 79.1016 | 32.2266 | - | 401359 B |
| ArrayAmqpSymbolDecode_1K_MAA | 67.218 us | 0.7661 us | 0.7166 us | | 9.6436 | 0.2441 | - | 40616 B |
| ArrayAmqpSymbolEncode_100K_MAA | 528.316 us | 7.6527 us | 7.1584 us | | - | - | - | 169 B |
| ArrayAmqpSymbolEncode_1K_MAA | 51.165 us | 0.7409 us | 0.6187 us | | - | - | - | 136 B |
| Bytes_Encode_MAA | 34.367 us | 0.6847 us | 0.7031 us | | - | - | - | - |
| Bytes_Decode_MAA | 723.398 us | 8.8652 us | 8.2926 us | | 7.8125 | 7.8125 | 7.8125 | 1048600 B |
| ArrayInt32Encode_MAA_1M | 2,208.596 us | 26.4736 us | 23.4681 us | | - | - | - | - |
| ArrayInt32Encode_MAA_1K | 2.227 us | 0.0441 us | 0.0977 us | 2.206 us | - | - | - | - |
| ArrayInt32Decode_1M_MAA | 3,425.374 us | 52.6210 us | 41.0830 us | | 31.2500 | 31.2500 | 31.2500 | 4194333 B |
| ArrayInt32Decode_1K_MAA | 1.518 us | 0.0253 us | 0.0224 us | | 0.9842 | - | - | 4120 B |
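As a rough way to read the two tables, the following sketch computes a speedup ratio and an allocated-bytes-per-element figure from the reported numbers. The helper names are hypothetical, and the element count of 1,048,576 for the "1M" benchmarks is an assumption, suggested by the 4194333 B allocation in the PR results being just over 4 × 1,048,576.

```csharp
using System;

// Hypothetical helpers for reading the benchmark tables; not part of any library.
static class BenchmarkArithmetic
{
    // Ratio of the mean time before the change to the mean time after it.
    public static double Speedup(double beforeMicroseconds, double afterMicroseconds)
        => beforeMicroseconds / afterMicroseconds;

    // Average number of allocated bytes per array element.
    public static double BytesPerElement(long allocatedBytes, long elementCount)
        => (double)allocatedBytes / elementCount;
}
```

With the values above, `Speedup(134211.00, 2208.596)` for ArrayInt32Encode_MAA_1M comes out at roughly 60x, and `Speedup(103514.40, 3425.374)` for ArrayInt32Decode_1M_MAA at roughly 30x. `BytesPerElement(29360299, 1048576)` is about 28 on master versus about 4 with the PR. One plausible reading, not confirmed in this thread, is a 24-byte boxed Int32 on a 64-bit runtime plus the 4-byte array slot before the change, and only the array slot after it.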

cc @xinchen10 - let me know what you think about the approach, and if I should continue this work.

xinchen10 commented 3 years ago

I like the idea and the results look good. The current code tries to reuse the other encodings for array items but has to pay the price of boxing/unboxing for primitive types. Looking at the int32 encoding, I guess we will need to create other primitive encoding types to handle array encoding. If we have to have type-specific code, maybe we can keep all of it in one place, e.g. the ArrayEncoding class. Let me look at this area a bit more. The same issue exists in list encoding as well, I believe. So if we have an approach that also works for lists, it would be great.
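One way to picture "type-specific code in one place" is a single generic entry point that dispatches on the element type and then runs a typed loop. This is only a sketch under assumptions: `ArrayEncodingSketch` and its methods are hypothetical names, not the real ArrayEncoding class, and only the element payload is written (no array header or element constructor).

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical sketch; the real ArrayEncoding class in azure-amqp may look different.
static class ArrayEncodingSketch
{
    // Single entry point: picks a typed loop based on the element type.
    // Casting the array itself through object does not box its elements,
    // because arrays are reference types.
    public static int EncodeItems<T>(T[] items, Span<byte> destination)
    {
        if (typeof(T) == typeof(int))
        {
            return EncodeInt32((int[])(object)items, destination);
        }

        if (typeof(T) == typeof(bool))
        {
            return EncodeBoolean((bool[])(object)items, destination);
        }

        throw new NotSupportedException("No typed array encoder for " + typeof(T).Name);
    }

    // Four big-endian bytes per element.
    static int EncodeInt32(int[] items, Span<byte> destination)
    {
        int offset = 0;
        foreach (int item in items)
        {
            BinaryPrimitives.WriteInt32BigEndian(destination.Slice(offset), item);
            offset += sizeof(int);
        }
        return offset;
    }

    // One byte per element: 0x01 for true, 0x00 for false.
    static int EncodeBoolean(bool[] items, Span<byte> destination)
    {
        for (int i = 0; i < items.Length; i++)
        {
            destination[i] = items[i] ? (byte)0x01 : (byte)0x00;
        }
        return items.Length;
    }
}
```

Adding another primitive type then means adding one branch and one typed loop in the same class, which keeps the per-type code from spreading across the individual encodings.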

xinchen10 commented 3 years ago

@eerhardt Hi, the approach looks good to me. We can continue with other primitive types. The list encoding only matters when the items are of the same type so we don't have to worry about that for now. Thanks.

danielmarbach commented 3 years ago

I took the benchmark and baked it into https://github.com/danielmarbach/azure-amqp-benchmarks

I also started working a bit on the other encodings and removed a bit of reflection in there as a start. The results are pretty impressive so far. The baseline numbers come first, followed by the numbers after my changes.


BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 9 3950X, 1 CPU, 32 logical and 16 physical cores
.NET Core SDK=5.0.202
  [Host]   : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT
  ShortRun : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT

Job=ShortRun  Runtime=.NET Core 5.0  IterationCount=3  
LaunchCount=1  WarmupCount=3

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
|---|---|---|---|---|---|---|---|---|
| ArrayAmqpSymbolDecode_1M_MAA | 514.10 μs | 133.981 μs | 7.344 μs | 76.1719 | 26.3672 | - | 647112 B | 276 B |
| ArrayAmqpSymbolDecode_1K_MAA | 49.16 μs | 1.496 μs | 0.082 μs | 7.7515 | 0.8545 | - | 65192 B | 276 B |
| ArrayAmqpSymbolEncode_100K_MAA | 1,499.99 μs | 791.986 μs | 43.411 μs | 97.6563 | - | - | 819296 B | 226 B |
| ArrayAmqpSymbolEncode_1K_MAA | 146.41 μs | 18.566 μs | 1.018 μs | 9.7656 | - | - | 82144 B | 226 B |
| Bytes_Encode_MAA | 24.22 μs | 1.339 μs | 0.073 μs | - | - | - | - | 326 B |
| Bytes_Decode_MAA | 326.62 μs | 495.364 μs | 27.153 μs | 7.8125 | 7.8125 | 7.8125 | 1048600 B | 555 B |
| ArrayInt32Encode_MAA_1M | 103,733.01 μs | 4,321.646 μs | 236.884 μs | 6000.0000 | - | - | 50331760 B | 320 B |
| ArrayInt32Encode_MAA_1K | 101.44 μs | 21.383 μs | 1.172 μs | 5.8594 | - | - | 49264 B | 147 B |
| ArrayInt32Decode_1M_MAA | 51,799.29 μs | 14,176.034 μs | 777.036 μs | 3000.0000 | - | - | 29361028 B | 276 B |
| ArrayInt32Decode_1K_MAA | 49.57 μs | 15.782 μs | 0.865 μs | 3.4180 | - | - | 28696 B | 276 B |
| ArrayBoolEncode_MAA_1M | 104,410.79 μs | 2,945.729 μs | 161.465 μs | 6000.0000 | - | - | 50331760 B | 320 B |
| ArrayBoolEncode_MAA_1K | 104.35 μs | 21.079 μs | 1.155 μs | 5.8594 | - | - | 49264 B | 147 B |
| ArrayBoolDecode_1M_MAA | 45,216.44 μs | 21,211.800 μs | 1,162.690 μs | 3000.0000 | - | - | 26214424 B | 276 B |
| ArrayBoolDecode_1K_MAA | 42.41 μs | 3.713 μs | 0.204 μs | 3.0518 | - | - | 25624 B | 276 B |

After

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
|---|---|---|---|---|---|---|---|---|
| ArrayAmqpSymbolDecode_1M_MAA | 418,058.3 ns | 64,368.85 ns | 3,528.27 ns | 47.3633 | 15.6250 | - | 401352 B | 276 B |
| ArrayAmqpSymbolDecode_1K_MAA | 41,635.2 ns | 8,611.60 ns | 472.03 ns | 4.8218 | 0.4883 | - | 40616 B | 276 B |
| ArrayAmqpSymbolEncode_100K_MAA | 449,605.4 ns | 34,733.50 ns | 1,903.86 ns | - | - | - | 168 B | 233 B |
| ArrayAmqpSymbolEncode_1K_MAA | 43,055.6 ns | 1,191.42 ns | 65.31 ns | - | - | - | 136 B | 233 B |
| Bytes_Encode_MAA | 24,550.4 ns | 8,202.06 ns | 449.58 ns | - | - | - | - | 326 B |
| Bytes_Decode_MAA | 328,451.3 ns | 384,147.85 ns | 21,056.44 ns | 7.8125 | 7.8125 | 7.8125 | 1048606 B | 555 B |
| ArrayInt32Encode_MAA_1M | 847,356.3 ns | 44,919.85 ns | 2,462.21 ns | - | - | - | - | 147 B |
| ArrayInt32Encode_MAA_1K | 956.3 ns | 189.64 ns | 10.39 ns | - | - | - | - | 147 B |
| ArrayInt32Decode_1M_MAA | 2,039,561.7 ns | 1,260,039.76 ns | 69,067.03 ns | 27.3438 | 27.3438 | 27.3438 | 4194328 B | 276 B |
| ArrayInt32Decode_1K_MAA | 985.5 ns | 166.02 ns | 9.10 ns | 0.4921 | - | - | 4120 B | 276 B |
| ArrayBoolEncode_MAA_1M | 993,341.8 ns | 123,940.13 ns | 6,793.58 ns | - | - | - | - | 147 B |
| ArrayBoolEncode_MAA_1K | 1,058.1 ns | 84.72 ns | 4.64 ns | - | - | - | - | 147 B |
| ArrayBoolDecode_1M_MAA | 1,235,420.1 ns | 193,650.74 ns | 10,614.65 ns | 7.8125 | 7.8125 | 7.8125 | 1048600 B | 276 B |
| ArrayBoolDecode_1K_MAA | 1,051.9 ns | 41.55 ns | 2.28 ns | 0.1240 | - | - | 1048 B | 276 B |

https://github.com/danielmarbach/azure-amqp/tree/encoding

danielmarbach commented 3 years ago

Opened https://github.com/Azure/azure-amqp/pull/185 for now. I'm progressing really slowly since I have very limited time on my hands right now, but if anyone wants to help, I'm happy to hand it over or make someone a collaborator on this PR.

eerhardt commented 3 years ago

Thanks @danielmarbach for taking over this work! I'm going to close this PR in favor of #185.

danielmarbach commented 3 years ago

I've updated the other PR with the latest numbers