Closed eerhardt closed 3 years ago
I like the idea and the results look good. The current code tries to reuse the other encodings for array items but has to pay the price of boxing/unboxing for primitive types. Looking at the int32 encoding, I guess we will need to create other primitive encoding types to handle array encoding. If we have to have type-specific code, maybe we can keep all of it in one place, e.g. the ArrayEncoding class. Let me look at this area a bit more. I believe the same issue exists in list encoding as well, so an approach that also works for lists would be great.
@eerhardt Hi, the approach looks good to me. We can continue with the other primitive types. The list encoding only matters when the items are of the same type, so we don't have to worry about that for now. Thanks.
I took the benchmark and baked it into https://github.com/danielmarbach/azure-amqp-benchmarks
I also started working on the other encodings and removed some of the reflection in there as a first step. The results are pretty impressive so far:
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
AMD Ryzen 9 3950X, 1 CPU, 32 logical and 16 physical cores
.NET Core SDK=5.0.202
[Host] : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT
ShortRun : .NET Core 5.0.5 (CoreCLR 5.0.521.16609, CoreFX 5.0.521.16609), X64 RyuJIT
Job=ShortRun Runtime=.NET Core 5.0 IterationCount=3
LaunchCount=1 WarmupCount=3
Before
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
---|---|---|---|---|---|---|---|---|
ArrayAmqpSymbolDecode_1M_MAA | 514.10 μs | 133.981 μs | 7.344 μs | 76.1719 | 26.3672 | - | 647112 B | 276 B |
ArrayAmqpSymbolDecode_1K_MAA | 49.16 μs | 1.496 μs | 0.082 μs | 7.7515 | 0.8545 | - | 65192 B | 276 B |
ArrayAmqpSymbolEncode_100K_MAA | 1,499.99 μs | 791.986 μs | 43.411 μs | 97.6563 | - | - | 819296 B | 226 B |
ArrayAmqpSymbolEncode_1K_MAA | 146.41 μs | 18.566 μs | 1.018 μs | 9.7656 | - | - | 82144 B | 226 B |
Bytes_Encode_MAA | 24.22 μs | 1.339 μs | 0.073 μs | - | - | - | - | 326 B |
Bytes_Decode_MAA | 326.62 μs | 495.364 μs | 27.153 μs | 7.8125 | 7.8125 | 7.8125 | 1048600 B | 555 B |
ArrayInt32Encode_MAA_1M | 103,733.01 μs | 4,321.646 μs | 236.884 μs | 6000.0000 | - | - | 50331760 B | 320 B |
ArrayInt32Encode_MAA_1K | 101.44 μs | 21.383 μs | 1.172 μs | 5.8594 | - | - | 49264 B | 147 B |
ArrayInt32Decode_1M_MAA | 51,799.29 μs | 14,176.034 μs | 777.036 μs | 3000.0000 | - | - | 29361028 B | 276 B |
ArrayInt32Decode_1K_MAA | 49.57 μs | 15.782 μs | 0.865 μs | 3.4180 | - | - | 28696 B | 276 B |
ArrayBoolEncode_MAA_1M | 104,410.79 μs | 2,945.729 μs | 161.465 μs | 6000.0000 | - | - | 50331760 B | 320 B |
ArrayBoolEncode_MAA_1K | 104.35 μs | 21.079 μs | 1.155 μs | 5.8594 | - | - | 49264 B | 147 B |
ArrayBoolDecode_1M_MAA | 45,216.44 μs | 21,211.800 μs | 1,162.690 μs | 3000.0000 | - | - | 26214424 B | 276 B |
ArrayBoolDecode_1K_MAA | 42.41 μs | 3.713 μs | 0.204 μs | 3.0518 | - | - | 25624 B | 276 B |
After
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
---|---|---|---|---|---|---|---|---|
ArrayAmqpSymbolDecode_1M_MAA | 418,058.3 ns | 64,368.85 ns | 3,528.27 ns | 47.3633 | 15.6250 | - | 401352 B | 276 B |
ArrayAmqpSymbolDecode_1K_MAA | 41,635.2 ns | 8,611.60 ns | 472.03 ns | 4.8218 | 0.4883 | - | 40616 B | 276 B |
ArrayAmqpSymbolEncode_100K_MAA | 449,605.4 ns | 34,733.50 ns | 1,903.86 ns | - | - | - | 168 B | 233 B |
ArrayAmqpSymbolEncode_1K_MAA | 43,055.6 ns | 1,191.42 ns | 65.31 ns | - | - | - | 136 B | 233 B |
Bytes_Encode_MAA | 24,550.4 ns | 8,202.06 ns | 449.58 ns | - | - | - | - | 326 B |
Bytes_Decode_MAA | 328,451.3 ns | 384,147.85 ns | 21,056.44 ns | 7.8125 | 7.8125 | 7.8125 | 1048606 B | 555 B |
ArrayInt32Encode_MAA_1M | 847,356.3 ns | 44,919.85 ns | 2,462.21 ns | - | - | - | - | 147 B |
ArrayInt32Encode_MAA_1K | 956.3 ns | 189.64 ns | 10.39 ns | - | - | - | - | 147 B |
ArrayInt32Decode_1M_MAA | 2,039,561.7 ns | 1,260,039.76 ns | 69,067.03 ns | 27.3438 | 27.3438 | 27.3438 | 4194328 B | 276 B |
ArrayInt32Decode_1K_MAA | 985.5 ns | 166.02 ns | 9.10 ns | 0.4921 | - | - | 4120 B | 276 B |
ArrayBoolEncode_MAA_1M | 993,341.8 ns | 123,940.13 ns | 6,793.58 ns | - | - | - | - | 147 B |
ArrayBoolEncode_MAA_1K | 1,058.1 ns | 84.72 ns | 4.64 ns | - | - | - | - | 147 B |
ArrayBoolDecode_1M_MAA | 1,235,420.1 ns | 193,650.74 ns | 10,614.65 ns | 7.8125 | 7.8125 | 7.8125 | 1048600 B | 276 B |
ArrayBoolDecode_1K_MAA | 1,051.9 ns | 41.55 ns | 2.28 ns | 0.1240 | - | - | 1048 B | 276 B |
Opened https://github.com/Azure/azure-amqp/pull/185 for now. I'm progressing really slowly since I have very limited time on my hands right now, but if anyone wants to help, I'm happy to hand it over or make someone a collaborator on this PR.
Thanks @danielmarbach for taking over this work! I'm going to close this PR in favor of #185.
I've updated the other PR with the latest numbers.
This is a work in progress to show how array encoding/decoding of primitive types can be improved. The basic idea is to eliminate boxing and unboxing of primitive types, and to eliminate some unnecessary allocations when using `Multiple<T>`.

If this approach looks acceptable, I will complete the other primitive type encodings and mark the PR as "ready for review".
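To make the boxing cost concrete, here is a minimal, self-contained sketch of the two styles. It is not the code in this PR and does not use the library's encoder API: `ArrayEncodingSketch`, `EncodeBoxed`, `EncodeObject`, and `EncodeTyped` are hypothetical names, and the output is just the raw big-endian element bytes, without the AMQP array header (constructor, size, count).

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical illustration only; names and layout are assumptions, not the library's API.
public static class ArrayEncodingSketch
{
    // Object-based path: every element is routed through an object-typed
    // item encoder, so each int is boxed on the way in and unboxed inside.
    public static void EncodeBoxed(int[] values, byte[] buffer)
    {
        int offset = 0;
        foreach (int value in values)
        {
            object boxed = value; // typically one heap allocation per element
            EncodeObject(boxed, buffer, offset);
            offset += sizeof(int);
        }
    }

    private static void EncodeObject(object item, byte[] buffer, int offset)
    {
        int value = (int)item; // unbox
        BinaryPrimitives.WriteInt32BigEndian(buffer.AsSpan(offset, sizeof(int)), value);
    }

    // Typed path: the loop works on int directly, so no boxes are created.
    public static void EncodeTyped(ReadOnlySpan<int> values, Span<byte> buffer)
    {
        for (int i = 0; i < values.Length; i++)
        {
            BinaryPrimitives.WriteInt32BigEndian(
                buffer.Slice(i * sizeof(int), sizeof(int)), values[i]);
        }
    }
}
```

Both methods write the same bytes; the difference is that the typed path needs one such loop per primitive type, which is the trade-off raised in the discussion about keeping the type-specific code in one place.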
Using the benchmark test located at: https://gist.github.com/eerhardt/db06d97c93faaae9b9aec79811c94559, I am seeing the following micro-benchmark results:
Master branch:
PR changes:
cc @xinchen10 - let me know what you think about the approach, and if I should continue this work.