Open bill-poole opened 3 months ago
I've run a before-and-after comparison below. There's still at least one significant v4 perf improvement on the way, but I expect EfficientDynamoDb will always be significantly faster, even if the gap is reduced.
BenchmarkDotNet=v0.12.1, OS=macOS 14.5 (23F79) [Darwin 23.5.0]
Apple M2 Pro, 1 CPU, 12 logical and 12 physical cores
.NET Core SDK=8.0.401
  [Host]     : .NET Core 8.0.8 (CoreCLR 8.0.824.36612, CoreFX 8.0.824.36612), Arm64 RyuJIT
  DefaultJob : .NET Core 8.0.8 (CoreCLR 8.0.824.36612, CoreFX 8.0.824.36612), Arm64 RyuJIT
Method | EntitiesCount | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|
EfficientDynamoDb | 10 | 26.43 us | 0.515 us | 0.614 us | 2.0752 | 0.0610 | - | 17.22 KB |
aws-sdk-net | 10 | 190.40 us | 3.381 us | 2.998 us | 38.0859 | 9.7656 | - | 314.87 KB |
aws-sdk-net v4 | 10 | 174.77 us | 3.441 us | 5.749 us | 29.2969 | 0.9766 | - | 246.67 KB |
EfficientDynamoDb | 100 | 185.44 us | 2.802 us | 2.484 us | 14.6484 | 0.2441 | - | 119.89 KB |
aws-sdk-net | 100 | 2,090.33 us | 34.854 us | 32.602 us | 359.3750 | 179.6875 | - | 2954.75 KB |
aws-sdk-net v4 | 100 | 2,155.76 us | 42.681 us | 91.875 us | 265.6250 | 125.0000 | - | 2192.58 KB |
EfficientDynamoDb | 1000 | 1,855.11 us | 24.322 us | 21.561 us | 138.6719 | 48.8281 | 1.9531 | 1146.54 KB |
aws-sdk-net | 1000 | 38,908.32 us | 769.038 us | 719.359 us | 3714.2857 | 1214.2857 | 928.5714 | 29340.8 KB |
aws-sdk-net v4 | 1000 | 30,446.25 us | 590.408 us | 679.915 us | 2906.2500 | 1250.0000 | 656.2500 | 21624.82 KB |
Thanks for doing those benchmarks and posting the results! Are you okay if I bring these results to the attention of the AWS team? They are likely going to use the same underlying message building/parsing logic in the client for all their services, which means that it's unlikely that the V4 library will solve their performance problems with services other than DynamoDB. For example, the same performance problems exist when sending/receiving SQS messages.
Assuming we don't expect the .NET ecosystem to create alternative client libraries for all performance-sensitive AWS services, I think the community at large is best served by AWS taking a serious look at significantly increasing the performance of their .NET client libraries. And I suspect we have a better chance of such action being taken prior to their GA release.
@bill-poole sure. @normj and the AWS .NET team would be interested I'm sure.
Looking at the v4 preview announcement here, it's clear that v4 represents a performance boost, not the massive improvements you'd see from re-architecting. So libraries like EfficientDynamoDb will have a place for a long time.
My expectation for the AWS SDK is not that it be a high-performance library, but it shouldn't contain any big perf blunders either. So I'm keen for this change to get merged, which avoids a big string allocation before UTF-8 encoding the results. However, I don't expect pooling of array buffers (as great as that would be).
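To illustrate the kind of allocation being discussed, here is a minimal sketch, not the SDK's code and not the change referenced above. It contrasts building a JSON payload as a string and then encoding it with writing UTF-8 directly through `Utf8JsonWriter`. The helper names `SerializeViaString` and `SerializeDirect` and the DynamoDB-style request shape are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

// Approach A (illustrative): build the payload as a string, then encode it.
// This allocates the intermediate string plus a byte[] of similar size.
// Note: no JSON escaping is done here; the inputs are assumed to be plain ASCII.
static byte[] SerializeViaString(string tableName, string key)
{
    var sb = new StringBuilder();
    sb.Append("{\"TableName\":\"").Append(tableName)
      .Append("\",\"Key\":{\"pk\":{\"S\":\"").Append(key).Append("\"}}}");
    string json = sb.ToString();          // intermediate string allocation
    return Encoding.UTF8.GetBytes(json);  // second allocation for the bytes
}

// Approach B (illustrative): write UTF-8 straight into a caller-supplied buffer,
// so no intermediate string is created for the payload.
static void SerializeDirect(IBufferWriter<byte> output, string tableName, string key)
{
    using var writer = new Utf8JsonWriter(output);
    writer.WriteStartObject();
    writer.WriteString("TableName", tableName);
    writer.WriteStartObject("Key");
    writer.WriteStartObject("pk");
    writer.WriteString("S", key);
    writer.WriteEndObject();
    writer.WriteEndObject();
    writer.WriteEndObject();
    writer.Flush();
}

// Usage: both approaches should produce the same bytes for this simple input.
var buffer = new ArrayBufferWriter<byte>();
SerializeDirect(buffer, "Users", "user1");
byte[] viaString = SerializeViaString("Users", "user1");
bool same = buffer.WrittenSpan.SequenceEqual(viaString);
Console.WriteLine(same);
```

The `ArrayBufferWriter<byte>` here still allocates its own backing array; a pooled `IBufferWriter<byte>` would be needed to remove that allocation as well.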
I'd like to see the AWS SDK also adopt System.Text.Json at some point, rather than its internal LitJSON, but I must admit I don't know how much of a perf boost we'd see.
On the SQS performance topic, I've been playing with the idea of a lightweight client implementation for a while. Now that SQS supports the AWS JSON protocol, it would look very similar to EfficientDynamoDb. Ultimately, though, it's never quite made sense to do, as SQS is often used in asynchronous scenarios where latency isn't critical, so we'd just be looking at minuscule cost savings.
Thanks for tagging me, @slang25. Those are interesting benchmark results.
I'm not sure if the SDKs will ever reach the impressive performance the authors have achieved with EfficientDynamoDb, given we have to focus on supporting every service and all of the weird corner cases the services have. That being said, I view preview 1 of V4 as laying down the foundational changes needed to start making some significant performance gains. The gains we got with preview 1 were largely a consequence of kicking out legacy components and the nullability changes of collections.
I hope that now that we have removed .NET Framework 3.5 and added the System.Memory, System.Buffers and System.Text.Json packages to the .NET Framework target, we have access to all of the same high-performance APIs that EfficientDynamoDb is using. The plan for V4 is to GA as soon as we have all of the required breaking changes in place, but with this new foundation we should be able to continue making performance improvements post GA. For example, we added the System.Text.Json dependency for .NET Framework, but as you said, @slang25, we are still using LitJson. I would like to replace that with System.Text.Json, and we can still do that post GA now that we have made sure it is available to all targets. I don't expect much speed improvement with System.Text.Json, but I do expect an allocation improvement, which would help at scale.
> I'm not sure if the SDKs will ever reach the impressive performance the authors have done for EfficientDynamoDb given we have to focus on supporting every service and all of the weird corner cases the services have.
I assume that the vast majority of the performance difference between the AWS .NET SDK and EfficientDynamoDb is in JSON serialization/deserialization, memory copying and buffer pooling. Furthermore, I assume that the differences between AWS services and the various "corner cases" pertain to the syntax/semantics of each respective service's JSON schema, not anything to do with serialization/deserialization, buffer management, etc.
I recognize that the serialization/deserialization code needs to be specialized per service because it depends on the JSON schema for each service, but I assume that can be auto-generated from the interface definition for each service? Beyond that then, I assume the send/receive logic is reused service to service, such that if that send/receive logic is optimized to eliminate memory copies and use buffer pooling, then the AWS .NET SDK will perform significantly better for all services.
I believe ~10x performance improvement is possible by using auto-generated JSON serialization/deserialization logic and highly optimized send/receive logic that eliminates memory copies and uses buffer pooling.
@normj, are my assumptions correct and if not, can you please help me understand what I've misunderstood?
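For readers unfamiliar with the buffer pooling mentioned in this comment, the following is a minimal sketch of the general technique using `ArrayPool<byte>`. It is not taken from the AWS SDK or from EfficientDynamoDb; the helper name `EncodePooled`, the `send` callback, and the sample payload are hypothetical and exist only to show a rent/encode/return cycle.

```csharp
using System;
using System.Buffers;
using System.Text;

// Encodes a payload into a rented buffer, hands the filled slice to a
// caller-supplied callback, and returns the buffer to the pool afterwards.
// The callback must not keep a reference to the memory after it returns.
static int EncodePooled(string payload, Func<ReadOnlyMemory<byte>, int> send)
{
    // Upper bound on the encoded size, so the rented array is always large enough.
    int maxBytes = Encoding.UTF8.GetMaxByteCount(payload.Length);
    byte[] rented = ArrayPool<byte>.Shared.Rent(maxBytes);
    try
    {
        int written = Encoding.UTF8.GetBytes(payload, 0, payload.Length, rented, 0);
        return send(rented.AsMemory(0, written));
    }
    finally
    {
        // Returning the array lets later requests reuse it instead of allocating.
        ArrayPool<byte>.Shared.Return(rented);
    }
}

// Usage: the callback stands in for a network send and just reports the byte count.
string payload = "{\"QueueUrl\":\"example\"}";
int sent = EncodePooled(payload, bytes => bytes.Length);
Console.WriteLine(sent);
```

The rented array may be larger than requested, which is why only the `written` slice is passed on.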
Preview 1 of V4 of the AWS .NET SDK has been released (see https://aws.amazon.com/blogs/developer/preview-1-of-aws-sdk-for-net-v4/). It would be great if the benchmarks published here could be updated to include the V4 preview.