dapr / js-sdk

Dapr SDK for Javascript
Apache License 2.0

State Management store performance #556

Closed SumeetR closed 7 months ago

SumeetR commented 9 months ago

Expected Behavior

The performance of the DAPR js-sdk should be comparable to that of the dotnet-sdk. More specifically, the performance of the State client in the two SDKs should be similar.

Actual Behavior

The js-sdk consumes significantly more CPU than the dotnet-sdk when writing to the state store.

In our project, we have a Typescript service and a .NET service, each using DAPR with their own sidecar, communicating through the DAPR pubsub and using the DAPR state to store information. All of this is deployed in docker containers.

When running tests, we noticed that the Typescript service was consuming around 50% of the available CPU. While isolating the issue, we found that disabling the writes to the DAPR state dropped that CPU usage to 20%. The backing storage used for the state is Redis.

Steps to Reproduce the Problem

To demonstrate this issue, we've created two repositories, one in Typescript and the other in .NET, each implementing a simple API. Each API exposes an HTTP endpoint which stores the payload of the request in Redis using the DAPR state.

Typescript: https://github.com/dkuida/dapr-ts-performance

.NET: https://github.com/dkuida/dapr-dotnet-performance

A docker image was created for each repo using GitHub Actions and then tested by calling the HTTP POST endpoint of each API with a concurrency of 10 calls. The deployments were hosted on an AWS EC2 m5a.large machine:

| Instance Size | vCPU | Memory (GiB) | Instance Storage (GB) | Network Bandwidth (Gbps) | EBS Bandwidth (Mbps) |
| --- | --- | --- | --- | --- | --- |
| m5a.large | 2 | 8 | EBS-Only | Up to 10 | Up to 2,880 |
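The load pattern described above (POST requests with at most 10 in flight at a time) could be sketched roughly as follows. This is only an illustration: `runWithConcurrency` and the `send` callback are hypothetical names that do not come from the linked repositories, and the actual test tooling used is not stated in this issue.

```typescript
// Hypothetical helper: issue `total` requests while keeping at most
// `concurrency` of them in flight at any time.
async function runWithConcurrency(
  total: number,
  concurrency: number,
  send: (index: number) => Promise<void>,
): Promise<number> {
  let next = 0; // index of the next request to issue
  let completed = 0; // number of requests that finished without throwing

  // Each worker keeps pulling the next index until none are left,
  // so the number of workers is the upper bound on in-flight requests.
  async function worker(): Promise<void> {
    while (next < total) {
      const index = next++;
      await send(index);
      completed++;
    }
  }

  const workerCount = Math.min(concurrency, total);
  const workers = Array.from({ length: workerCount }, () => worker());
  await Promise.all(workers);
  return completed;
}
```

In a real run, `send` would perform the HTTP POST against the endpoint of the API under test, and `concurrency` would be 10 to match the setup described here.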

Performance

Typescript

[Image: Dapr_typescript_cropped]

.NET

[Image: Dapr_dotnet_cropped]

As the images above show, the Typescript version was consuming considerably more CPU throughout the test.

Observations

There was a noticeable difference in performance when pulling these docker images from a repository, compared to creating the docker image in-situ from the docker-compose file. Building in-situ in the EC2 machine produced similar results from both demo repositories, compared to pulling the image from our private repository.

shubham1172 commented 9 months ago

Hi @SumeetR, behind the scenes both SDKs make API calls to the runtime. Generally speaking, the JS-SDK runs on Node today, which can be less performant than dotnet in this scenario (see, e.g., https://raygun.com/blog/dotnet-vs-nodejs). If we tested another SDK, say the Go-SDK, we might see yet another set of performance metrics. In other words, we cannot strictly compare the performance of the SDKs with each other because of the differences in the underlying frameworks/stacks.

If you are looking to make concurrent calls to the state store, you could explore making Bulk API calls. This will significantly reduce the number of API calls and, in turn, the overall memory consumed. See the docs here: https://docs.dapr.io/developing-applications/sdks/js/js-client/#save-get-and-delete-application-state
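As a rough sketch of that idea, several key/value pairs can be grouped into one save call per batch instead of one call per item. The `StateApi` interface below is an assumption about the minimal shape needed here; it is intended to be compatible with `client.state.save(storeName, items)` from the JS-SDK, but check the linked docs for the exact types. `chunk`, `saveInBatches` and the default batch size of 100 are hypothetical and chosen only for illustration.

```typescript
// Assumed minimal shape of the state API used in this sketch.
interface KeyValue {
  key: string;
  value: unknown;
}

interface StateApi {
  save(storeName: string, items: KeyValue[]): Promise<unknown>;
}

// Split `items` into consecutive groups of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be >= 1");
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Save all items using one API call per batch rather than one per item.
// Returns the number of save calls that were made.
async function saveInBatches(
  state: StateApi,
  storeName: string,
  items: KeyValue[],
  batchSize = 100, // hypothetical default, tune for your payload sizes
): Promise<number> {
  let calls = 0;
  for (const batch of chunk(items, batchSize)) {
    await state.save(storeName, batch);
    calls++;
  }
  return calls;
}
```

With this approach, 250 items and a batch size of 100 result in 3 calls to the sidecar instead of 250. Whether this lowers CPU usage for the workload in this issue has not been measured here.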

> There was a noticeable difference in performance when pulling these docker images from a repository, compared to creating the docker image in-situ from the docker-compose file. Building in-situ in the EC2 machine produced similar results from both demo repositories, compared to pulling the image from our private repository.

If the Dockerfile(s) are the same, this should not happen, right? You could investigate the images further (with some tooling like https://github.com/wagoodman/dive) to check if there are differences in your images.

dapr-bot commented 7 months ago

This issue has been automatically marked as stale because it has not had activity in the last 60 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

dapr-bot commented 7 months ago

This issue has been automatically closed because it has not had activity in the last 67 days. If this issue is still valid, please ping a maintainer and ask them to label it as pinned, good first issue, help wanted or triaged/resolved. Thank you for your contributions.