dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.45k stars 10.03k forks source link

Redis Performance Degrade after upgrade to .NET6 #39961

Closed goldytech closed 2 years ago

goldytech commented 2 years ago

Description

We are facing performance issues after upgrading to.NET6 from.NET5. The issue is seen in reading data from Redis. We are using IDistributedCache for this. No change in code, just a simple upgrade.API is hosted on AWS EKS using Redis managed service In .net 5 Redis is taken 1.29 ms. In .net 6 Redis is taken 15.8 ms

We are using GetAsync method of IDistributedCache value = await this._cache.GetStringAsync(key);

Reproduction Steps

.NET 5 image

.NET 6 image

Expected behavior

.NET 6 should improve the performance

Actual behavior

.NET 6 has degraded performance

Regression?

Yes application was more performant in .NET 5

Known Workarounds

No response

Configuration

Application is running on AWS cloud in EKS cluster and using Redis managed service of it. Kubernetes version is 1.21 Redis version is 6.0 Docker image is mcr.microsoft.com/dotnet/aspnet:6.0 (runtime) and mcr.microsoft.com/dotnet/sdk:6.0 (build)

Other information

No response

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

vasicvuk commented 2 years ago

Today we did an upgrade from .NET 3.1 project to .NET 6.0 and the performance is better on .NET 3.1. We have a lot of middlewares so we still haven't detected where the performance is degraded but we do not use DistributedCache.

We also moved from Newtonsoft JSON to System.Text and created switchable implementation in order to test performance but the performance itself is not because of serialization since it is also slower for plain text endpoints.

Also using it on docker image and K8s for running both .net 3.1 and .net 6.0 projects.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-extensions-caching See info in area-owners.md if you want to be subscribed.

Issue Details
### Description We are facing performance issues after upgrading to.NET6 from.NET5. The issue is seen in reading data from Redis. We are using IDistributedCache for this. No change in code, just a simple upgrade.API is hosted on AWS EKS using Redis managed service In .net 5 Redis is taken 1.29 ms. In .net 6 Redis is taken 15.8 ms We are using GetAsync method of IDistributedCache `value = await this._cache.GetStringAsync(key);` ### Reproduction Steps .NET 5 ![image](https://user-images.githubusercontent.com/1241633/152348586-25ffe108-b379-4d74-91bf-d55724f3e2b7.png) .NET 6 ![image](https://user-images.githubusercontent.com/1241633/152348741-1efe69ec-49f3-4550-92a1-d48b9f152bc4.png) ### Expected behavior .NET 6 should improve the performance ### Actual behavior .NET 6 has degraded performance ### Regression? Yes application was more performant in .NET 5 ### Known Workarounds _No response_ ### Configuration Application is running on AWS cloud in EKS cluster and using Redis managed service of it. Kubernetes version is 1.21 Redis version is 6.0 Docker image is mcr.microsoft.com/dotnet/aspnet:6.0 (runtime) and mcr.microsoft.com/dotnet/sdk:6.0 (build) ### Other information _No response_
Author: goldytech
Assignees: -
Labels: `untriaged`, `area-Extensions-Caching`
Milestone: -
eerhardt commented 2 years ago

The Microsoft.Extensions.Caching.StackExchangeRedis package is in the dotnet/aspnetcore repo.

goldytech commented 2 years ago

@eerhardt Have you transferred this issue to the aspnetcore repo? If yes, can you please share the link?

sebastienros commented 2 years ago

I will proceed by creating a benchmark to compare IDistributedCache.GetStringAsync(string) performance between .NET 5 and .NET 6. What is the approximate size of the cache entries that are returned (in bytes)?

eerhardt commented 2 years ago

@eerhardt Have you transferred this issue to the aspnetcore repo? If yes, can you please share the link?

This current issue is now the link. https://github.com/dotnet/aspnetcore/issues/39961

sebastienros commented 2 years ago

I added a benchmark returning fetching 1KB from Redis and returning a small string to focus on the Redis communication. The results show a regression close to noise in this benchmark. The application, Redis and load generation run on different machines in a private vlan. I used 64 concurrent connections and max throughput.

https://github.com/aspnet/Benchmarks/tree/main/src/BenchmarksApps/DistributedCache

Please share any more details on the kind of data you are loading.

| application           | redis2_net5 | redis2_net6 |         |
| --------------------- | ----------- | ----------- | ------- |
| CPU Usage (%)         |          95 |          94 |  -1.05% |
| Cores usage (%)       |       1,135 |       1,127 |  -0.70% |
| Working Set (MB)      |         164 |         164 |   0.00% |
| Private Memory (MB)   |         789 |         980 | +24.21% |
| Build Time (ms)       |       2,069 |       2,509 | +21.27% |
| Start Time (ms)       |           0 |           0 |         |
| Published Size (KB)   |      94,492 |      93,239 |  -1.33% |
| .NET Core SDK Version |     5.0.404 |     6.0.101 |         |

| load                   | redis2_net5 | redis2_net6 |         |
| ---------------------- | ----------- | ----------- | ------- |
| CPU Usage (%)          |          44 |          45 |  +2.27% |
| Cores usage (%)        |         528 |         541 |  +2.46% |
| Working Set (MB)       |          41 |          43 |  +4.88% |
| Private Memory (MB)    |         110 |         110 |   0.00% |
| Start Time (ms)        |          72 |          71 |  -1.39% |
| First Request (ms)     |          85 |          89 |  +4.71% |
| Requests               |   1,949,107 |   1,929,753 |  -0.99% |
| Bad responses          |           0 |           0 |         |
| Latency 50th (us)      |         465 |         474 |  +1.94% |
| Latency 75th (us)      |         555 |         565 |  +1.80% |
| Latency 90th (us)      |         647 |         654 |  +1.08% |
| Latency 95th (us)      |         708 |         712 |  +0.56% |
| Latency 99th (us)      |         861 |         879 |  +2.09% |
| Mean latency (us)      |         489 |         494 |  +1.01% |
| Max latency (us)       |       6,764 |       8,506 | +25.75% |
| Requests/sec           |     129,950 |     128,651 |  -1.00% |
| Requests/sec (max)     |     158,974 |     176,907 | +11.28% |
| Read throughput (MB/s) |       14.13 |       13.99 |  -1.00% |
sebastienros commented 2 years ago

@vasicvuk there is nothing in your issue that we can work on. Can you file a different issue when you have more information or a sample application that would reproduce the issue?

goldytech commented 2 years ago

Thanks, @sebastienros, for running the benchmarks. In our scenario, we are storing the value of the Redis key as a collection of objects. We serialize this collection as a JSON string and store this in the Redis. The size of this data can vary from microservice to microservice. But you can safely assume that size would range from 5kb to 10kb. System.Text.Json is used for serialization and deserialization. LMK if you have any further queries.

vasicvuk commented 2 years ago

@vasicvuk there is nothing in your issue that we can work on. Can you file a different issue when you have more information or a sample application that would reproduce the issue?

I think that the issue is because of the Postgres EF Core library. I opened the issue there. Thanks.

sebastienros commented 2 years ago

Ran again with 64B key length and 10KB content using GetStringAsync() and a single connection to compare latencies only. It's pretty much the same for me.

At this point it would help if you could produce an app that triggers the same behavior you are seeing. Also you are including JSON deserialization in your latencies, are you sure there isn't something else? How can you say it's coming from StackExchange.Redis or the related IDistributedCache implementation?

net5.0

Statistics        Avg      Stdev        Max
  Reqs/sec      2300.32     252.06    2600.47
  Latency      432.19us   176.93us     6.87ms
  HTTP codes:
    1xx - 0, 2xx - 11500, 3xx - 0, 4xx - 0, 5xx - 0

net6.0

Statistics        Avg      Stdev        Max
  Reqs/sec      2312.74     251.61    2627.42
  Latency      429.80us   177.17us     7.06ms
  HTTP codes:
    1xx - 0, 2xx - 11562, 3xx - 0, 4xx - 0, 5xx - 0
goldytech commented 2 years ago

Thanks, @sebastienros. Earlier you mentioned in your comment that the same behavior is observed.

I added a benchmark returning fetching 1KB from Redis and returning a small string to focus on the Redis communication. The results show a regression close to noise in this benchmark. The application, Redis and load generation run on different machines in a private vlan. I used 64 concurrent connections and max throughput.

It would be difficult for me to share the actual app as it has many dependencies which are resolved from the private NuGet repository.

To confirm the issue with StackExchange.Redis ,

See below the screenshots of the entire transaction trace, In .NET 5 with Redis.Exchange package version to 5.0.1 the performance was 1.23 ms

image

whereas the same with .NET 6 with StackExchange.Redis package 6.0.1 version it is 11.6 ms

image

From the implementation point of view, we are not doing any different what is documented.

In the Startup.cs we do CacheRegistration which has a Singleton lifetime

services.AddStackExchangeRedisCache((Action<RedisCacheOptions>) (option => option.Configuration = configuration.ValuationRedisConnectionString));

we then inject IDistributedCache in constructor and uses its corresponding methods to retrieve the data

sebastienros commented 2 years ago

What version of Redis are you using? Some commands are deprecated in 4.0 which I used) What expiration options do you use on your SETs?

To confirm the issue with StackExchange.Redis

The charts only show "REDIS" latencies, right? Not latencies of the SE.Redis calls. It doesn't prove the issue is with the library. You could prove it's not an issue with REDIS itself if you switched back to 5.0 and it was fast again. Could be REDIS which became slow, or the network... You could also prove it by adding logging on the IDistributedCache calls that are worrisome. And show the values with 5.0 and 6.0.

Can you create an app, or use mine, and deploy it on the same environment as your app? If you can't do that, can you get a trace when redis is used, a few seconds would be enough. Ideally also with the previous version but.

We are still trying to prove that difference is coming from SE.Redis 6.0.

goldytech commented 2 years ago

We are using AWS Elastic cache Redis version 6.0.5. The keys are auto expired in 24 hours. Our observability tool, New Relic, shows the trace of the entire request.

See the screenshot below, which shows the trace of POST API In which REDIS calls are also made along with other IO operations. The Redis call took 90 ms, which is higher than accepted.

image

Do you anticipate it as a dotnet runtime issue and not a Redis Package issue? We tried another test also where we downgraded the SE.Redis package to 5.0.1 while keeping the runtime to .net 6 didn't observe any noticeable change from the previous.

sebastienros commented 2 years ago

Do you anticipate it as a dotnet runtime issue and not a Redis Package issue?

If we believe you correctly downgraded the Distributed cache to 5.0.1 and it didn't change the perf, then there is no regression in either Distributed cache or StackExchange.Redis for this usage. This issue could be closed.

To make progress you still need to downgrade to .NET 5.0 and establish it's fast again. Then you should take a trace with .NET 5.0 and one when running .NET 6.0 (at least this one). The other option is to create an application that reproduces the issue, not necessarily your whole application. Without this there is nothing more we can do.

ghost commented 2 years ago

Hi @goldytech. We have added the "Needs: Author Feedback" label to this issue, which indicates that we have an open question for you before we can take further action. This issue will be closed automatically in 7 days if we do not hear back from you by then - please feel free to re-open it if you come back to this issue after that time.

ghost commented 2 years ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment. If it is closed, feel free to comment when you are able to provide the additional information and we will re-investigate.

See our Issue Management Policies for more information.

goldytech commented 2 years ago

@sebastienros Attached is the sample app We did the testing with various redis value sizes 988 KB 710 KB 473 KB 475 KB 311 KB 120 KB 16 KB 1 KB 378 Bytes 385 Bytes

.NET 5 average response time is 7.86 ms whereas .NET 6 22.7 ms sample-redis-service.zip

sebastienros commented 2 years ago

@goldytech That is great, thank you.

sebastienros commented 2 years ago

Can you explain your testing methodology?

sebastienros commented 2 years ago

I ran your application with a slot of 500KB, locally. I updated it to multi-target net5.0 and net6.0, pointing to 5.0 and 6. version of the Redis NuGet package.

I got these results:

net6.0

PS C:\applications\bombardier> .\bombardier.exe -c 1 -d 5s -l -k -H "Content-Type: application/json" -m POST -f .\data.txt https://localhost:5001/v1/redis-values
Bombarding https://localhost:5001/v1/redis-values for 5s using 1 connection(s)
[==================================================================================================================] 5s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec       132.76      29.31     200.00
  Latency        7.53ms     2.59ms    41.02ms
  Latency Distribution
     50%     7.00ms
     75%     8.00ms
     90%     8.02ms
     95%     9.05ms
     99%    21.02ms
  HTTP codes:
    1xx - 0, 2xx - 663, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:    36.47KB/s

net5.0

PS C:\applications\bombardier> .\bombardier.exe -c 1 -d 5s -l -k -H "Content-Type: application/json" -m POST -f .\data.txt https://localhost:5001/v1/redis-values
Bombarding https://localhost:5001/v1/redis-values for 5s using 1 connection(s)
[================================================================================================================================================================] 5s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec       119.98      26.79     199.98
  Latency        8.33ms    13.91ms   346.55ms
  Latency Distribution
     50%     7.02ms
     75%     8.00ms
     90%     9.00ms
     95%    10.00ms
     99%    16.03ms
  HTTP codes:
    1xx - 0, 2xx - 600, 3xx - 0, 4xx - 0, 5xx - 0
    others - 0
  Throughput:    33.00KB/s

There is no significant difference between the two.

My repro steps:

Please try these steps to confirm my results, or provide detailed instructions on how to reproduce the problem.

ghost commented 2 years ago

Hi @goldytech. We have added the "Needs: Author Feedback" label to this issue, which indicates that we have an open question for you before we can take further action. This issue will be closed automatically in 7 days if we do not hear back from you by then - please feel free to re-open it if you come back to this issue after that time.

goldytech commented 2 years ago

Thanks, @sebastienros, for sharing the results. The issue is more about the reads than writes. The API that was having the performance issue is only reading the data from Redis. There is a worker service that writes the data into Redis.

I ran performance tests using the benchmark dotnet library The result indicates that the performance looks good when the benchmark application is running on the host machine and connecting to the Redis in the docker (Avg response time 1.8 ms)

But if the same benchmark application is deployed on local K8s (minikube) connecting to Redis running also in K8s (3 pods, one primary, and two replicas), The performance is not the same as running on docker. Avg response time is 9.7 ms.

Further, I deployed the same benchmark application into our K8s env (AWS EKS), connecting to Redis Elastic cache instance. The performance got deteriorated, and avg response time was 17 ms.

The tests were run with .NET 6 sdk version 6.0.2 and Microsoft.Extensions.Caching.StackExchangeRedis version 6.0.2

The benchmark application was a simple console app, having no dependencies of aspnet host, Just StackExchange Redis Nuget packages

Is this due to network latency as the same application is behaving differently in separate environments?

sebastienros commented 2 years ago

The issue is more about the reads than writes

I did benchmarked reads. I mentioned I had to add a write method to your sample app so we are clear about the only changes I made would not change the results.

But if the same benchmark application is deployed on local K8s (minikube) connecting to Redis running also in K8s (3 pods, one primary, and two replicas), The performance is not the same as running on docker. Avg response time is 9.7 ms. Further, I deployed the same benchmark application into our K8s env (AWS EKS), connecting to Redis Elastic cache instance. The performance got deteriorated, and avg response time was 17 ms.

This is not what your bug report is about and what I am investigating. We are looking into .NET 5.0 vs .NET 6.0 performance. Unless all your statements imply that it's only an issue with .NET 6.0.

The benchmark application was a simple console app, having no dependencies of aspnet host, Just StackExchange Redis Nuget packages

Why did you share a web application then? If you are able to reproduce the issue it's better if you share the same app that you used. And if the issue (a regression) appears only in AWS EKS, then it's also important to mention so we don't spend too much time on local tests.

goldytech commented 2 years ago

Why did you share a web application then? If you are able to reproduce the issue it's better if you share the same app that you used. And if the issue (a regression) appears only in AWS EKS, then it's also important to mention so we don't spend too much time on local tests.

In the bug description, I had clearly mentioned the environment (AWS EKS and AWS Elastic cache) I created a separate application to isolate the issue so that it can be cleared that it is a .NET Runtime issue or Redis issue.