dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

System.Text.Json.JsonSerializer.Deserialize consumes CPU and memory that it never returns #67369

Closed davidbrinton closed 2 years ago

davidbrinton commented 2 years ago

Description

We have an ASP.NET Core Web API project running in AWS ECS Fargate on a Linux host. We have several environments, all under light use, as we haven't pushed to production yet. We noticed recently that this API begins failing in each environment after about 5 days of use. The manner in which it fails is this: with each request to the service (no matter how far apart), the CPU slowly climbs until, after several days, it reaches 100% and stays there until we restart the ECS tasks. We found that we could reproduce the problem rapidly by sending small requests one after another; doing this, after about 340 - 360 requests the application would fail and the CPU would be at 100%. We eliminated code from the single request one line at a time until we determined that removing a call to `System.Text.Json.JsonSerializer.Deserialize` eliminated the problem completely. With that line removed, the same test never sees the CPU go above about 10% utilization, and the service always recovers when we stop sending traffic to it.

After finding the cause, we tried reproducing it with the minimal amount of code possible. These two lines of code, when executed on each request, provoke the same failure in the same way.

```csharp
string myString = "[{\"AdditionalData\":{},\"Alg\":\"RS256\",\"Crv\":null,\"D\":null,\"DP\":null,\"DQ\":null,\"E\":\"AQAB\",\"K\":null,\"KeyId\":\"3cQQcMDfpSo8\u002BVzKljRaaxPJAw1IB4lhKaFsP1OIsgM=\",\"KeyOps\":[],\"Kid\":\"3cQQcMDfpSo8\u002BVzKljRaaxPJAw1IB4lhKaFsP1OIsgM=\",\"Kty\":\"RSA\",\"N\":\"saXuX14AdmIFn2EfBrUlxh9SgzZ973WOR-NhELgaj4KWC7--gvE60wnaqjnsnmOvsCBkoD05Wg2rjL2IdhpsYLDEhpbu5oQPuhXzJ-bf0XR4xzDar8NMp-0w4ySOvAtizwrN7fRVdoHW0jtjEvwKGPyBuln_Dm4c5pQ_wD-m82cGDIE2IVOla2EgdTgEkDEDqo1ZyQz_a-knEH7nJEiHfRSheBFdlqN8_RLW76dP18AXqS8onPTVPP8PRz-8lb5UzKPIYMw6rBG6RbS02zDFMlw7SZaUxxL_gWqnFKzeYzkWiyjaFFigmOq55WybzuyOddoXFkzWhsLC_CL4FaXumQ\",\"Oth\":null,\"P\":null,\"Q\":null,\"QI\":null,\"Use\":\"sig\",\"X\":null,\"X5c\":[],\"X5t\":null,\"X5tS256\":null,\"X5u\":null,\"Y\":null,\"KeySize\":2048,\"HasPrivateKey\":false,\"CryptoProviderFactory\":{\"CryptoProviderCache\":{},\"CustomCryptoProvider\":null,\"CacheSignatureProviders\":true,\"SignatureProviderObjectPoolCacheSize\":96}},{\"AdditionalData\":{},\"Alg\":\"RS256\",\"Crv\":null,\"D\":null,\"DP\":null,\"DQ\":null,\"E\":\"AQAB\",\"K\":null,\"KeyId\":\"DKs\u002BiQBuxsEiSffi9fGvxhepCKjb4hHQWCozaw4VZ88=\",\"KeyOps\":[],\"Kid\":\"DKs\u002BiQBuxsEiSffi9fGvxhepCKjb4hHQWCozaw4VZ88=\",\"Kty\":\"RSA\",\"N\":\"nXW6v8iqh5xOeZRY_EWJVkINXs69wH19iQkSG8W3T9WQfufOKmBCVlj3hvP_zkbC_N7L8EpKejw8WC7J4CvLLU12F-yTPBSlqz3mEQLZkw5jK4bujFOszNyD9bQnqEX1i_X5-Z6mmxDafphoeaFwyJMhc35gCSMEy_7EQF5Ec-oP-QI9nekptvnKCW2fXgciW4IaH7tebTMVg43XZ_esS1bWF-xowjhZWIInySKv7kVE55v45uqtfGu0gL1bIkIzzxyEsy4JuK_nXQKZwZkxUogvlCyhzFcyk_v09ucWwMSfEj60Y501mBTGROXGKJhSSvCmsXc9LJg9qZ-7dryK5w\",\"Oth\":null,\"P\":null,\"Q\":null,\"QI\":null,\"Use\":\"sig\",\"X\":null,\"X5c\":[],\"X5t\":null,\"X5tS256\":null,\"X5u\":null,\"Y\":null,\"KeySize\":2048,\"HasPrivateKey\":false,\"CryptoProviderFactory\":{\"CryptoProviderCache\":{},\"CustomCryptoProvider\":null,\"CacheSignatureProviders\":true,\"SignatureProviderObjectPoolCacheSize\":96}}]";

IList keys = JsonSerializer.Deserialize<IList>(myString);
```

Reproduction Steps

  1. Create a new ASP.NET Core Web API project targeting .NET 6.0
  2. Create a controller and add the code below to a custom `ISecurityTokenValidator` that executes when `ValidateToken` is called during a request to an endpoint on the newly created controller
  3. Set the environment up as follows: Linux host on AWS ECS Fargate, ECS task CPU (unit) 256, ECS task memory (MiB) 512
  4. Send HTTP requests to the endpoint in your controller one at a time indefinitely (ours failed at about 350 requests, but the threshold could be resource dependent). You will see the CPU climb slowly during the test.

The code to add to the `ValidateToken` method:

```csharp
string myString = "[{\"AdditionalData\":{},\"Alg\":\"RS256\",\"Crv\":null,\"D\":null,\"DP\":null,\"DQ\":null,\"E\":\"AQAB\",\"K\":null,\"KeyId\":\"3cQQcMDfpSo8\u002BVzKljRaaxPJAw1IB4lhKaFsP1OIsgM=\",\"KeyOps\":[],\"Kid\":\"3cQQcMDfpSo8\u002BVzKljRaaxPJAw1IB4lhKaFsP1OIsgM=\",\"Kty\":\"RSA\",\"N\":\"saXuX14AdmIFn2EfBrUlxh9SgzZ973WOR-NhELgaj4KWC7--gvE60wnaqjnsnmOvsCBkoD05Wg2rjL2IdhpsYLDEhpbu5oQPuhXzJ-bf0XR4xzDar8NMp-0w4ySOvAtizwrN7fRVdoHW0jtjEvwKGPyBuln_Dm4c5pQ_wD-m82cGDIE2IVOla2EgdTgEkDEDqo1ZyQz_a-knEH7nJEiHfRSheBFdlqN8_RLW76dP18AXqS8onPTVPP8PRz-8lb5UzKPIYMw6rBG6RbS02zDFMlw7SZaUxxL_gWqnFKzeYzkWiyjaFFigmOq55WybzuyOddoXFkzWhsLC_CL4FaXumQ\",\"Oth\":null,\"P\":null,\"Q\":null,\"QI\":null,\"Use\":\"sig\",\"X\":null,\"X5c\":[],\"X5t\":null,\"X5tS256\":null,\"X5u\":null,\"Y\":null,\"KeySize\":2048,\"HasPrivateKey\":false,\"CryptoProviderFactory\":{\"CryptoProviderCache\":{},\"CustomCryptoProvider\":null,\"CacheSignatureProviders\":true,\"SignatureProviderObjectPoolCacheSize\":96}},{\"AdditionalData\":{},\"Alg\":\"RS256\",\"Crv\":null,\"D\":null,\"DP\":null,\"DQ\":null,\"E\":\"AQAB\",\"K\":null,\"KeyId\":\"DKs\u002BiQBuxsEiSffi9fGvxhepCKjb4hHQWCozaw4VZ88=\",\"KeyOps\":[],\"Kid\":\"DKs\u002BiQBuxsEiSffi9fGvxhepCKjb4hHQWCozaw4VZ88=\",\"Kty\":\"RSA\",\"N\":\"nXW6v8iqh5xOeZRY_EWJVkINXs69wH19iQkSG8W3T9WQfufOKmBCVlj3hvP_zkbC_N7L8EpKejw8WC7J4CvLLU12F-yTPBSlqz3mEQLZkw5jK4bujFOszNyD9bQnqEX1i_X5-Z6mmxDafphoeaFwyJMhc35gCSMEy_7EQF5Ec-oP-QI9nekptvnKCW2fXgciW4IaH7tebTMVg43XZ_esS1bWF-xowjhZWIInySKv7kVE55v45uqtfGu0gL1bIkIzzxyEsy4JuK_nXQKZwZkxUogvlCyhzFcyk_v09ucWwMSfEj60Y501mBTGROXGKJhSSvCmsXc9LJg9qZ-7dryK5w\",\"Oth\":null,\"P\":null,\"Q\":null,\"QI\":null,\"Use\":\"sig\",\"X\":null,\"X5c\":[],\"X5t\":null,\"X5tS256\":null,\"X5u\":null,\"Y\":null,\"KeySize\":2048,\"HasPrivateKey\":false,\"CryptoProviderFactory\":{\"CryptoProviderCache\":{},\"CustomCryptoProvider\":null,\"CacheSignatureProviders\":true,\"SignatureProviderObjectPoolCacheSize\":96}}]";

IList keys = JsonSerializer.Deserialize<IList>(myString);
```
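Step 4 of the reproduction can be sketched as a small client loop. This is a hypothetical sketch, not part of the original report: the endpoint URL and route are assumptions, and should be replaced with the route of the controller created in step 2.

```csharp
// Hypothetical request loop for repro step 4. The URL is an assumption;
// substitute the endpoint of the controller created in step 2.
using System;
using System.Net.Http;
using System.Threading.Tasks;

class LoadLoop
{
    static async Task Main()
    {
        using var client = new HttpClient();
        for (int i = 1; i <= 500; i++)
        {
            // Requests are sent one at a time, matching the repro steps.
            HttpResponseMessage response =
                await client.GetAsync("http://localhost:5000/api/example");
            Console.WriteLine($"request {i}: {(int)response.StatusCode}");
        }
    }
}
```

While this loop runs, CPU utilization of the service can be watched with `top` (or the ECS task metrics) to observe the climb the report describes.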

The string in this reproduction is the raw result from AWS when you request the public Cognito signing keys for your Cognito app pool. A set of similar keys can be generated by anyone with a Cognito app pool, so your best chance of using a string similar to ours is to create a Cognito app pool and fetch the signing keys from this URL template:

https://cognito-idp.us-east-1.amazonaws.com/yourUserPoolId/.well-known/jwks.json
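As a hedged aside: per RFC 7517 a JWKS endpoint returns a wrapper object of the form `{"keys":[...]}`, whereas the array above looks like re-serialized key objects; a minimal sketch for fetching and inspecting the raw endpoint response (user-pool ID placeholder kept as-is) might look like this:

```csharp
// Hypothetical sketch: fetch the JWKS document from the URL template above
// and count the keys. Replace yourUserPoolId with a real pool ID.
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

class JwksFetch
{
    static async Task Main()
    {
        using var client = new HttpClient();
        string json = await client.GetStringAsync(
            "https://cognito-idp.us-east-1.amazonaws.com/yourUserPoolId/.well-known/jwks.json");

        // RFC 7517: the document is an object with a "keys" array.
        using JsonDocument doc = JsonDocument.Parse(json);
        int count = doc.RootElement.GetProperty("keys").GetArrayLength();
        Console.WriteLine($"JWKS contains {count} key(s)");
    }
}
```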

Expected behavior

We expect the CPU to bounce up and down as requests hit the API, never going above more than 10% or so. We also expect the CPU to return to an idle state when requests to the endpoint stop.

Actual behavior

The CPU climbs continuously to 100% and never recovers even if you stop sending messages to the API for several hours.

Regression?

Unknown

Known Workarounds

For us it was simple enough to stop deserializing the string in the first place, so we just removed the code. We don't know of any workaround that still uses `JsonSerializer.Deserialize`.

Configuration

No response

Other information

No response

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis See info in area-owners.md if you want to be subscribed.

Author: davidbrinton
Assignees: -
Labels: `area-System.Text.Json`, `untriaged`
Milestone: -
eiriktsarpalis commented 2 years ago

We would need some form of standalone reproduction to evaluate this. If possible, please provide some form of sanitized input that demonstrates the issue at hand.

ghost commented 2 years ago

This issue has been marked needs-author-action since it may be missing important information. Please refer to our contribution guidelines for tips on how to report issues effectively.

davidbrinton commented 2 years ago

@eiriktsarpalis I created a string that can be used to reproduce the problem by generating new signing keys for a fake Cognito account that has now been deleted. The problem description has been updated to contain this new data.

eiriktsarpalis commented 2 years ago

@davidbrinton I tried running the isolated deserialization snippets you shared but didn't observe anything out of the ordinary. We'd need some kind of standalone reproduction to further evaluate this. Would it be possible to share a console app (not an aspnetcore app) that deterministically demonstrates the problem?
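The requested standalone reproduction could be sketched along these lines. This is a hypothetical sketch using the deserialization call from the issue, and it is not verified to trigger the reported CPU growth; the placeholder payload must be replaced with the full string from the issue description.

```csharp
// Hypothetical standalone console repro sketch: deserialize the payload in a
// tight loop and watch CPU/heap externally with `top` or `dotnet-counters`.
using System;
using System.Collections;
using System.Text.Json;

class Repro
{
    static void Main()
    {
        // Placeholder; substitute the full JWKS-derived string from the issue.
        string myString = "[{\"Alg\":\"RS256\",\"KeySize\":2048}]";

        for (int i = 0; i < 100_000; i++)
        {
            // Same call shape as the issue: non-generic IList target.
            IList keys = JsonSerializer.Deserialize<IList>(myString);
            if (i % 10_000 == 0)
                Console.WriteLine(
                    $"iteration {i}: heap {GC.GetTotalMemory(false)} bytes, elements {keys.Count}");
        }
    }
}
```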

ghost commented 2 years ago

This issue has been marked needs-author-action since it may be missing important information. Please refer to our contribution guidelines for tips on how to report issues effectively.

ghost commented 2 years ago

This issue has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

ghost commented 2 years ago

This issue will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the issue, but please note that the issue will be locked if it remains inactive for another 30 days.