awslabs / aws-sdk-rust

AWS SDK for the Rust Programming Language
https://awslabs.github.io/aws-sdk-rust/
Apache License 2.0
2.91k stars 245 forks source link

Random spikes in dynamodb update latency #1155

Open nuel77 opened 1 month ago

nuel77 commented 1 month ago

Describe the bug

We are experiencing random latency spikes (which go up to 10s) in DynamoDB operations while using the AWS Rust SDK in our project. These spikes randomly and seem to increase in frequency with increasing time. A weird thing we observed is that the latency doest seem to correleate with the load for example updating 100 items takes 50ms while updating 10 items might sometimes take 100+ms . Despite our efforts to debug the issue, we have been unable to identify the cause. Athough we are concerned if cloning the aws_sdk_dynamodb::client for each upload task is causing any deadlock situation in the library code. would be great to get some pointers here for us to work on.

Expected Behavior

Latency to be fairly consitent and but increasing depending on load .

Current Behavior

Random spikes in latency even for smaller loads.

Reproduction Steps

latency graph from cloud watch: image

code snippet for uploader task: https://gist.github.com/nuel77/a48fee5172abdf47efeba14bdaaff3b7

also, our client looks like this:

 pub async fn new(tables: TableConfig, config: &SdkConfig) -> Self {
        let dynamo_db_client = Client::new(config);
        Self {
            client: dynamo_db_client,
            balances_table: tables.balances_table,
            orders_table: tables.orders_table,
            trades_table: tables.trades_table,
            accounts_table: tables.accounts_table,
            transactions_table: tables.transactions_table,
            general_table: tables.general_table,
        }
    }

Possible Solution

No response

Additional Information/Context

No response

Version

(base) ~/CLionProjects/orderbook git:[add-withdrawal-tests]
cargo tree | grep aws-
├── aws-config v1.4.0
│   ├── aws-credential-types v1.2.0
│   │   ├── aws-smithy-async v1.2.1
│   │   ├── aws-smithy-runtime-api v1.6.1
│   │   │   ├── aws-smithy-async v1.2.1 (*)
│   │   │   ├── aws-smithy-types v1.1.10
│   │   ├── aws-smithy-types v1.1.10 (*)
│   ├── aws-runtime v1.2.2
│   │   ├── aws-credential-types v1.2.0 (*)
│   │   ├── aws-sigv4 v1.2.1
│   │   │   ├── aws-credential-types v1.2.0 (*)
│   │   │   ├── aws-smithy-eventstream v0.60.4
│   │   │   │   ├── aws-smithy-types v1.1.10 (*)
│   │   │   ├── aws-smithy-http v0.60.8
│   │   │   │   ├── aws-smithy-eventstream v0.60.4 (*)
│   │   │   │   ├── aws-smithy-runtime-api v1.6.1 (*)
│   │   │   │   ├── aws-smithy-types v1.1.10 (*)
│   │   │   ├── aws-smithy-runtime-api v1.6.1 (*)
│   │   │   ├── aws-smithy-types v1.1.10 (*)
│   │   ├── aws-smithy-async v1.2.1 (*)
│   │   ├── aws-smithy-eventstream v0.60.4 (*)
│   │   ├── aws-smithy-http v0.60.8 (*)
│   │   ├── aws-smithy-runtime-api v1.6.1 (*)
│   │   ├── aws-smithy-types v1.1.10 (*)
│   │   ├── aws-types v1.2.1
│   │   │   ├── aws-credential-types v1.2.0 (*)
│   │   │   ├── aws-smithy-async v1.2.1 (*)
│   │   │   ├── aws-smithy-runtime-api v1.6.1 (*)
│   │   │   ├── aws-smithy-types v1.1.10 (*)
│   ├── aws-sdk-sso v1.25.0
│   │   ├── aws-credential-types v1.2.0 (*)
│   │   ├── aws-runtime v1.2.2 (*)
│   │   ├── aws-smithy-async v1.2.1 (*)
│   │   ├── aws-smithy-http v0.60.8 (*)
│   │   ├── aws-smithy-json v0.60.7
│   │   │   └── aws-smithy-types v1.1.10 (*)
│   │   ├── aws-smithy-runtime v1.5.3
│   │   │   ├── aws-smithy-async v1.2.1 (*)
│   │   │   ├── aws-smithy-http v0.60.8 (*)
│   │   │   ├── aws-smithy-runtime-api v1.6.1 (*)
│   │   │   ├── aws-smithy-types v1.1.10 (*)
│   │   ├── aws-smithy-runtime-api v1.6.1 (*)
│   │   ├── aws-smithy-types v1.1.10 (*)
│   │   ├── aws-types v1.2.1 (*)
│   ├── aws-sdk-ssooidc v1.26.0
│   │   ├── aws-credential-types v1.2.0 (*)
│   │   ├── aws-runtime v1.2.2 (*)
│   │   ├── aws-smithy-async v1.2.1 (*)
│   │   ├── aws-smithy-http v0.60.8 (*)
│   │   ├── aws-smithy-json v0.60.7 (*)
│   │   ├── aws-smithy-runtime v1.5.3 (*)
│   │   ├── aws-smithy-runtime-api v1.6.1 (*)
│   │   ├── aws-smithy-types v1.1.10 (*)
│   │   ├── aws-types v1.2.1 (*)
│   ├── aws-sdk-sts v1.25.0
│   │   ├── aws-credential-types v1.2.0 (*)
│   │   ├── aws-runtime v1.2.2 (*)
│   │   ├── aws-smithy-async v1.2.1 (*)
│   │   ├── aws-smithy-http v0.60.8 (*)
│   │   ├── aws-smithy-json v0.60.7 (*)
│   │   ├── aws-smithy-query v0.60.7
│   │   │   ├── aws-smithy-types v1.1.10 (*)
│   │   ├── aws-smithy-runtime v1.5.3 (*)
│   │   ├── aws-smithy-runtime-api v1.6.1 (*)
│   │   ├── aws-smithy-types v1.1.10 (*)
│   │   ├── aws-smithy-xml v0.60.8
│   │   ├── aws-types v1.2.1 (*)
│   ├── aws-smithy-async v1.2.1 (*)
│   ├── aws-smithy-http v0.60.8 (*)
│   ├── aws-smithy-json v0.60.7 (*)
│   ├── aws-smithy-runtime v1.5.3 (*)
│   ├── aws-smithy-runtime-api v1.6.1 (*)
│   ├── aws-smithy-types v1.1.10 (*)
│   ├── aws-types v1.2.1 (*)
├── aws-sdk-dynamodb v1.28.0
│   ├── aws-credential-types v1.2.0 (*)
│   ├── aws-runtime v1.2.2 (*)
│   ├── aws-smithy-async v1.2.1 (*)
│   ├── aws-smithy-http v0.60.8 (*)
│   ├── aws-smithy-json v0.60.7 (*)
│   ├── aws-smithy-runtime v1.5.3 (*)
│   ├── aws-smithy-runtime-api v1.6.1 (*)
│   ├── aws-smithy-types v1.1.10 (*)
│   ├── aws-types v1.2.1 (*)
│   ├── aws-config v1.4.0 (*)
│   ├── aws-sdk-dynamodb v1.28.0 (*)
├── aws-config v1.4.0 (*)
├── aws-sdk-dynamodb v1.28.0 (*)
│   ├── aws-creds v0.36.0
│   ├── aws-region v0.25.4
├── aws-config v1.4.0 (*)
│   ├── aws-config v1.4.0 (*)
│   ├── aws-sdk-s3 v1.29.0
│   │   ├── aws-credential-types v1.2.0 (*)
│   │   ├── aws-runtime v1.2.2 (*)
│   │   ├── aws-sigv4 v1.2.1 (*)
│   │   ├── aws-smithy-async v1.2.1 (*)
│   │   ├── aws-smithy-checksums v0.60.8
│   │   │   ├── aws-smithy-http v0.60.8 (*)
│   │   │   ├── aws-smithy-types v1.1.10 (*)
│   │   ├── aws-smithy-eventstream v0.60.4 (*)
│   │   ├── aws-smithy-http v0.60.8 (*)
│   │   ├── aws-smithy-json v0.60.7 (*)
│   │   ├── aws-smithy-runtime v1.5.3 (*)
│   │   ├── aws-smithy-runtime-api v1.6.1 (*)
│   │   ├── aws-smithy-types v1.1.10 (*)
│   │   ├── aws-smithy-xml v0.60.8 (*)
│   │   ├── aws-types v1.2.1 (*)
├── aws-config v1.4.0 (*)
├── aws-sdk-kms v1.26.0
│   ├── aws-credential-types v1.2.0 (*)
│   ├── aws-runtime v1.2.2 (*)
│   ├── aws-smithy-async v1.2.1 (*)
│   ├── aws-smithy-http v0.60.8 (*)
│   ├── aws-smithy-json v0.60.7 (*)
│   ├── aws-smithy-runtime v1.5.3 (*)
│   ├── aws-smithy-runtime-api v1.6.1 (*)
│   ├── aws-smithy-types v1.1.10 (*)
│   ├── aws-types v1.2.1 (*)
├── aws-config v1.4.0 (*)
├── aws-sdk-dynamodb v1.28.0 (*)
├── aws-config v1.4.0 (*)
├── aws-sdk-kms v1.26.0 (*)
├── aws-sdk-s3 v1.29.0 (*)
│   ├── aws-config v1.4.0 (*)
│   ├── aws-sdk-timestreamquery v1.26.0
│   │   ├── aws-credential-types v1.2.0 (*)
│   │   ├── aws-runtime v1.2.2 (*)
│   │   ├── aws-smithy-async v1.2.1 (*)
│   │   ├── aws-smithy-http v0.60.8 (*)
│   │   ├── aws-smithy-json v0.60.7 (*)
│   │   ├── aws-smithy-runtime v1.5.3 (*)
│   │   ├── aws-smithy-runtime-api v1.6.1 (*)
│   │   ├── aws-smithy-types v1.1.10 (*)
│   │   ├── aws-types v1.2.1 (*)
│   ├── aws-sdk-timestreamwrite v1.25.0
│   │   ├── aws-credential-types v1.2.0 (*)
│   │   ├── aws-runtime v1.2.2 (*)
│   │   ├── aws-smithy-async v1.2.1 (*)
│   │   ├── aws-smithy-http v0.60.8 (*)
│   │   ├── aws-smithy-json v0.60.7 (*)
│   │   ├── aws-smithy-runtime v1.5.3 (*)
│   │   ├── aws-smithy-runtime-api v1.6.1 (*)
│   │   ├── aws-smithy-types v1.1.10 (*)
│   │   ├── aws-types v1.2.1 (*)
├── aws-config v1.4.0 (*)
├── aws-sdk-config v1.26.0
│   ├── aws-credential-types v1.2.0 (*)
│   ├── aws-runtime v1.2.2 (*)
│   ├── aws-smithy-async v1.2.1 (*)
│   ├── aws-smithy-http v0.60.8 (*)
│   ├── aws-smithy-json v0.60.7 (*)
│   ├── aws-smithy-runtime v1.5.3 (*)
│   ├── aws-smithy-runtime-api v1.6.1 (*)
│   ├── aws-smithy-types v1.1.10 (*)
│   ├── aws-types v1.2.1 (*)
├── aws-config v1.4.0 (*)
├── aws-sdk-kms v1.26.0 (*)
├── aws-config v1.4.0 (*)
├── aws-sdk-config v1.26.0 (*)
├── aws-sdk-dynamodb v1.28.0 (*)
├── aws-config v1.4.0 (*)
├── aws-sdk-config v1.26.0 (*)
├── aws-config v1.4.0 (*)
├── aws-sdk-cloudwatch v1.29.0
│   ├── aws-credential-types v1.2.0 (*)
│   ├── aws-runtime v1.2.2 (*)
│   ├── aws-smithy-async v1.2.1 (*)
│   ├── aws-smithy-http v0.60.8 (*)
│   ├── aws-smithy-json v0.60.7 (*)
│   ├── aws-smithy-query v0.60.7 (*)
│   ├── aws-smithy-runtime v1.5.3 (*)
│   ├── aws-smithy-runtime-api v1.6.1 (*)
│   ├── aws-smithy-types v1.1.10 (*)
│   ├── aws-smithy-xml v0.60.8 (*)
│   ├── aws-types v1.2.1 (*)
├── aws-config v1.4.0 (*)
├── aws-sdk-dynamodb v1.28.0 (*)
├── aws-config v1.4.0 (*)
├── aws-sdk-config v1.26.0 (*)
├── aws-config v1.4.0 (*)
├── aws-sdk-dynamodb v1.28.0 (*)

Environment details (OS name and version, etc.)

ubunutu

Logs

No response

Gauthamastro commented 1 month ago

image

The above graph shows the uploader task in production. It has many other components but as you can see after restarting the process, the latency improved. The latency was jittery and very high after few days of running that process. We are pretty sure the amount of data we are transmitting doesn't need 15s to upload on a 25 Gbps server. Hope this gives some additional data to debug.

Also, if you look at the right end of the graph, you can still see some random spikes after restart.

NOTE: Running average of latency is around 100-200ms. The graph is showing the maximum recorded latencies

ysaito1001 commented 1 month ago

Thank you for providing us with a detailed description of your observations. As there may be various factors involved, we might be shooting in the dark but let us bring pieces of information we have to ensure we're aligned.

we are concerned if cloning the aws_sdk_dynamodb::client for each upload task is causing any deadlock situation in the library code

Hmm, it's hard to imagine simply cloning an ec2 Client could cause a deadlock since it just involves bumping up a reference count of Arc 🤔

Do you turn on tracing_subscriber in your application code to see if there are any logs around the time a latency spike occurred, potentially showing any log outputs that you usually don't see when things are working?

nuel77 commented 1 month ago

Will we check this add get back to you.

Velfi commented 1 month ago

Additionally, tokio-console can be really helpful for debugging issues with blocking tasks because it can show you the idle vs. active time for each task.

nuel77 commented 1 month ago

Thanks for the suggestions @Velfi , we are adding this to our tests, will get back with our findings in a few days.