aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0
2.19k stars 845 forks source link

First DynamoDB request is very slow #1340

Closed stuartleylandcole closed 5 years ago

stuartleylandcole commented 5 years ago

Expected Behavior

All DynamoDB queries should take in the low 10s of milliseconds.

Current Behavior

DynamoDB queries are generally in the 10s of milliseconds however the first time a Dynamo function is executed I am seeing response times around 2s, subsequent queries then fall back to the expected levels.

Possible Solution

Unsure, I haven't been able to identify anything from stepping through the code

Steps to Reproduce (for bugs)

I have written the following code to replicate the creation of the DynamoDbClient and the execution of four put requests. It assumes you have a table named "item" in your AWS account.

The results are below.

import java.util.UUID;

import org.junit.jupiter.api.Test;

import com.google.common.collect.ImmutableMap;
import com.ixxus.cnc.utils.SimpleTimer;

import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;
import software.amazon.awssdk.http.SdkHttpClient;
import software.amazon.awssdk.http.urlconnection.UrlConnectionHttpClient;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.DynamoDbClientBuilder;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

class DynamoClientCT {

    private static final SimpleTimer TIMER = new SimpleTimer();

    private static final String CONTEXT_ALL = "CONTEXT_ALL";
    private static final String CONTEXT_HTTP_CLIENT_BUILD = "CONTEXT_HTTP_CLIENT_BUILD";
    private static final String CONTEXT_CREDENTIALS_PROVIDER_BUILD = "CONTEXT_CREDENTIALS_PROVIDER_BUILD";
    private static final String CONTEXT_DYNAMO_CLIENT_BUILD = "CONTEXT_DYNAMO_CLIENT_BUILD";
    private static final String CONTEXT_DYNAMO_CLIENT_EXECUTE = "CONTEXT_DYNAMO_CLIENT_EXECUTE_";

    @Test
    void test() {
        TIMER.start(CONTEXT_ALL);

        createClientAndExecuteRequest();

        TIMER.stop(CONTEXT_ALL);

        TIMER.logAllSummaries("DynamoClientCT");
    }

    private void createClientAndExecuteRequest() {
        TIMER.start(CONTEXT_HTTP_CLIENT_BUILD);
        final SdkHttpClient httpClient = UrlConnectionHttpClient.builder().build();
        TIMER.stop(CONTEXT_HTTP_CLIENT_BUILD);

        TIMER.start(CONTEXT_CREDENTIALS_PROVIDER_BUILD);
        final DefaultCredentialsProvider credentialsProvider = DefaultCredentialsProvider.create();
        TIMER.stop(CONTEXT_CREDENTIALS_PROVIDER_BUILD);

        TIMER.start(CONTEXT_DYNAMO_CLIENT_BUILD);
        final DynamoDbClientBuilder builder = DynamoDbClient.builder();
        builder.httpClient(httpClient);
        builder.region(Region.EU_WEST_1);
        builder.credentialsProvider(credentialsProvider);
        final DynamoDbClient client = builder.build();
        TIMER.stop(CONTEXT_DYNAMO_CLIENT_BUILD);

        putItem(client, 1);
        putItem(client, 2);
        putItem(client, 3);
        putItem(client, 4);
    }

    private void putItem(final DynamoDbClient client, final int count) {
        TIMER.start(CONTEXT_DYNAMO_CLIENT_EXECUTE + count);
        final PutItemRequest request = PutItemRequest.builder()
                .tableName("item")
                .item(ImmutableMap.of("id", AttributeValue.builder().s(UUID.randomUUID().toString()).build()))
                .build();

        client.putItem(request);
        TIMER.stop(CONTEXT_DYNAMO_CLIENT_EXECUTE + count);
    }
}

The following is the code for the SimpleTimer:

import java.util.ArrayList;
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SimpleTimer {

    private static final Logger LOGGER = LoggerFactory.getLogger(SimpleTimer.class);

    Map<String, List<Long>> times = new TreeMap<String, List<Long>>();

    Map<String, Long> currentTime = new HashMap<>();

    public void start(final String context) {
        currentTime.put(context, System.currentTimeMillis());
    }

    public void stop(final String context) {
        if (!currentTime.containsKey(context)) {
            throw new IllegalStateException("Start has not been called for context: " + context);
        }
        times.computeIfAbsent(context, s -> new ArrayList<Long>())
                .add(System.currentTimeMillis() - currentTime.get(context));
    }

    public DoubleSummaryStatistics getSummaryStatistics(final String context) {
        return times.getOrDefault(context, new ArrayList<Long>())
                .stream()
                .mapToDouble(a -> a)
                .summaryStatistics();
    }

    public void logAllSummaries(final String prefix) {
        LOGGER.debug("logAllSummaries: {}");
        times.keySet().forEach(ctx -> LOGGER.info("{} - {} - {}", prefix, ctx, getSummaryStatistics(ctx)));
    }
}

Results:

2019-07-11 15:14:17.746  INFO  SimpleTimer:49 - DynamoClientCT - CONTEXT_ALL - DoubleSummaryStatistics{count=1, sum=1633.000000, min=1633.000000, average=1633.000000, max=1633.000000}
2019-07-11 15:14:17.747  INFO  SimpleTimer:49 - DynamoClientCT - CONTEXT_CREDENTIALS_PROVIDER_BUILD - DoubleSummaryStatistics{count=1, sum=4.000000, min=4.000000, average=4.000000, max=4.000000}
2019-07-11 15:14:17.748  INFO  SimpleTimer:49 - DynamoClientCT - CONTEXT_DYNAMO_CLIENT_BUILD - DoubleSummaryStatistics{count=1, sum=594.000000, min=594.000000, average=594.000000, max=594.000000}
2019-07-11 15:14:17.748  INFO  SimpleTimer:49 - DynamoClientCT - CONTEXT_DYNAMO_CLIENT_EXECUTE_1 - DoubleSummaryStatistics{count=1, sum=884.000000, min=884.000000, average=884.000000, max=884.000000}
2019-07-11 15:14:17.748  INFO  SimpleTimer:49 - DynamoClientCT - CONTEXT_DYNAMO_CLIENT_EXECUTE_2 - DoubleSummaryStatistics{count=1, sum=29.000000, min=29.000000, average=29.000000, max=29.000000}
2019-07-11 15:14:17.749  INFO  SimpleTimer:49 - DynamoClientCT - CONTEXT_DYNAMO_CLIENT_EXECUTE_3 - DoubleSummaryStatistics{count=1, sum=70.000000, min=70.000000, average=70.000000, max=70.000000}
2019-07-11 15:14:17.749  INFO  SimpleTimer:49 - DynamoClientCT - CONTEXT_DYNAMO_CLIENT_EXECUTE_4 - DoubleSummaryStatistics{count=1, sum=29.000000, min=29.000000, average=29.000000, max=29.000000}
2019-07-11 15:14:17.750  INFO  SimpleTimer:49 - DynamoClientCT - CONTEXT_HTTP_CLIENT_BUILD - DoubleSummaryStatistics{count=1, sum=23.000000, min=23.000000, average=23.000000, max=23.000000}

As you can see the first put takes 884ms then the three subsequent ones take 29, 70, and 29ms making the first request at least an order of magnitude slower. Add in the 600ms for the DynamoDbClient creation and that's quite a long time.

Context

This code is used in a Lambda function serving a REST API which is one of the canonical use cases for Lambdas. The expectation is that the function should be quick to respond and as shown by the subsequent requests, this is a reasonable expectation. The results above are actually far better than we are seeing when deployed to Lambda - our first Dynamo execution is consistently around 2s +/-0.2s. We are not using a VPC and the Lambdas have 1024MB memory.

Your Environment

The above code was run on my machine to get the provided results but we are deploying virtually identical code to an AWS Lambda with 1024MB memory.

I have some questions:

  1. Are there other settings or configuration we could be providing to reduce the time to build the client and execute the first request?
  2. What causes the first request to be so much slower than subsequent requests?
dagnir commented 5 years ago

Hi @stuartleylandcole The initial request being slow is because the HTTP client must open a new socket and perform an SSL handshake which are slow and costly operations.

dagnir commented 5 years ago

For your first question, it looks like you already use the URLConnectionHTTPClient which has the quickest startup time of the 3 existing HTTP clients we have.

dagnir commented 5 years ago

You can also have a look at this comment https://github.com/aws/aws-sdk-java-v2/issues/6#issuecomment-422126377 but it looks like you more or less follow the guidelines already.

stuartleylandcole commented 5 years ago

Hi @dagnir, thank you for your replies! I've been doing some further investigation and wanted to report back my findings and ask for some further assistance.

With a 1024MB Lambda the first DynamoDB request reliably takes 2s from executing the request to receiving a response. Note that this does not include time to build DynamoDbClient nor any of its dependencies (SdkHttpClient, etc). Building DynamoDbClient itself takes 400ms. Here is a log showing the client build and request. Note that I have used the ApacheHttpClient instead of UrlConnectionClient here so I could enable trace logging - using this client has not affected the times taken to make the DynamoDB request, they are inline with my original post, although I concede it probably takes longer to initialise so I will revert back when deploying for real.

2019-07-15 11:04:41.030 DEBUG CommonModule:26 - Starting to build DynamoDB client
2019-07-15 11:04:41.473 DEBUG CommonModule:28 - Finished building DynamoDB client
...
2019-07-15 11:04:42.172 4762e9b6-bc2a-4bfb-9d62-3332c459314a DEBUG ItemFetcher:50 - Retrieving item with key {id=AttributeValue(S=5fb7d27c-0447-411c-9c73-bbbbcb816cfe, SS=software.amazon.awssdk.core.util.DefaultSdkAutoConstructList@2cbb3d47, NS=software.amazon.awssdk.core.util.DefaultSdkAutoConstructList@2cbb3d47, BS=software.amazon.awssdk.core.util.DefaultSdkAutoConstructList@2cbb3d47, M=software
2019-07-15 11:04:42.318 4762e9b6-bc2a-4bfb-9d62-3332c459314a DEBUG AwsCredentialsProviderChain:95 - Unable to load credentials from SystemPropertyCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId). software.amazon.awssdk.core.exception.SdkClientException: Unabl
2019-07-15 11:04:42.333 4762e9b6-bc2a-4bfb-9d62-3332c459314a DEBUG AwsCredentialsProviderChain:84 - Loading credentials from EnvironmentVariableCredentialsProvider()
2019-07-15 11:04:42.351 4762e9b6-bc2a-4bfb-9d62-3332c459314a DEBUG ExecutionInterceptorChain:84 - Creating an interceptor chain that will apply interceptors in the following order: []
2019-07-15 11:04:42.556 4762e9b6-bc2a-4bfb-9d62-3332c459314a DEBUG request:84 - Sending Request: DefaultSdkHttpFullRequest(httpMethod=POST, protocol=https, host=dynamodb.eu-west-1.amazonaws.com, encodedPath=/, headers=[amz-sdk-invocation-id, Content-Length, Content-Type, User-Agent, X-Amz-Target], queryParameters=[])
2019-07-15 11:04:42.613 4762e9b6-bc2a-4bfb-9d62-3332c459314a TRACE Aws4Signer:126 - AWS4 Canonical Request: POST / amz-sdk-invocation-id:80f99976-67fc-b5d6-37fc-cacdb75683cb amz-sdk-retry:0/0/500 content-length:91 content-type:application/x-amz-json-1.0 host:dynamodb.eu-west-1.amazonaws.com x-amz-date:20190715T110442Z x-amz-security-token:AgoJb3JpZ2luX2VjEKv//////////wEaCWV1LXdlc3QtMSJHMEUCIQCvkA
2019-07-15 11:04:42.631 4762e9b6-bc2a-4bfb-9d62-3332c459314a DEBUG Aws4Signer:84 - AWS4 String to sign: AWS4-HMAC-SHA256 20190715T110442Z 20190715/eu-west-1/dynamodb/aws4_request 905a861c122219a97b2b584d0c7e9e1e6e1257fd99460c5f014ee99b2df67b8c
2019-07-15 11:04:42.634 4762e9b6-bc2a-4bfb-9d62-3332c459314a TRACE Aws4Signer:126 - Generating a new signing key as the signing key not available in the cache for the date: 1563148800000
2019-07-15 11:04:42.992 4762e9b6-bc2a-4bfb-9d62-3332c459314a TRACE SdkTlsSocketFactory:126 - Connecting to dynamodb.eu-west-1.amazonaws.com/52.119.244.156:443
2019-07-15 11:04:43.194 4762e9b6-bc2a-4bfb-9d62-3332c459314a DEBUG SdkTlsSocketFactory:84 - socket.getSupportedProtocols(): [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2], socket.getEnabledProtocols(): [TLSv1, TLSv1.1, TLSv1.2]
2019-07-15 11:04:43.211 4762e9b6-bc2a-4bfb-9d62-3332c459314a DEBUG SdkTlsSocketFactory:84 - TLS protocol enabled for SSL handshake: [TLSv1.2, TLSv1.1, TLSv1]
2019-07-15 11:04:43.950 4762e9b6-bc2a-4bfb-9d62-3332c459314a DEBUG SdkSslSocket:84 - created: dynamodb.eu-west-1.amazonaws.com/52.119.244.156:443
2019-07-15 11:04:44.094 4762e9b6-bc2a-4bfb-9d62-3332c459314a DEBUG request:84 - Received successful response: 200

I have also run the same code on a 2048MB Lambda to compare times and they are roughly half the time for the 1024MB Lambda, except for building DynamoDbClient - that still takes 400ms. Here is the log:

2019-07-15 11:55:19.645 DEBUG CommonModule:26 - Starting to build DynamoDB client
2019-07-15 11:55:20.063 DEBUG CommonModule:28 - Finished building DynamoDB client
...
2019-07-15 11:55:20.577 e3cf38b2-24d5-4d2b-bbef-1a81cf0046ba DEBUG ItemFetcher:50 - Retrieving item with key {id=AttributeValue(S=e97498c4-4bbf-42d7-97c9-86cbe98575e3, SS=software.amazon.awssdk.core.util.DefaultSdkAutoConstructList@2cbb3d47, NS=software.amazon.awssdk.core.util.DefaultSdkAutoConstructList@2cbb3d47, BS=software.amazon.awssdk.core.util.DefaultSdkAutoConstructList@2cbb3d47, M=software
2019-07-15 11:55:20.663 e3cf38b2-24d5-4d2b-bbef-1a81cf0046ba DEBUG AwsCredentialsProviderChain:95 - Unable to load credentials from SystemPropertyCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId). software.amazon.awssdk.core.exception.SdkClientException: Unabl
2019-07-15 11:55:20.669 e3cf38b2-24d5-4d2b-bbef-1a81cf0046ba DEBUG AwsCredentialsProviderChain:84 - Loading credentials from EnvironmentVariableCredentialsProvider()
2019-07-15 11:55:20.674 e3cf38b2-24d5-4d2b-bbef-1a81cf0046ba DEBUG ExecutionInterceptorChain:84 - Creating an interceptor chain that will apply interceptors in the following order: []
2019-07-15 11:55:20.763 e3cf38b2-24d5-4d2b-bbef-1a81cf0046ba DEBUG request:84 - Sending Request: DefaultSdkHttpFullRequest(httpMethod=POST, protocol=https, host=dynamodb.eu-west-1.amazonaws.com, encodedPath=/, headers=[amz-sdk-invocation-id, Content-Length, Content-Type, User-Agent, X-Amz-Target], queryParameters=[])
2019-07-15 11:55:20.791 e3cf38b2-24d5-4d2b-bbef-1a81cf0046ba TRACE Aws4Signer:126 - AWS4 Canonical Request: POST / amz-sdk-invocation-id:3772b0c5-757d-ce7f-e32f-77c5df238715 amz-sdk-retry:0/0/500 content-length:91 content-type:application/x-amz-json-1.0 host:dynamodb.eu-west-1.amazonaws.com x-amz-date:20190715T115520Z x-amz-security-token:AgoJb3JpZ2luX2VjEKv//////////wEaCWV1LXdlc3QtMSJHMEUCIQCvkA
2019-07-15 11:55:20.795 e3cf38b2-24d5-4d2b-bbef-1a81cf0046ba DEBUG Aws4Signer:84 - AWS4 String to sign: AWS4-HMAC-SHA256 20190715T115520Z 20190715/eu-west-1/dynamodb/aws4_request d0243cc15e289b4d98de1e7c3ffcd503b7a23e226957337e0d6145e2d291bc06
2019-07-15 11:55:20.798 e3cf38b2-24d5-4d2b-bbef-1a81cf0046ba TRACE Aws4Signer:126 - Generating a new signing key as the signing key not available in the cache for the date: 1563148800000
2019-07-15 11:55:20.964 e3cf38b2-24d5-4d2b-bbef-1a81cf0046ba TRACE SdkTlsSocketFactory:126 - Connecting to dynamodb.eu-west-1.amazonaws.com/52.94.25.56:443
2019-07-15 11:55:21.037 e3cf38b2-24d5-4d2b-bbef-1a81cf0046ba DEBUG SdkTlsSocketFactory:84 - socket.getSupportedProtocols(): [SSLv2Hello, SSLv3, TLSv1, TLSv1.1, TLSv1.2], socket.getEnabledProtocols(): [TLSv1, TLSv1.1, TLSv1.2]
2019-07-15 11:55:21.039 e3cf38b2-24d5-4d2b-bbef-1a81cf0046ba DEBUG SdkTlsSocketFactory:84 - TLS protocol enabled for SSL handshake: [TLSv1.2, TLSv1.1, TLSv1]
2019-07-15 11:55:21.394 e3cf38b2-24d5-4d2b-bbef-1a81cf0046ba DEBUG SdkSslSocket:84 - created: dynamodb.eu-west-1.amazonaws.com/52.94.25.56:443
2019-07-15 11:55:21.433 e3cf38b2-24d5-4d2b-bbef-1a81cf0046ba DEBUG request:84 - Received successful response: 200

A quick summary (all times in ms):

Step 1024MB 2048MB
DynamoDbClient build 440 420
Unknown 150 86
Creating interceptor chain 200 90
Generating new signing key 360 170
Establishing SSL connection 960 430

Even with the timings on the 2048MB sized Lambda this makes it quite difficult to run a REST API using Lambdas as a cold start will incur both the normal Lambda cold start time (1-2s) plus the DynamoDB overhead (1s). Equally these Lambdas do not require anything like 2GB of memory, they are using less than 200MB but I realise that CPU scales with memory which is why I performed this test - ideally I would be using 512MB or less.

Some questions to finish with:

  1. Having seen the trace logging and timing is there anything you can recommend that would speed up the first request execution?
  2. What is happening during the DynamoDbClient creation that takes 400ms? Why would this be so consistent across the two different Lambdas when other timings are roughly proportional
  3. What is happening at the 'Unknown' step in the table above - this is between the following two lines of logging, the first of which is my code calling DynamoDbClient#getItem:
    2019-07-15 11:04:42.172 4762e9b6-bc2a-4bfb-9d62-3332c459314a DEBUG ItemFetcher:50 - Retrieving item with key {id=AttributeValue(S=5fb7d27c-0447-411c-9c73-bbbbcb816cfe, SS=software.amazon.awssdk.core.util.DefaultSdkAutoConstructList@2cbb3d47, NS=software.amazon.awssdk.core.util.DefaultSdkAutoConstructList@2cbb3d47, BS=software.amazon.awssdk.core.util.DefaultSdkAutoConstructList@2cbb3d47, M=software
    2019-07-15 11:04:42.318 4762e9b6-bc2a-4bfb-9d62-3332c459314a DEBUG AwsCredentialsProviderChain:95 - Unable to load credentials from SystemPropertyCredentialsProvider(): Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId). software.amazon.awssdk.core.exception.SdkClientException: Unabl

Thank you for any help you can provide. Hopefully we are able to reduce these times as at the moment I'm not sure that Lambdas providing a REST API in front of DynamoDB is a viable architecture.

dagnir commented 5 years ago

Hi @stuartleylandcole, the constant of ~400ms for instantiating DynamoDb is mostly due to classloading and static initialization. Most of these operations are I/O bound which is probably why it doesn't really scale with larger Lambdas. We can look into removing some unnecessary/heavy static initialization which may shave off another 100 or so ms.

The 'Unknown' step in the logs appears the the credentials loading mechanism. This is in line with what's happening since this is happening after a call to DDB is made, wherer the SDK must then resolve the credentials to use for signing. You can reduce this by providing a specific credentials provider, rather that DefaultCredentialsProvider, which delegates to a set of other providers. For example, since you're in Lambda, you can use the EnvironmentVariableCredentialsProvider which will pull credentials from the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN which are exposed to your Lambda.

debora-ito commented 5 years ago

Hi @stuartleylandcole, has @dagnir answered your questions? Do you have any other question?

stuartleylandcole commented 5 years ago

Hi @debora-ito @dagnir. Thank you for your reply and apologies for not responding sooner.

I think my question has been answered, at least in relation to the SDK - I have others concerns about the viability of Lambdas for a REST API backed by DynamoDB given the timings shown above but I don't think this is the place to raise them. Do you happen to know the correct place to raise this?

Many thanks for your help! I'm happy for this issue to be closed now.

varunnvs92 commented 5 years ago

I would suggest reaching out to the AWS Support or directly to service team via AWS Forums.

ocind commented 5 years ago

hi @stuartleylandcole , thank you so much for a very detailed explanation and benchmark. I am actually having the same issue, and your investigation results above is really helpful. I would like to know if you have found some other way to shave off some more time in regards to the first request to DynamoDb. Have you gained any updates from Support or other forums?

I think it is unacceptable that a 1024MB lambda takes 2s to execute first DynamoDb request. This is actually worse than AWS Lambda cold start itself :(

stuartleylandcole commented 5 years ago

Hi @ocind. I'm quite surprised more people haven't encountered the problem, it fundamentally undermines one of the main reference architectures for AWS Lambda IMO.

The only way we have found to reduce the time is by warming the Lambdas. However the normal Lambda warming (for example, using, serverless-plugin-warmup) isn't enough; you need to make a Dynamo request so that the connection is established to do the SSL negotiation, setup connection pooling, etc. We use DynamoDbClient#describeEndpoints for this as it is a harmless operation. Note that you'll need to do something similar for this for every AWS service you use from Lambda. For example we also use Cognito and call CognitoIdentityProviderClient#listUserPools as part of our warming process.

ocind commented 5 years ago

hi @stuartleylandcole I think my use case is very similar to yours. We also use Dynamo and Cognito. On lambda warmup calls, currently we also initialize all the sdk clients and try to make a dummy request for each one.

In one of our Lambda functions, we actually have 2 Dynamo tables to read values from. I did some experiment couple months ago, and we found that while writing to different tables, first request to each table would involve setting up SSL handshake again. Currently we warm the connections by performing describe-table request for each table.

We use DynamoDbClient#describeEndpoints for this as it is a harmless operation

Hmmm, I didn't know that describeEndpoints works in this case, I will check it out.

Thank you for the insights

biletnikov commented 5 years ago

We have the same issue. Java 8 & DynamoDB

Containers: 512 Mb takes 9 seconds for the first request (cold start is just 400 ms), next request 80 ms. 1024 Mb takes 10 seconds, next requests about 20 ms 2048 Mb takes 2 seconds, next requests about 12 ms

Logs say that the lambda uses not more 200 Mb of RAM. I think the issue concerns CPU resources. The quota for CPU depends on RAM size, more RAM more CPU cycles allowed for the container.

I suppose the DynamoDB client (from AWS SDK) is not efficient and should be improved with refactoring.

jsyrjala commented 4 years ago

I am seeing a similar situation with S3 clients inside a Lambda. The first creation of a client and first request are significantly slower than following requests (about 10 seconds vs. 200 ms) with 512M Lambda.

biletnikov commented 4 years ago

Just an assumption: maybe it happens when DynamoDBMapper is used, because it needs time and CPU to scan java classes and create corresponding mapper object in the memory. It seems it is very CPU intensive task, so it works more or less better with big memory lambda instances, where CPU quota is higher.

brianmcca1 commented 4 years ago

I also have encountered this issue - I tried a few things but found that increasing the memory size was the only thing that seemed to make an impact. My findings:

Lambda memory size (mb) First DynamoDB query duration (seconds) Total request time (seconds)
128 (min) Timeout (>30) Timeout (>30)
256 (initial) 9.0 12.9
512 4.2 7.3
832 2.3 5.0
1024 2.0 4.5
1280 1.7 4.2
1600 1.2 3.6
1856 1.1 3.3
2048 1.0 3.2
2560 0.86 2.8
3008 (max) 0.78 2.6

image

We are using the DynamoDBMapper, but I tried switching over to using QuerySpec instead and manually mapping the result to a POJO and it didn't make a noticeable performance difference. I'd love to know if anyone has found a more consistent solution, but this at least helped make the times more reasonable (we were at 256 MB to start and it was unusable on cold starts).

zoewangg commented 4 years ago

Hi all, we have released a maven archetype archetype-lambda that helps customers bootstrap a lambda function faster with SDK recommended practices to achieve minimal SDK startup time.

mvn archetype:generate \
  -DarchetypeGroupId=software.amazon.awssdk \
  -DarchetypeArtifactId=archetype-lambda \
  -DarchetypeVersion=2.11.4\

See https://github.com/aws/aws-sdk-java-v2/blob/master/archetypes/archetype-lambda/README.md#maven-archetype-for-lambda-function-using-aws-sdk-for-java-2x for more info

Please try it out and let us know if you have any feedback! :)

stuartleylandcole commented 4 years ago

@zoewangg many thanks for this! Could you highlight the key parts of the archetype in relation to this issue (i.e. Initialisation of the dynamo client)? I've taken a look at the code and the most obvious part is initialising the client in the Lambda handler constructor - are there other aspects to it? Thanks again!

zoewangg commented 4 years ago

@stuartleylandcole Sure! Below are the patterns we use in the archetype to reduce SDK startup time:

https://github.com/aws/aws-sdk-java-v2/blob/c72d2b554de92258b9cfced4f8f54f3ba714cf92/archetypes/archetype-lambda/src/test/resources/projects/urlhttpclient/reference/pom.xml#L37-L46


Update: please check out https://aws.amazon.com/blogs/developer/tuning-the-aws-java-sdk-2-x-to-reduce-startup-time/ for best practices to reduce SDK startup time

Nxtra commented 4 years ago

@ocind @stuartleylandcole . I'm thinking about the following setup to eliminate the problem:

I'm sure this eliminates the overhead just after the environment has been provisioned, cause I've seen that. I do not know what will happen if you provision the environment, and the first request only comes in an hour later. I suppose the connection might have been closed by the client or the backend dynamodb service. That means the overhead would be there again. That is if the init code is only run once for provisioned concurrency. I still have to test that last part to see if it actually behaves like that.

What do you think?

zoewangg commented 4 years ago

@Nxtra Yeah, you can consider pre-warmuping the client by making a simple GET api after you create the client to establish a connection in advance. By default, the SDK closes idle connections after 60 seconds. You can increase it, but keep in mind that idle connections can also be closed on the service side after certain amount of time, which cannot be configured on the SDK side.

https://github.com/aws/aws-sdk-java-v2/blob/c72d2b554de92258b9cfced4f8f54f3ba714cf92/http-client-spi/src/main/java/software/amazon/awssdk/http/SdkHttpConfigurationOption.java#L118

I'm not super familiar with provisioned concurrency feature of Lambda function, and I'd recommend reaching out to Lambda team on AWS forums if you have any questions regarding it.

biletnikov commented 4 years ago

For me, it looks like there is no any problems with dynamodb connections and the time to establish it (honestly I did not measure it).

I think the issue is in something different, because the first request time is very very varies on the Lambda Memory settings and as consequeuence the CPU quota Lambda has. If you have a big Lambda instance like 3 Gb, there is no any serious delay for the first request. Right ? Actually, any HTTP connection (which DynamoDB uses) must not require too much CPU resources, in case if you do not launch a super duper connection pool for the client.

Yes, using provisioned concurrency feature may be a kind of workaround, but it is not a solution, you can not keep many provisioned lambda instances because it costs money (I think using the 3 Gb Lambda will be more cheaper) . Also, in case of spike of requests you will get the issue again, when you need more lambda instances than you have provisioned.

I still think that there is an issue in DynamoDB client library for java, maybe it is not optimized properly or has a problem with a dependency libs which DynamoDB client uses.

On Tue, Mar 31, 2020 at 1:44 AM Zoe Wang notifications@github.com wrote:

@Nxtra https://github.com/Nxtra Yeah, you can consider pre-warmuping the client by making a simple GET api after you create the client to establish a connection in advance. By default, the SDK closes idle connections after 60 seconds. You can increase it, but keep in mind that idle connections can also be closed on the service side after certain amount of time, which cannot be configured on the SDK side.

https://github.com/aws/aws-sdk-java-v2/blob/c72d2b554de92258b9cfced4f8f54f3ba714cf92/http-client-spi/src/main/java/software/amazon/awssdk/http/SdkHttpConfigurationOption.java#L118

I'm not super familiar with provisioned concurrency feature of Lambda function, and I'd recommend reaching out to Lambda team on AWS forums if you have any questions regarding it.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/aws/aws-sdk-java-v2/issues/1340#issuecomment-606290217, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGYOKFGIGJEOD4Z67PYEX5LRKEODTANCNFSM4IBIJFLA .

-- Best regards, Sergei Biletnikov

zoewangg commented 4 years ago

Hi @biletnikov, from your previous comment, it seems you are using DynamoDBMapper from AWS Java SDK 1.11.x. We are aware of the slow startup issue in 1.11.x, and unfortunately there is not much we can do to improve it while still maintaining backwards compatibility.

We are actively working on the implementation of Dynamodb mapper 2.x, and it's currently in preview. https://github.com/aws/aws-sdk-java-v2/tree/master/services-custom/dynamodb-enhanced

biletnikov commented 4 years ago

Hi Zoe,

right, I am using DynamoDBMapper from AWS Java SDK 1.11.x Ok, good that you confirm the startup issue in this lib.

Looking forward for Dynamodb mapper 2.x release, so we could use it for production projects.

Could you roughly estimate when it will be ready? in 1 month, 3 months, 6 months ?

Thanks.

On Tue, Mar 31, 2020 at 8:37 PM Zoe Wang notifications@github.com wrote:

Hi @biletnikov https://github.com/biletnikov, from your previous comment, it seems you are using DynamoDBMapper from AWS Java SDK 1.11.x. We are aware of the slow startup issue in 1.11.x, and unfortunately there is not much we can do to improve it while still maintaining backwards compatibility.

We are actively working on the implementation of Dynamodb mapper 2.x, and it's currently in preview. https://github.com/aws/aws-sdk-java-v2/tree/master/services-custom/dynamodb-enhanced

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/aws/aws-sdk-java-v2/issues/1340#issuecomment-606770863, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGYOKFBROXRYBBHOLBQLP5TRKIS37ANCNFSM4IBIJFLA .

-- Best regards, Sergei Biletnikov

Nxtra commented 4 years ago

Hi @biletnikov, from your previous comment, it seems you are using DynamoDBMapper from AWS Java SDK 1.11.x. We are aware of the slow startup issue in 1.11.x, and unfortunately there is not much we can do to improve it while still maintaining backwards compatibility.

We are actively working on the implementation of Dynamodb mapper 2.x, and it's currently in preview. https://github.com/aws/aws-sdk-java-v2/tree/master/services-custom/dynamodb-enhanced

Great to hear that! Can you give some insight in what is causing that behavior in 1.11.x ?

zoewangg commented 4 years ago

Hi @biletnikov, we don't have a public timeline at the moment, but DynamoDbMapper v2 is definitely one of our top priorities right now. The work is tracked in #35

Hi @Nxtra Sure! One of the main reasons is that 1.11.x SDK uses ApacheHttpClient under the hood and initializing it can be expensive. Checkout this comment https://github.com/aws/aws-sdk-java/issues/1774#issuecomment-426431217 for more info. Let me know if you have further questions!

talentprince commented 3 years ago

@zoewangg Hi, we've used enhanced since June 2020, and current using version is one of Mar, 2021, we noticed that slow start up is still existing and hard to optimized event if I did a pre dummy request for one of our DynamoDbTalbe<*>. Not sure if your team is still profiling the initial start up of new sdk, this really matters for some of the time sensitive request.

zoewangg commented 3 years ago

Hi @talentprince thank you for your feedback. Are you using static table schema? Can you share how you create the DynamodbEnhancedClient? What is your memory setting?

On a side note, if startup latency is critical to your application, you might consider GraalVM native image, which has faster startup time compared with JVM. https://aws.amazon.com/blogs/developer/graalvm-native-image-support-in-the-aws-sdk-for-java-2-x/

talentprince commented 3 years ago

Hi, @zoewangg , we are not using lambda but springboot+eks, memory is 600M, DynamodbEnhancedClient is a spring bean, and table schema is initialized as fields of our storage client (another bean). Due to our architecture, it's hard to ship to native image...

Nandlalaji commented 3 years ago

Hi @zoewangg, i am getting reestablishment of connection after every one min of idle time. I am using ApacheHttpClient for db connection. with all default value from SdkHttpConfigurationOption, i observed in log that the connection is getting shut down after every one min of idle time. so, i increased CONNECTION_MAX_IDLE_TIMEOUT to 10 min. But now the connection is not getting shut down by itself after one min of idle time but when another request is fired after 2-3 min of idle time. it is sending the request first then shutting the connection then reestablishing the connection, which infact take more latency then first default case. i tried with tcpKeepAlive setting to true but still same result. i also increased connection ttl to 10 min but still no change. Can i do something to not reestablishing the connection after some idle time. may be 1/2 hour or more. For now i am warming up DB every min. which doesn't seem good solution

BTW i am not using lambda, it is ECS to dynamodb

sl-omideidivandi commented 3 years ago

@Nandlalaji i think your problem is Connection Keep Alive, it's by default active for new release of SDK as i know but if you use the V1 or V2 you need to flag it as true when creating the http client

Nandlalaji commented 3 years ago

Thanks for replying @sl-omideidivandi. i have set tcpKeepAlive as true. But still same behaviour. i mean after 2-3 idle time if request is made, first the connection is shutting down then reestablishing the connection. Well this problem exist, but it is ok since reestablishing the connection is taking some 20-30 ms.

Now i noticed some different issue. i am firing request to dynamodb after every 5 mins of idle time. And i found 20% of request randomly goes more than 1s. which totally unacceptable. BTW i fired around 100 request.

To avoid this above issue i am warming up the DB by making dummy DB call every 1 min. But this solution seems not full proof. i mean now the 20% has decrease to 1% but still this issue exist i am afraid this issue is not related to property changes. Seems like some other issue.

I tried to investigate more by printing SDK log. and found logs similar to @stuartleylandcole. only that after DynamoDBEnhancedClient call it is going to ExecutionInterceptorChain -Creating an interceptor chain that will apply interceptors. i found whenever it is taking more than 1s it is taking in above two steps. Not sure if it is some sdk issue.

should i be creating a bug for this.

BTW i am doing all this from ECS. Lambda is not involve in my case.

ynwangn commented 11 months ago

In 2019 Re:Invent, Stefano Buliani gave a speech regarding how to optimize Lambda execution time. Here's the presentation video: https://youtu.be/ddg1u5HLwg8?si=gKOO1Fnqj0ggV2mN&t=1515. He mentioned the following points regarding AWS SDK for Java v2:

The AWS SDK for Java 2.0 uses Jackson’s high-level APIs to marshal and unmarshal errors and data. The marshallers are initialized lazily, and Jackson relies heavily on reflection to understand models Fortunately, Jackson caches all of the “discovered” fields so that the SDK introspects objects only once

And the solution is

We can force the AWS SDK to exercise its marshallers at initialization, by sending a dummy request.

I believe TLS handshake, socket setup definitely contributed to the long latency. But I think the lazy loading probably contributed more to the latency. Hope this helps.