Closed erodewald closed 3 years ago
Seeing some occasional Socket closed errors is common, but seeing a spike of occurrences is not.
Can you share a sample code showing how you are creating the client?
I see the issues you are experiencing are in production, but would it be possible to enable the client-side metrics? https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/generating-sdk-metrics.html The request metrics will show the number of connections, request time, and other data that can help in troubleshooting.
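For reference, the client-side metrics can be enabled either with the -Dcom.amazonaws.sdk.enableDefaultMetrics JVM system property or programmatically before the first client is built. A minimal sketch (the custom namespace name here is an illustrative assumption; the SDK's default namespace is "AWSSDK/Java"):

```java
import com.amazonaws.metrics.AwsSdkMetrics;

public class MetricsBootstrap {
    public static void main(String[] args) {
        // Programmatic equivalent of -Dcom.amazonaws.sdk.enableDefaultMetrics.
        // Must run before any AWS client is created.
        AwsSdkMetrics.enableDefaultMetrics();

        // Optional: publish under a custom CloudWatch namespace (assumed name).
        AwsSdkMetrics.setMetricNameSpace("MyService/DynamoDB");

        // ... build clients and run the application as usual ...
    }
}
```

Note that publishing to CloudWatch requires credentials with PutMetricData permission in the running environment.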
Thanks for taking a look. Here's the client creation, trimmed to the relevant parts:
@Provides
@Singleton
public static AmazonDynamoDB dynamoDBClient(DynamoDBConfig config) {
    AmazonDynamoDBClientBuilder builder = AmazonDynamoDBClientBuilder.standard()
        .withClientConfiguration(getClientConfiguration(config));
    builder.withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(config.getEndpointUrl(), config.getRegion()));
    return builder.build();
}

private static ClientConfiguration getClientConfiguration(DynamoDBConfig config) {
    ClientConfiguration clientConfiguration = new ClientConfigurationFactory().getConfig()
        .withRetryPolicy(PredefinedRetryPolicies.DYNAMODB_DEFAULT);
    setTimeouts(clientConfiguration, config.getTimeouts());
    return clientConfiguration;
}
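The setTimeouts helper isn't shown in the snippet, so as a point of comparison, here is a hedged sketch of the ClientConfiguration settings most relevant when pooled connections get closed server-side and surface as "Socket closed". All values are illustrative assumptions, not the poster's actual configuration:

```java
import com.amazonaws.ClientConfiguration;

public final class TimeoutSettings {
    // Illustrative values only; tune against your service's latency profile.
    static ClientConfiguration applyTimeouts(ClientConfiguration cfg) {
        return cfg
            .withConnectionTimeout(1_000)         // ms to establish a TCP connection
            .withSocketTimeout(5_000)             // ms of inactivity allowed on a socket read
            .withRequestTimeout(10_000)           // ms per individual HTTP attempt
            .withClientExecutionTimeout(20_000)   // ms across all retries of one operation
            .withConnectionTTL(60_000)            // recycle pooled connections after 60 s
            .withConnectionMaxIdleMillis(30_000)  // evict idle connections before the server does
            .withTcpKeepAlive(true);              // detect half-open connections sooner
    }
}
```

Keeping connection TTL and max-idle below the server-side idle timeout is the usual mitigation when stale pooled connections are the cause of this kind of error.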
I can enable client metrics for this in production (it's one microservice among many, so it should be alright), but I'd like to know which SDK metrics I should look for in CloudWatch to make it easier to justify enabling it. It will take me some time to get that configured and deployed; I'll report back when I have enough to share for debugging.
Thank you for the code snippets.
> but I'd like to know what SDK metrics I should look for in CloudWatch to make it easier to justify enabling it.
It's hard to say at this point, as we don't have an indication of what's closing the connections. I believe once you enable client-side metrics you get all the available SDK metrics; for reference, a list of metrics can be found here: https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/index.html?com/amazonaws/metrics/package-summary.html
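As a lower-friction way to spot-check before committing to CloudWatch, per-request timing can also be logged locally with a request handler. A rough sketch (this assumes client-side metrics/profiling is enabled so the timing info is actually populated; the handler class name is made up):

```java
import com.amazonaws.Request;
import com.amazonaws.Response;
import com.amazonaws.handlers.RequestHandler2;

// Rough sketch: print per-request timing (ClientExecuteTime, HttpRequestTime,
// RetryCount, ...) after each response, instead of shipping to CloudWatch.
public class TimingLogHandler extends RequestHandler2 {
    @Override
    public void afterResponse(Request<?> request, Response<?> response) {
        System.out.println(request.getAWSRequestMetrics().getTimingInfo());
    }
}
// Register with:
// AmazonDynamoDBClientBuilder.standard().withRequestHandlers(new TimingLogHandler())
```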
It looks like this issue hasn’t been active in longer than a week. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please add a comment to prevent automatic closure, or if the issue is already closed please feel free to reopen it.
On a daily basis, my service makes DynamoDB requests and occasionally sees latency spikes where requests last anywhere from 10 to 20 seconds, ultimately resulting in socket closure.
Describe the issue
In my AWS-hosted, EKS-containerized (on Debian Slim) Java (JDK 11) application, I am using AWS SDK v1.11.837 to interact with our DynamoDB instance. Typical behavior is fine, but at least once per day a container will begin seeing latency spikes, which my JFR (Datadog) tracing indicates are caused by
javax.net.ssl.SSLException: Socket closed
. Overall, my requests appear to succeed 🤔 but these long-duration DynamoDB operations cause service degradation. I would like to better understand the likely culprit of these socket closures (I have not seen many issues reported here that are attributed to SSLException).
Steps to Reproduce
I cannot provide a code snippet to reproduce this, but I can provide traces and logs (below).
Current Behavior
DynamoDB requests through the SDK take a very long time and likely fail to update the item (I am unable to confirm). I do not have any connection data from the time of the incidents, but I am curious whether the connection pool is being exhausted.
There was a slight spike in volume of traffic (and as a result, a correlating spike in network Rx and Tx indicating we made more calls out to DynamoDB during this time).
Your Environment
adoptopenjdk/openjdk11:jre-11.0.10_9-debianslim
Any guidance is appreciated. If I should direct this to AWS support I can also do that.