Closed. mtamassia closed this issue 4 years ago.
Thanks for the report. Looks like the connection managers are not being ~~removed from the map during client close.~~
Looks like I spoke too soon; looking over the code again, everything seems in order. And doing a simple test continuously creating a client, making a request, and closing it keeps the reaper map size at 0. Do you have a repro case that we can run?
Thank you for the quick response @dagnir
I have reviewed our code and found a couple of instances where `close()` was not being properly called. However, after the changes we still see the memory buildup, as you can see in the new report below. BTW, this time I forced a GC before doing the heap dump, just for sanity.
Not sure if it matters but here are the SDK clients we are using and how we are instantiating each of them.
```java
SecretsManagerClient smClient = SecretsManagerClient.builder().credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();
Ec2Client ec2Client = Ec2Client.builder().region(Region.of(this.getRegion())).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();
PricingClient pClient = PricingClient.builder().region(Region.US_EAST_1).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();
RdsClient rdsClient = RdsClient.builder().region(Region.of(this.getRegion())).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();
CloudTrailClient client = CloudTrailClient.builder().credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).overrideConfiguration(ClientOverrideConfiguration.builder().retryPolicy(RetryPolicy.builder().throttlingBackoffStrategy(BackoffStrategy.defaultStrategy()).build()).build()).build();
ElasticLoadBalancingV2Client client = ElasticLoadBalancingV2Client.builder().region(Region.of(this.getRegion())).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();
DirectConnectClient client = DirectConnectClient.builder().region(Region.of(currentRegion)).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();
S3Client client = S3Client.builder().region(Region.of(this.getRegion())).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();
EfsClient client = EfsClient.builder().region(Region.of(currentRegion)).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();
```
I tried creating multiple SDK clients, making API calls, and then closing the clients, but was not able to reproduce the issue (verified that `IdleConnectionReaper#connectionManagers` was empty in the end).
Could you provide a complete repro case?
We were not able to reproduce it, closing this due to no response. Please feel free to reopen if you are still experiencing this issue and can provide us a complete repro case.
@debora-ito, @dagnir, @zoewangg, @mtamassia
I am facing the same issue, and I also suspect a possible memory leak in the `IdleConnectionReaper`. My application used to run fine for a week, and then it crashed with an OutOfMemoryError. I attached JProfiler, and it looks like the `IdleConnectionReaper` entries were not being freed up. Please look at the screenshot below:
Drilling further into this heavy object, I saw thousands of the instances below,

and here is the path to the GC root for those thousands of objects.

My application is simple: it creates a thread => creates a credentials provider as shown in the code below => collects some data from CloudWatch using this credentials provider => and then the thread dies.

Code to create the credentials provider:
```java
public static AwsCredentialsProvider getRoleCredentialsProvider(String _roleArn, String _roleSessionName,
        String _externalId, String _region) {
    AssumeRoleRequest assumeRoleRequest = null;
    if (_externalId == null || _externalId.isEmpty()) {
        assumeRoleRequest = AssumeRoleRequest.builder().roleArn(_roleArn).roleSessionName(_roleSessionName)
                .durationSeconds(900).build();
    } else {
        assumeRoleRequest = AssumeRoleRequest.builder().roleArn(_roleArn).externalId(_externalId)
                .roleSessionName(_roleSessionName).durationSeconds(900).build();
    }
    StsClient stsClient = null;
    // AWS recommends using Regional AWS STS endpoints, hence region as input.
    if (_region == null || _region.isEmpty()) {
        stsClient = StsClient.builder().region(Region.US_EAST_1).build();
    } else {
        stsClient = StsClient.builder().region(Region.of(_region)).build();
    }
    return StsAssumeRoleCredentialsProvider.builder().stsClient(stsClient).refreshRequest(assumeRoleRequest)
            .asyncCredentialUpdateEnabled(true).build();
}
```
Code in the finally block of my application to close this provider:
```java
// Close StsAssumeRoleCredentialsProvider.
if (credentialProvider != null && credentialProvider instanceof StsAssumeRoleCredentialsProvider) {
    ((StsAssumeRoleCredentialsProvider) credentialProvider).close();
}
```
I can easily reproduce this issue; with JProfiler running live, it clearly shows memory growing because of these objects.

I am using the SDK version below:
```xml
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>sts</artifactId>
    <version>2.5.40</version>
</dependency>
```
Please let me know if you need any additional info.
Hi @Dhananjay-Ghongane, closing the credentials provider itself does not close the `stsClient`, so you need to close the `stsClient` too. The reason the `stsClient` is not closed automatically is that it's provided by customers.
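The ownership rule behind this advice can be sketched without the SDK: a wrapper that is handed an already-built resource should not close it, so the caller must close both, wrapper first and then the resource it supplied. The class names below are hypothetical stand-ins for `StsAssumeRoleCredentialsProvider` and `StsClient`, not the SDK classes themselves:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for an SDK client such as StsClient.
class Client implements AutoCloseable {
    final AtomicBoolean closed = new AtomicBoolean(false);
    @Override public void close() { closed.set(true); }
}

// Hypothetical stand-in for a provider that receives an already-built client:
// because the client is supplied by the caller, the provider does not own it.
class Provider implements AutoCloseable {
    final Client client;
    final AtomicBoolean closed = new AtomicBoolean(false);
    Provider(Client client) { this.client = client; }
    @Override public void close() { closed.set(true); }  // deliberately does NOT close this.client
}

public class OwnershipDemo {
    public static void main(String[] args) {
        Client stsClient = new Client();
        Provider provider = new Provider(stsClient);
        provider.close();   // like closing the credentials provider alone
        System.out.println("client closed after provider.close(): " + stsClient.closed.get()); // false
        stsClient.close();  // the caller must also close the client it created
        System.out.println("client closed after stsClient.close(): " + stsClient.closed.get()); // true
    }
}
```

In the real code this corresponds to calling `close()` on both the `StsAssumeRoleCredentialsProvider` and the `StsClient` passed into its builder, e.g. in the same finally block.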
@zoewangg, thanks, that should work. My jobs are short-lived, so I may not want a background thread periodically updating credentials. I am not using `StsAssumeRoleCredentialsProvider`; I simply call AssumeRole => take the credentials => use them, and repeat.
@zoewangg I think what you posted above might resolve my issue too.
Our code had an implicit credentials provider:
```java
Ec2Client ec2Client = Ec2Client.builder().region(Region.of(this.region)).credentialsProvider(InstanceProfileCredentialsProvider.builder().build()).build();
// ...
ec2Client.close();
```
so based on your message above, we changed it to this:
```java
InstanceProfileCredentialsProvider credProvider = InstanceProfileCredentialsProvider.builder().build();
Ec2Client ec2Client = Ec2Client.builder().region(Region.of(this.region)).credentialsProvider(credProvider).build();
// ...
ec2Client.close();
credProvider.close();
```
In short, we are now closing the credentials provider. I have not tested this at scale but do you think that could explain the leak on our end?
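Since both SDK clients and credential providers implement AutoCloseable, try-with-resources is one way to make the second `close()` hard to forget: resources close in reverse declaration order, which matches the client-then-provider order in the snippet above. A pure-Java sketch of that ordering (`Tracked` is a hypothetical stand-in for an SDK resource):

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {
    static final List<String> closeOrder = new ArrayList<>();

    // Hypothetical stand-in for an AutoCloseable SDK resource.
    static class Tracked implements AutoCloseable {
        final String name;
        Tracked(String name) { this.name = name; }
        @Override public void close() { closeOrder.add(name); }
    }

    public static void main(String[] args) {
        // Declared provider-first, so the client is closed first, then the provider.
        try (Tracked credProvider = new Tracked("provider");
             Tracked ec2Client = new Tracked("client")) {
            // ... make API calls ...
        }
        System.out.println(closeOrder);  // [client, provider]
    }
}
```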
@mtamassia I don't think not closing `InstanceProfileCredentialsProvider` could cause an `IdleConnectionReaper` memory leak, since it's not using `IdleConnectionReaper` at all. Also, `asyncCredentialUpdateEnabled` is false by default, so there's really nothing to be closed in `InstanceProfileCredentialsProvider`.

Is it possible that some SDK clients were not closed properly (failed to close silently) in your use case? Can you enable the DEBUG log level to see if there were any error messages?
It should be something like this: https://github.com/aws/aws-sdk-java-v2/blob/3d2e2e5ffe164e1876dbeccf5ef9fb486b2725b2/utils/src/main/java/software/amazon/awssdk/utils/IoUtils.java#L74
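For reference, one way to turn that logging on, assuming Log4j2 is the logging backend: the logger name below is the SDK's root package, and swallowed `close()` failures are logged at DEBUG by `IoUtils.closeQuietly`.

```xml
<!-- log4j2.xml: surface silently-swallowed close() failures from the SDK -->
<Configuration>
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d %-5p %c - %m%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <!-- DEBUG on the SDK root package shows failures logged by IoUtils.closeQuietly -->
    <Logger name="software.amazon.awssdk" level="debug"/>
    <Root level="warn">
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>
```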
Hi @zoewangg, I'm running into a similar issue. The only difference is that I am using the default credential provider chain, which in turn resolves to the `StsWebIdentityCredentialsProvider`. After a lot of debugging, I figured out that closing the client does not close the STS client created by the `StsWebIdentityCredentialsProvider`.
TL;DR: the provider chain in `Lazy<T>` never gets closed.
Long version: `DefaultCredentialsProvider#close` calls `LazyAwsCredentialsProvider#close`, which calls `IoUtils.closeIfCloseable`, but the provided `delegate` is not an AutoCloseable type. It is actually a `Lazy<AwsCredentialsProvider>`, and `Lazy<T>` does not implement AutoCloseable. So the chain (and consequently the STS client) in the `delegate` field never receives a `close()` call.
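The type check can be reproduced in isolation. The sketch below re-creates just the two pieces involved as simplified stand-ins (not the SDK classes themselves): a `Lazy` wrapper that does not implement AutoCloseable, and a `closeIfCloseable` helper that mirrors the `instanceof` check in `IoUtils`. The wrapped chain's `close()` is never reached:

```java
import java.util.function.Supplier;

// Simplified stand-in for software.amazon.awssdk.utils.Lazy: note it is NOT AutoCloseable.
class Lazy<T> {
    private final Supplier<T> initializer;
    private volatile T value;
    Lazy(Supplier<T> initializer) { this.initializer = initializer; }
    T getValue() {
        if (value == null) value = initializer.get();
        return value;
    }
}

public class ClosingCheck {
    static boolean wasClosed = false;

    // Mirrors the instanceof check in IoUtils.closeIfCloseable.
    static void closeIfCloseable(Object maybeCloseable) {
        if (maybeCloseable instanceof AutoCloseable) {
            try { ((AutoCloseable) maybeCloseable).close(); } catch (Exception e) { }
        }
    }

    public static void main(String[] args) {
        AutoCloseable providerChain = () -> wasClosed = true;  // stands in for the credentials chain
        Lazy<AutoCloseable> delegate = new Lazy<>(() -> providerChain);
        delegate.getValue();          // the chain (and its STS client) now exists
        closeIfCloseable(delegate);   // Lazy is not AutoCloseable -> silently does nothing
        System.out.println("chain closed: " + wasClosed);  // false: the leak described above
    }
}
```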
Would you like me to open a new issue?
Screenshot of the debugger
Stack dump from IntelliJ
```
closeIfCloseable:89, IoUtils (software.amazon.awssdk.utils) [2]
close:50, LazyAwsCredentialsProvider (software.amazon.awssdk.auth.credentials.internal)
close:109, DefaultCredentialsProvider (software.amazon.awssdk.auth.credentials)
closeQuietly:70, IoUtils (software.amazon.awssdk.utils)
closeIfCloseable:87, IoUtils (software.amazon.awssdk.utils) [1]
lambda$close$0:86, AttributeMap (software.amazon.awssdk.utils)
accept:-1, 912038923 (software.amazon.awssdk.utils.AttributeMap$$Lambda$1914)
forEach:981, HashMap$Values (java.util)
close:86, AttributeMap (software.amazon.awssdk.utils)
close:79, SdkClientConfiguration (software.amazon.awssdk.core.client.config)
close:80, HttpClientDependencies (software.amazon.awssdk.core.internal.http)
close:73, AmazonSyncHttpClient (software.amazon.awssdk.core.internal.http)
close:118, BaseSyncClientHandler (software.amazon.awssdk.core.internal.handler)
close:12198, DefaultIamClient (software.amazon.awssdk.services.iam)
discover:37, IamServiceRoleAwsDiscovery (com.planonsoftware.cloud.library.discovery.iamrole)
provideParameters:63, PlanonEnvironmentActivitiesImpl (com.planonsoftware.cloud.pco.orchestrator.controllers.planonenvironment.activities)
invoke0:-1, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:62, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:43, DelegatingMethodAccessorImpl (jdk.internal.reflect)
invoke:567, Method (java.lang.reflect)
execute:214, POJOActivityTaskHandler$POJOActivityImplementation (com.uber.cadence.internal.sync)
handle:190, POJOActivityTaskHandler (com.uber.cadence.internal.sync)
handle:175, ActivityWorker$TaskHandlerImpl (com.uber.cadence.internal.worker)
handle:146, ActivityWorker$TaskHandlerImpl (com.uber.cadence.internal.worker)
lambda$process$0:71, PollTaskExecutor (com.uber.cadence.internal.worker)
run:-1, 640627828 (com.uber.cadence.internal.worker.PollTaskExecutor$$Lambda$1491)
runWorker:1128, ThreadPoolExecutor (java.util.concurrent)
run:628, ThreadPoolExecutor$Worker (java.util.concurrent)
run:830, Thread (java.lang)
```
We have an application that creates and destroys EC2, S3, and several other SDK clients. We build clients within a thread and close() the clients within the thread. Example:

Memory consumption grows over time; here is a link to the memory heap dump, which is self-explanatory: https://heaphero.io/my-heap-report.jsp?p=YXJjaGl2ZWQvMjAyMC8wMy8yLy0tbWVtb3J5LnppcC0xOS01NC0zMi5qc29uLS0=

Expected Behavior

client close() should not be leaving resources behind

Current Behavior

Threads and their associated data should be removed upon calling a client close() method

Your Environment