Closed: lordpengwin closed this issue 2 months ago
It's unusual for the SDK client to hang indefinitely; I would expect the request to time out at some point. Since you see it across different clients, I wonder if the issue is related to ECS.
Have you tried to reproduce in a different environment outside a container?
Are you setting any custom ClientConfiguration when creating the autoScalingClient?
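For reference, a custom ClientConfiguration would normally be passed in when the client is constructed, roughly like the sketch below (the timeout values are placeholders, not recommendations):

```java
import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.applicationautoscaling.AWSApplicationAutoScaling;
import com.amazonaws.services.applicationautoscaling.AWSApplicationAutoScalingClientBuilder;

// Illustrative timeouts only; the SDK defaults apply if no ClientConfiguration is supplied.
ClientConfiguration clientConfig = new ClientConfiguration()
        .withConnectionTimeout(10_000)  // ms to establish a connection
        .withSocketTimeout(30_000)      // ms to wait for data on an established connection
        .withRequestTimeout(30_000);    // ms per underlying HTTP request attempt

AWSApplicationAutoScaling autoScalingClient = AWSApplicationAutoScalingClientBuilder.standard()
        .withClientConfiguration(clientConfig)
        .build();
```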
Can you generate the verbose wire logs? Instructions are here. Make sure to redact any sensitive information, like access keys.
I've seen this with both ECS and SageMaker, though in both cases I'm making the call to an Auto Scaling group. I believe I've seen this both from the container and from an Amazon Linux development machine. I'm not setting a custom ClientConfiguration on the autoScalingClient, and this has happened in multiple AWS accounts. I will try to run some experiments today to see if I can recreate the problem consistently; it has happened randomly in the past. If I can, I will try to enable the wire logs as described above, and I will also try to get a Java thread dump.
So I might have been wrong here. I managed to get my application to hang again and it does not appear to be stuck where I thought it was. It appears that it is simply not exiting. A thread dump shows this running:
`"s3-transfer-manager-worker-1" #40 prio=5 os_prio=0 cpu=135446.59ms elapsed=8370.90s allocated=4078M defined_classes=95 tid=0x00007f327202d0d0 nid=0x70 waiting on condition [0x00007f323a4fe000] java.lang.Thread.State: WAITING (parking) at jdk.internal.misc.Unsafe.park(java.base@17.0.6/Native Method)
I suspect that the problem is that an S3 transfer manager is not being cleaned up correctly.
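If that's the case, I think the fix on my side is to shut the TransferManager down once I'm done with it so its worker threads don't keep the JVM alive. Something roughly like this (a sketch of what I plan to try; the variable names are placeholders for however my app actually builds the client):

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();
TransferManager transferManager = TransferManagerBuilder.standard()
        .withS3Client(s3Client)
        .build();
try {
    // ... uploads/downloads via transferManager ...
} finally {
    // Releases the "s3-transfer-manager-worker-*" threads; passing true also
    // shuts down the underlying AmazonS3 client.
    transferManager.shutdownNow(true);
}
```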
This issue is now closed.
Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.
I'm pretty sure that this was my problem. Thanks for the help.
Describe the bug
When deploying a service to ECS using the Java SDK, I've seen many instances where a call to AWSApplicationAutoScalingClient.putScalingPolicy() never returns. This happens about 10% of the time that I use this call. I've even set withSdkRequestTimeout() on the PutScalingPolicyRequest and it still hangs. Note: the policy does get applied to the service, but the call never returns.
Is this a known problem? Is there a way that I can debug or work around it?
Expected Behavior
The SDK call should return or timeout.
Current Behavior
Hangs forever
Reproduction Steps
This is my code:
```java
autoScalingClient.putScalingPolicy(new PutScalingPolicyRequest()
        .withResourceId(resourceID)
        .withServiceNamespace(ServiceNamespace.Ecs)
        .withPolicyName(String.format(APPLICATION_SCALING_POLICY, deployedServiceName))
        .withScalableDimension(ScalableDimension.EcsServiceDesiredCount)
        .withPolicyType(PolicyType.TargetTrackingScaling)
        .withTargetTrackingScalingPolicyConfiguration(new TargetTrackingScalingPolicyConfiguration()
                .withPredefinedMetricSpecification(new PredefinedMetricSpecification()
                        .withPredefinedMetricType(autoScaleConfig.getScaleUpMetric())
                        .withResourceLabel(loadBalancerArn.substring(loadBalancerArn.indexOf("app/")) + "/"
                                + targetGroupARN.substring(targetGroupARN.indexOf("targetgroup/"))))
                .withTargetValue(autoScaleConfig.getScaleUpThreshold())
                .withScaleOutCooldown(autoScaleConfig.getScaleUpCooldown())
                .withScaleInCooldown(autoScaleConfig.getScaleDownCooldown()))
        .withSdkRequestTimeout(30000));
```
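One variation I may also try is setting a client execution timeout on the request in addition to the per-request timeout, in case that path behaves differently. A sketch of what I mean (putScalingPolicyRequest is a hypothetical variable standing in for the request built above, and I'm assuming withSdkClientExecutionTimeout() is available on the request class in this SDK version):

```java
// Same request as above, with an additional overall execution timeout (assumed API).
putScalingPolicyRequest
        .withSdkRequestTimeout(30_000)           // timeout per underlying HTTP request (ms)
        .withSdkClientExecutionTimeout(60_000);  // overall timeout across retries (ms)
```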
Possible Solution
No response
Additional Information/Context
I've also seen this when doing the same call against SageMaker.
AWS Java SDK version used
1.12.435
JDK version used
17.0.6
Operating System and version
container ubi9-minimal:latest