Thanks for reaching out. The `TargetNotConnectedException` has been reported in several past issues. Have you tried looking through those?
This troubleshooting post (https://repost.aws/knowledge-center/fargate-ecs-exec-errors) says you might get that error for the following reasons:
- The Amazon ECS task role doesn't have the required permissions to run the execute-command command.
- The AWS Identity and Access Management (IAM) role or user that's running the command doesn't have the required permissions.
Others have suggested that the issue could be fixed by changing your environment variables or updating your AMI.
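It's also worth confirming that ECS Exec is actually enabled on the task itself; a quick check might look like this (cluster and task values below are placeholders):

```sh
# Returns true only if the task was launched with execute-command enabled
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks <task-id> \
  --query 'tasks[0].enableExecuteCommand'
```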
Also, could you explain why you marked this as `potential-regression`? Was this working for you in a previous version of the AWS CLI?
Hi @tim-finnigan 👋
Yeah, I have looked at most of those past issues, but I will look again to make sure I didn't miss any potential solutions.
For context, I'm using ECS Fargate, platform version 1.4.
Things I've tried to fix this:

- Both `taskRoleArn` and `executionRoleArn` have the following permissions:

```json
{
  "Statement": [
    {
      "Action": [
        "ssmmessages:OpenDataChannel",
        "ssmmessages:OpenControlChannel",
        "ssmmessages:CreateDataChannel",
        "ssmmessages:CreateControlChannel"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ],
  "Version": "2012-10-17"
}
```

- The IAM identity running the command has the `ecs:ExecuteCommand` permission.
- The VPC has an interface endpoint for `com.amazonaws.us-west-2.ssmmessages` (setup sketch at the end of this comment).
- I'm not setting `AWS_ACCESS_KEY_ID` or `AWS_SECRET_ACCESS_KEY` as env vars in my tasks.

`ecs exec` used to work for me, so I thought it would be ok to mark this as a regression. But this is only conjecture on my part, so please remove the tag if you feel it is appropriate!
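For reference, the ssmmessages endpoint was set up roughly along these lines (the VPC, subnet, and security group IDs below are placeholders):

```sh
# Interface endpoint so tasks in private subnets can reach ssmmessages
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-west-2.ssmmessages \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --private-dns-enabled
```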
Just to chime in on a potential regression: we are also experiencing this issue with Fargate, where things were working fine and then seemingly stopped working suddenly for no apparent reason. `amazon-ecs-exec-checker` is clear.
Thanks for following up - we may need to loop in ECS/Fargate here as well. Did this issue start occurring after updating to a specific version? Could you share your debug logs (with any sensitive info redacted) to help with further investigation?
@tim-finnigan - I...spoke too soon when chiming in above 😅 . I believe the issue was a bug in our infrastructure as code which caused some non-determinism related to the subnet associated with tasks. A container cycle caused some to land in an isolated subnet inadvertently, and that was the root issue for the "suddenly for no apparent reason". Fixing the IaC issue solved our problem.
No worries, thanks for following up and glad that issue is resolved. For the original issue author — I'll mention this troubleshooting guide again for reference: https://repost.aws/knowledge-center/fargate-ecs-exec-errors. If you're still seeing the issue, please share your debug logs for further investigation.
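Debug output can be captured by appending the global `--debug` flag and redirecting stderr to a file, for example (the cluster, task, and container values here are placeholders):

```sh
aws ecs execute-command \
  --cluster my-cluster \
  --task <task-id> \
  --container app \
  --interactive \
  --command "/bin/sh" \
  --debug 2> ecs-exec-debug.log
```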
hi @tim-finnigan
I've narrowed the problem down to a sidecar container, `aws-fluent-bit`, which I was using to stream logs to Datadog. I'm not exactly sure why it's a problem, but I can exec into the Fargate task once I remove the `aws-fluent-bit` container from the task definition.
Do you happen to know of any known issues that would cause Fluent Bit to interfere with ECS exec? This is the relevant part of the task def:
```json
{
  "name": "log-router",
  "image": "amazon/aws-for-fluent-bit:stable",
  "cpu": 0,
  "portMappings": [],
  "essential": false,
  "environment": [],
  "mountPoints": [],
  "volumesFrom": [],
  "user": "0",
  "dockerLabels": {
    "com.datadoghq.tags.service": "log-router",
    "com.datadoghq.tags.env": "dev"
  },
  "systemControls": [],
  "firelensConfiguration": {
    "type": "fluentbit",
    "options": {
      "config-file-type": "file",
      "config-file-value": "/fluent-bit/configs/parse-json.conf",
      "enable-ecs-log-metadata": "true"
    }
  }
}
```
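In case it's useful, this is roughly how I've been checking whether the exec agent comes up in each container (cluster and task values are placeholders):

```sh
# ExecuteCommandAgent should report lastStatus RUNNING in the target container
aws ecs describe-tasks \
  --cluster my-cluster \
  --tasks <task-id> \
  --query 'tasks[0].containers[].{name: name, agents: managedAgents}'
```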
Confirmed that removing the `aws-fluent-bit` container from the task definition fixed the issue; ECS exec is now working properly.
This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.
Hey @leejayhsu, we are facing the same problem. I assume removing the `log-router` can't be a permanent solution; I'm curious what you ended up doing?
hi @lkashef 👋
Actually, removing `log-router` was my permanent solution 😄
It only existed in the task definition because the logging aggregator I used recommended streaming logs to it. I'm now just logging to CloudWatch and no longer using `fluent-bit` for logging.
Sorry, this probably isn't the answer you were hoping for!
@lkashef I also had another task that I couldn't exec into, and disabling logging in the `datadog-agent` container fixed it (this was quite unexpected).
Describe the bug
I am unable to use `ecs execute-command` to connect to my ECS Fargate task.
Regression Issue
Expected Behavior
I should be able to connect to my ECS Fargate task.
Current Behavior
It fails to connect to the ECS Fargate task; running the command returns a `TargetNotConnectedException`.
amazon-ecs-exec-checker output
Reproduction Steps
run this command:
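(The cluster, task, and container values below are placeholders, not the original ones.)

```sh
aws ecs execute-command \
  --cluster my-cluster \
  --task <task-id> \
  --container app \
  --interactive \
  --command "/bin/sh"
```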
Possible Solution
No response
Additional Information/Context
No response
CLI version used
2.19.4
Environment details (OS name and version, etc.)
Python/3.12.7 Darwin/24.0.0 source/arm64