Closed: ssyberg closed this issue 2 years ago
Example output from aws ecs execute-command ...:
The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.
An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.
Exact output for everyone with this problem as far as I can tell ☝🏼
This looks related: https://github.com/aws-containers/amazon-ecs-exec-checker/issues/49
Do you also have AWS_ACCESS_KEY / AWS_SECRET_ACCESS_KEY set? That may be causing the issue.
Do you also have AWS_SECRET_ACCESS_KEY set? That may be causing the issue.
If my parsing of the terraform config can be trusted, we are not setting that in environment_variables, but it is available in the secrets.
I'll try removing this now and see if that makes a difference.
Holy moly, that worked! That said, we actively use those credentials in our task, so we'll need a workaround for exposing them. It still seems like setting these env vars shouldn't have this effect, right?
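One possible shape for such a workaround, sketched below under the assumption that the task definition exposes the credentials under non-reserved names (the APP_-prefixed names are hypothetical) and the application maps them back at startup:

```python
import os

def resolve_app_credentials(env=None):
    """Map renamed credential variables back to the names the AWS SDK expects.

    The task definition would set APP_AWS_ACCESS_KEY_ID / APP_AWS_SECRET_ACCESS_KEY
    (hypothetical names) instead of the reserved AWS_* names, so the SSM agent
    never sees the reserved names; the application restores them before use.
    """
    if env is None:
        env = os.environ
    mapping = {
        "APP_AWS_ACCESS_KEY_ID": "AWS_ACCESS_KEY_ID",
        "APP_AWS_SECRET_ACCESS_KEY": "AWS_SECRET_ACCESS_KEY",
    }
    return {target: env[source] for source, target in mapping.items() if source in env}
```

The returned dict can then be passed to the SDK's credential options (or written back to os.environ) so application code keeps working unchanged.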
Glad that worked! I'm waiting on more info regarding this and will post an update here.
Renaming AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY has also fixed the problem for me! Bizarre that this just started happening at ~5pm EST March 30 out of nowhere.
Can we revert to a previous version of the AWS CLI to fix this? Changing the environment variables will break other things in our tasks.
Facing this issue as well. As @nathando mentioned, it would be great if it reverted to the previous behaviour so that we don't have to change the environment variables.
Encountered this error out of nowhere 4 days ago: "An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later."
In my case I also had AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY set as env variables (in my task definition), since my application needs to interact with the AWS API. It was working fine until now, so something must have changed in recent updates.
There is no need to change the environment variables, though; all you need to do is give the user (the one behind AWS_ACCESS_KEY_ID) permissions that allow the ECS exec command:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ssmmessages:CreateControlChannel",
"ssmmessages:CreateDataChannel",
"ssmmessages:OpenControlChannel",
"ssmmessages:OpenDataChannel"
],
"Resource": "*"
}
]
}
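For anyone scripting this, a small sanity check that a policy document allows all four ssmmessages actions can look like the sketch below. This is illustrative only, not the checker's actual logic: it only inspects Allow statements and ignores wildcards, Deny statements, and resource/condition constraints.

```python
import json

REQUIRED_SSM_ACTIONS = {
    "ssmmessages:CreateControlChannel",
    "ssmmessages:CreateDataChannel",
    "ssmmessages:OpenControlChannel",
    "ssmmessages:OpenDataChannel",
}

def missing_exec_actions(policy_json):
    """Return the required ssmmessages actions not allowed by the policy."""
    policy = json.loads(policy_json)
    allowed = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") == "Allow":
            actions = stmt.get("Action", [])
            if isinstance(actions, str):
                actions = [actions]
            allowed.update(actions)
    return REQUIRED_SSM_ACTIONS - allowed
```

An empty result means the four actions are present; anything returned is a candidate cause for TargetNotConnectedException.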
Thanks @nicolasbuch, those requirements are also documented here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-exec.html#ecs-exec-prerequisites as well as in this troubleshooting article for the TargetNotConnectedException error: https://aws.amazon.com/premiumsupport/knowledge-center/ecs-error-execute-command/
Those requirements aren’t new so I’m not sure why recent updates would be a factor here. Has anyone tried rolling back to a previous SSM Agent version to see if they still see this issue? It would help the team to have agent logs from a container that is experiencing the issue. You could provide those here or contact AWS Support.
The agent version in ECS Exec is controlled by ECS during the AMI build, and they say they haven't changed the version recently. Can anyone here who encountered the issue and has removed AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from their environment start a session and get the agent version?
# Assuming your session starts in the ECS Exec bin folder
./amazon-ssm-agent -version
Also, are you seeing this issue on ECS on EC2 or Fargate?
@Thor-Bjorgvinsson after making the change and removing the env vars, I can access the containers and see the following versions according to the log output on Fargate tasks:
amazon-ssm-agent - v3.1.715.0
ssm-agent-worker - v3.1.715.0
@Thor-Bjorgvinsson Seeing the issue on Fargate.
We have also experienced the same issue since last Friday (01 April 2022). We didn't change anything and the command execution stopped working. We also have AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the env. Funnily enough, it still works on one of our environments but stopped on two others.
We are now investigating the permission differences.
The user on that env has admin access rights (dev env).
We've confirmed that this is an SSM Agent issue in a recent Fargate deployment where the agent version was updated. Any new tasks started in Fargate will use an SSM Agent build with this issue. We are working with the Fargate team to deploy a fix for this. Mitigation, as mentioned above: remove AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the task definition's environment variables.
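Concretely, the mitigation amounts to dropping the reserved names from the containerDefinitions[].environment list in the task definition; any credentials the application itself needs can be exposed under non-reserved names instead. A minimal sketch (container and variable names hypothetical, values elided):

```json
{
  "containerDefinitions": [
    {
      "name": "app",
      "environment": [
        { "name": "APP_AWS_ACCESS_KEY_ID", "value": "<access key id>" },
        { "name": "APP_AWS_SECRET_ACCESS_KEY", "value": "<secret key>" }
      ]
    }
  ]
}
```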
Granting the user the ssmmessages permissions, as suggested above, worked for me.
@akhiljalagam I can confirm this can be used as a mitigation today, but it is not recommended; it will no longer be possible in the near future, sometime after the fix has been released. The agent will only be able to connect using ECS task metadata service credentials.
The recommended mitigation is to unset the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
@Thor-Bjorgvinsson, how can we follow the status of this? I don't want to be too pushy, but we're really blocked 😬. Is there some kind of prioritisation as this is a regression?
Anyway, thanks for the work 💪 .
We've pushed out a fix in agent release 3.1.1260.0 for this issue. We're currently working with related AWS services to integrate this fix; we'll add further updates as those integrations are completed.
For other people who come across this issue: this error also happens for us when we have AWS_SHARED_CREDENTIALS_FILE set as an environment variable. When it is removed, ecs execute-command works correctly.
Hopefully this doesn't put a spanner in the works, but I've been having this issue across all of my services. Only one of the services actually had AWS env vars in it; after renaming those, that service was fine.
The others, however, still respond with the same "Internal server error", with no AWS env vars to note on the tasks.
I'm seeing this again since the 3.1.1260.0 release. Is it possible other env variable names are now disallowed? In particular, I had changed my AWS_SECRET_ACCESS_KEY env variable to AWS_SECRET_ACCESS_KEY_ECS, which was working until the 3.1.1260.0 release. Now, after changing that key to AWS_SECRET_ACCESS_KEY_<something>_ECS, I am able to connect again.
I'm wondering if the fix in 3.1.1260.0 was to switch from using AWS_SECRET_ACCESS_KEY to AWS_SECRET_ACCESS_KEY_ECS in some internal API. If so, perhaps more of a root-cause fix is needed, or documentation specifying which env variable names cause these conflicts.
Maybe it's partially matching AWS_SECRET_ACCESS_KEY* instead of just AWS_SECRET_ACCESS_KEY? 🤔
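The two candidate matching rules can be sketched as follows. If the agent were prefix-matching, any variable whose name starts with the reserved name would be flagged; an exact match flags only the reserved name itself. Note that neither rule alone explains all the reports in this thread, so this is purely a way to state the hypothesis, not the agent's actual logic:

```python
RESERVED = "AWS_SECRET_ACCESS_KEY"

def conflicts_prefix(name):
    """Hypothetical rule: flag any variable whose name begins with the reserved name."""
    return name.startswith(RESERVED)

def conflicts_exact(name):
    """Alternative rule: flag only the reserved name itself."""
    return name == RESERVED
```

Under the prefix rule, AWS_SECRET_ACCESS_KEY_ECS and AWS_SECRET_ACCESS_KEY_2 would both be rejected, yet the _2 variant reportedly works, which suggests the real matching behaviour is something else again.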
No error for me with AWS_SECRET_ACCESS_KEY_2
Is there any update on when the fix will be rolled out?
Renaming the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID variables did the job!
ECS released a new AMI with the updated SSM Agent (ECS-optimized AMI version 20220421); the Fargate release is still pending.
Any news concerning the Fargate release?
Without changing anything regarding the env variables, I redeployed my ECS Fargate instances, and with the latest AWS CLI this now works fine.
Fargate has completed release of the new agent
Hi Guys,
I have used the ECS exec checker and this is the result:
-------------------------------------------------------------
Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
jq | OK (/opt/homebrew/bin/jq)
AWS CLI | OK (/opt/homebrew/bin/aws)
-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
AWS CLI Version | OK (aws-cli/2.11.9 Python/3.11.2 Darwin/22.4.0 source/arm64 prompt/off)
Session Manager Plugin | OK (1.2.463.0)
-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : eu-west-1
Cluster: app-service-cluster-test
Task : b460c8c1bb334429a39ff7a4b1bad180
-------------------------------------------------------------
Cluster Configuration |
KMS Key : Not Configured
Audit Logging : DEFAULT
S3 Bucket Name: Not Configured
CW Log Group : Not Configured
Can I ExecuteCommand? | arn:aws:iam::117038214493:user/cli-admin
ecs:ExecuteCommand: allowed
ssm:StartSession denied?: allowed
Task Status | RUNNING
Launch Type | Fargate
Platform Version | 1.4.0
Exec Enabled for Task | OK
Container-Level Checks |
----------
Managed Agent Status
----------
1. RUNNING for "app-service-test-container"
----------
Init Process Enabled (app-service-task-definition-test:18)
----------
1. Disabled - "app-service-test-container"
----------
Read-Only Root Filesystem (app-service-task-definition-test:18)
----------
1. Disabled - "app-service-test-container"
Task Role Permissions | arn:aws:iam::117038214493:role/TuskProdECSTaskRole
ssmmessages:CreateControlChannel: allowed
ssmmessages:CreateDataChannel: allowed
ssmmessages:OpenControlChannel: allowed
ssmmessages:OpenDataChannel: allowed
VPC Endpoints |
Found existing endpoints for vpc-00bfcd992d7f50681:
- com.amazonaws.eu-west-1.ssmmessages
- com.amazonaws.eu-west-1.s3
- com.amazonaws.vpce.eu-west-1.vpce-svc-0e7975f61ffb9d0f7
Environment Variables | (app-service-task-definition-test:18)
1. container "app-service-test-container"
- AWS_ACCESS_KEY: not defined
- AWS_ACCESS_KEY_ID: not defined
- AWS_SECRET_ACCESS_KEY: not defined
All the configuration seems to be okay... AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are not defined. But I am still getting TargetNotConnectedException. Am I missing something?
AWS CLI version: 2.11.9
But the AWS_ACCESS_KEY and AWS_SECRET_ACCESS_KEY variables are defined in the .env file inside the container. I hope that's not the issue.
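Worth noting: check-ecs-exec.sh only inspects the task definition, so variables assigned in a .env file inside the image are invisible to it. Whether they can affect the agent depends on how the file is loaded (variables exported into the container's entrypoint environment would be visible to other processes; variables loaded only by the application process would not). A hypothetical scan of a .env file's text for the reserved names, as a sketch:

```python
RESERVED_NAMES = {"AWS_ACCESS_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"}

def reserved_vars_in_dotenv(text):
    """Return the reserved AWS credential names assigned in a .env file's text.

    Simplified parser: skips comments and blank lines, ignores quoting and
    'export' prefixes, and treats everything before the first '=' as the name.
    """
    found = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        if name in RESERVED_NAMES:
            found.add(name)
    return found
```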
I'm experiencing this exact same issue. The aws ecs execute-command was working for me last week and it stopped working. Has anyone else stumbled into this problem again?
We started getting this issue again. There are no AWS_ACCESS_KEY / SECRET variables defined, and check-ecs-exec.sh shows everything OK (green and yellow):
-------------------------------------------------------------
Prerequisites for check-ecs-exec.sh v0.7
-------------------------------------------------------------
jq | OK (/usr/bin/jq)
AWS CLI | OK (/usr/local/bin/aws)
-------------------------------------------------------------
Prerequisites for the AWS CLI to use ECS Exec
-------------------------------------------------------------
AWS CLI Version | OK (…t.21 prompt/off)
Session Manager Plugin | OK (1.2.497.0)
-------------------------------------------------------------
Checks on ECS task and other resources
-------------------------------------------------------------
Region : us-east-1
Cluster: cluster-name
Task : 949fd5e48ebf4ba4b895176cb0c36d50
Cluster Configuration |
KMS Key : Not Configured
Audit Logging : DEFAULT
S3 Bucket Name: Not Configured
CW Log Group : Not Configured
Can I ExecuteCommand? | arn:aws:iam::xxx:user/deployment
ecs:ExecuteCommand: allowed
ssm:StartSession denied?: allowed
Task Status | RUNNING
Platform Version | 1.4.0
Exec Enabled for Task | OK
Container-Level Checks |
----------
Managed Agent Status
----------
1. RUNNING for "metabase_app_dev"
----------
Init Process Enabled (metabase_dev:3)
----------
1. Disabled - "metabase_app_dev"
----------
Read-Only Root Filesystem (metabase_dev:3)
----------
1. Disabled - "metabase_app_dev"
Task Role Permissions | arn:aws:iam::xxx:role/metabase_ecsTaskExecutionRole_dev
ssmmessages:CreateControlChannel: allowed
ssmmessages:CreateDataChannel: allowed
ssmmessages:OpenControlChannel: allowed
ssmmessages:OpenDataChannel: allowed
VPC Endpoints |
Found existing endpoints for vpc-081adc23fcb697c58:
- com.amazonaws.us-east-1.execute-api
- com.amazonaws.us-east-1.secretsmanager
- com.amazonaws.vpce.us-east-1.vpce-svc-0256367e65088edb5
- com.amazonaws.us-east-1.ssmmessages
Environment Variables | (metabase_dev:3)
1. container "metabase_app_dev"
- AWS_ACCESS_KEY: not defined
- AWS_ACCESS_KEY_ID: not defined
- AWS_SECRET_ACCESS_KEY: not defined
$ aws ecs execute-command --cluster cluster-name --task 949fd5e48ebf4ba4b895176cb0c36d50 --container metabase_app_dev --command 'sh' --interactive
The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.
An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.
There are a number of GitHub issues floating around on related repos that might be tied to recent SSM agent updates, though this is incredibly difficult to verify from our end; if someone could do a little investigating, that would be great.
The general issue that manifests is an inability to run execute-command via the CLI, with a TargetNotConnectedException thrown. Existing troubleshooting guides have thus far not yielded success.
Related tickets:
https://github.com/aws/aws-cli/issues/6834
https://github.com/aws/aws-cli/issues/6562
https://github.com/aws-containers/amazon-ecs-exec-checker/issues/47