aws / amazon-ecs-agent

Amazon Elastic Container Service Agent
http://aws.amazon.com/ecs/
Apache License 2.0

essential container is pending after enabling service connect for ECS on EC2 #4397

Open zxkane opened 2 weeks ago

zxkane commented 2 weeks ago

Summary

Everything works well while Service Connect is disabled. However, the essential container is always stuck in pending after enabling Service Connect for ECS on EC2.

After switching the ECS service to Fargate, it works again with Service Connect enabled.

I found that the ECS agent appears to lose its connection to the ECS service right after logging "Sending state change to ECS".

Description

I'm using the code snippet below to create an ECS service on EC2 with Service Connect via CDK:

        const ec2Service = new Ec2Service(this, 'DifyEcsClusterEC2SandboxService', {
            cluster: this.cluster,
            taskDefinition: this.sandboxTaskDefinition,
            desiredCount: 1,
            serviceName: 'ec2-dify-sandbox',
            vpcSubnets: this.cluster.vpc.selectSubnets({ subnetType: SubnetType.PRIVATE_WITH_EGRESS }),
            securityGroups: [this.securityGroup],
            capacityProviderStrategies: [{ capacityProvider: this.clusterDefaultCapacityProviderName, weight: 1 }]
        })

        ec2Service.addPlacementStrategies(
            PlacementStrategy.spreadAcross('attribute:ecs.availability-zone'),
            PlacementStrategy.spreadAcrossInstances(),
        )

        ec2Service.enableServiceConnect({
            namespace: this.cluster.defaultCloudMapNamespace?.namespaceArn,
            services: [{
                portMappingName: this.sandboxTaskDefinition.defaultContainer?.portMappings[0].name || "serverless-dify-sandbox-8194-tcp",
                dnsName: "dify-sandbox-on-ec2.serverless",
                discoveryName: "serverless-dify-sandbox-on-ec2",
            }],
            logDriver: LogDriver.awsLogs({
                streamPrefix: 'serverless-dify/service-connect/',
                logGroup: new LogGroup(this, 'service-connect-log-on-ec2', {
                    retention: RetentionDays.ONE_WEEK,
                }),
            })
        })

If Service Connect is disabled, the container starts and runs well. However, it is always stuck in pending after enabling Service Connect.

After inspecting the logs of the ecs-agent container on EC2, I found the following output:

level=warn time=2024-10-15T05:19:31Z msg="Received an unrecognized attachment property" attachmentProperty="{\n  Name: \"EcsTaskSetArn\",\n  Value: \"arn:aws:ecs:ap-northeast-1:845861764576:task-set/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/serverless-dify-sandbox/ecs-svc/7792931705797793743\"\n}"
level=info time=2024-10-15T05:19:31Z msg="Received task payload from ACS" taskARN="arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db" taskVersion="3" desiredStatus=RUNNING
level=info time=2024-10-15T05:19:31Z msg="Found application credentials for task" taskVersion="3" roleARN="arn:aws:iam::845861764576:role/ServerlessDifyStack-EcsCl-ServerlessDifyClusterSand-UHkUiKUZcERq" roleType="TaskApplication" credentialsID="50e0b19f-38e1-45c6-a580-07a7ab5c6cbb" taskARN="arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db"
level=info time=2024-10-15T05:19:31Z msg="Found execution credentials for task" roleType="TaskExecution" credentialsID="85db1dd5-4c11-4e98-8345-7bbe668de81c" taskARN="arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db" taskVersion="3" roleARN="arn:aws:iam::845861764576:role/ServerlessDifyStack-EcsCl-ServerlessDifyClusterSand-UHkUiKUZcERq"
level=info time=2024-10-15T05:19:31Z msg="Resources successfully consumed, continue to task creation" taskArn="arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db"
level=info time=2024-10-15T05:19:31Z msg="Host resources consumed, progressing task" taskARN="arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db"
level=info time=2024-10-15T05:19:31Z msg="Digest resolution not required" taskARN="arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db" containerName="~internal~ecs~pause" image="amazon/amazon-ecs-pause:0.1.0"
level=info time=2024-10-15T05:19:31Z msg="Handling container change event" knownStatus="NONE" desiredStatus="RESOURCES_PROVISIONED" task="2700c02483254aef95cbc43f497f16db" container="~internal~ecs~pause" runtimeID="" changeEventStatus="MANIFEST_PULLED"
level=info time=2024-10-15T05:19:31Z msg="Creating task cgroup taskARN=arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db cgroupPath=/ecs/2700c02483254aef95cbc43f497f16db cgroupV2=false" module=cgroup.go
level=info time=2024-10-15T05:19:31Z msg="Transitioned resource" status="CREATED" task="2700c02483254aef95cbc43f497f16db" resource="cgroup"
level=info time=2024-10-15T05:19:31Z msg="Managed task got resource" resource="cgroup" status="CREATED" task="2700c02483254aef95cbc43f497f16db"
level=info time=2024-10-15T05:19:31Z msg="Handling container change event" task="2700c02483254aef95cbc43f497f16db" container="~internal~ecs~pause" runtimeID="" changeEventStatus="PULLED" knownStatus="MANIFEST_PULLED" desiredStatus="RESOURCES_PROVISIONED"
level=info time=2024-10-15T05:19:31Z msg="Creating container" task="2700c02483254aef95cbc43f497f16db" container="~internal~ecs~pause"
level=info time=2024-10-15T05:19:31Z msg="Created container name mapping for task" task="2700c02483254aef95cbc43f497f16db" container="~internal~ecs~pause" dockerContainerName="ecs-serverless-dify-sandbox-3-internalecspause-b4edaff8a49797bb4500"
level=info time=2024-10-15T05:19:31Z msg="Created docker container for task" task="2700c02483254aef95cbc43f497f16db" container="~internal~ecs~pause" dockerId="f92cdfa8241c3bfca21afb8eec25ab7258ec249e88399c17fd6808a63f8a5ca9" elapsed=74.224358ms
level=info time=2024-10-15T05:19:31Z msg="Handling container change event" runtimeID="f92cdfa8241c3bfca21afb8eec25ab7258ec249e88399c17fd6808a63f8a5ca9" changeEventStatus="CREATED" knownStatus="PULLED" desiredStatus="RESOURCES_PROVISIONED" task="2700c02483254aef95cbc43f497f16db" container="~internal~ecs~pause"
level=info time=2024-10-15T05:19:31Z msg="Starting container" runtimeID="f92cdfa8241c3bfca21afb8eec25ab7258ec249e88399c17fd6808a63f8a5ca9" task="2700c02483254aef95cbc43f497f16db" container="~internal~ecs~pause"
level=info time=2024-10-15T05:19:31Z msg="Started container" task="2700c02483254aef95cbc43f497f16db" container="~internal~ecs~pause" runtimeID="f92cdfa8241c3bfca21afb8eec25ab7258ec249e88399c17fd6808a63f8a5ca9" elapsed=171.265595ms
level=info time=2024-10-15T05:19:31Z msg="Handling container change event" changeEventStatus="RUNNING" knownStatus="CREATED" desiredStatus="RESOURCES_PROVISIONED" task="2700c02483254aef95cbc43f497f16db" container="~internal~ecs~pause" runtimeID="f92cdfa8241c3bfca21afb8eec25ab7258ec249e88399c17fd6808a63f8a5ca9"
level=info time=2024-10-15T05:19:31Z msg="Start streaming metrics for container" runtimeID="f92cdfa8241c3bfca21afb8eec25ab7258ec249e88399c17fd6808a63f8a5ca9"
level=info time=2024-10-15T05:19:31Z msg="Setting up container resources for container" task="2700c02483254aef95cbc43f497f16db" container="~internal~ecs~pause"
level=info time=2024-10-15T05:19:31Z msg="Setting up CNI config for task" cniContainerNetNs="/host/proc/24031/ns/net" task="2700c02483254aef95cbc43f497f16db" cniContainerID="f92cdfa8241c3bfca21afb8eec25ab7258ec249e88399c17fd6808a63f8a5ca9" cniPluginPath="" cniID="06:d5:89:0d:2f:2d" cniBridgeName=""
level=info time=2024-10-15T05:19:31Z msg="Task associated with ip address" task="2700c02483254aef95cbc43f497f16db" ip="169.254.172.4"
level=info time=2024-10-15T05:19:31Z msg="Handling container change event" task="2700c02483254aef95cbc43f497f16db" container="~internal~ecs~pause" runtimeID="f92cdfa8241c3bfca21afb8eec25ab7258ec249e88399c17fd6808a63f8a5ca9" changeEventStatus="RESOURCES_PROVISIONED" knownStatus="RUNNING" desiredStatus="RESOURCES_PROVISIONED"
level=info time=2024-10-15T05:19:31Z msg="Handling container change event" changeEventStatus="MANIFEST_PULLED" knownStatus="NONE" desiredStatus="RUNNING" task="2700c02483254aef95cbc43f497f16db" container="ecs-service-connect-nconQf" runtimeID=""
level=info time=2024-10-15T05:19:31Z msg="Handling container change event" container="sandbox" runtimeID="" changeEventStatus="MANIFEST_PULLED" knownStatus="NONE" desiredStatus="RUNNING" task="2700c02483254aef95cbc43f497f16db"
level=info time=2024-10-15T05:19:31Z msg="Container change also resulted in task change" runtimeID="" desiredStatus="RUNNING" knownStatus="MANIFEST_PULLED" task="2700c02483254aef95cbc43f497f16db" container="ecs-service-connect-nconQf"
level=info time=2024-10-15T05:19:31Z msg="Handling container change event" container="ecs-service-connect-nconQf" runtimeID="" changeEventStatus="PULLED" knownStatus="MANIFEST_PULLED" desiredStatus="RUNNING" task="2700c02483254aef95cbc43f497f16db"
level=info time=2024-10-15T05:19:31Z msg="Creating container" task="2700c02483254aef95cbc43f497f16db" container="ecs-service-connect-nconQf"
level=info time=2024-10-15T05:19:31Z msg="Applying execution role credentials to container log auth" awslogs-credentials-endpoint="/v2/credentials/85db1dd5-4c11-4e98-8345-7bbe668de81c" taskARN="arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db" roleType="TaskExecution" roleARN="arn:aws:iam::845861764576:role/ServerlessDifyStack-EcsCl-ServerlessDifyClusterSand-UHkUiKUZcERq" credentialsID="85db1dd5-4c11-4e98-8345-7bbe668de81c"
level=info time=2024-10-15T05:19:31Z msg="Created container name mapping for task" dockerContainerName="ecs-serverless-dify-sandbox-3-ecs-service-connect-nconQf-94b4e0ccf1d3fdbf9d01" task="2700c02483254aef95cbc43f497f16db" container="ecs-service-connect-nconQf"
level=info time=2024-10-15T05:19:32Z msg="Created docker container for task" container="ecs-service-connect-nconQf" dockerId="fd2e9f9d1e405bd64f17b9afb9c41dbf8a97f4ca8cf1ffe465b3f4652782f11b" elapsed=70.437094ms task="2700c02483254aef95cbc43f497f16db"
level=info time=2024-10-15T05:19:32Z msg="Handling container change event" knownStatus="PULLED" desiredStatus="RUNNING" task="2700c02483254aef95cbc43f497f16db" container="ecs-service-connect-nconQf" runtimeID="fd2e9f9d1e405bd64f17b9afb9c41dbf8a97f4ca8cf1ffe465b3f4652782f11b" changeEventStatus="CREATED"
level=info time=2024-10-15T05:19:32Z msg="Starting container" task="2700c02483254aef95cbc43f497f16db" container="ecs-service-connect-nconQf" runtimeID="fd2e9f9d1e405bd64f17b9afb9c41dbf8a97f4ca8cf1ffe465b3f4652782f11b"
2024-10-15T05:19:32Z 200 10.0.169.54:59412 "/v2/credentials" "Go-http-client/1.1" arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db GetCredentialsExecutionRole 2 ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq arn:aws:ecs:ap-northeast-1:845861764576:container-instance/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/3929aad2e3374aa79d0480c7eb68919d
level=info time=2024-10-15T05:19:32Z msg="Processing credential request, credentialType=TaskExecution taskARN=arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db" module=credentials_handler.go
level=info time=2024-10-15T05:19:32Z msg="Started container" runtimeID="fd2e9f9d1e405bd64f17b9afb9c41dbf8a97f4ca8cf1ffe465b3f4652782f11b" elapsed=170.946791ms task="2700c02483254aef95cbc43f497f16db" container="ecs-service-connect-nconQf"
level=info time=2024-10-15T05:19:32Z msg="Handling container change event" knownStatus="CREATED" desiredStatus="RUNNING" task="2700c02483254aef95cbc43f497f16db" container="ecs-service-connect-nconQf" runtimeID="fd2e9f9d1e405bd64f17b9afb9c41dbf8a97f4ca8cf1ffe465b3f4652782f11b" changeEventStatus="RUNNING"
level=info time=2024-10-15T05:19:32Z msg="Start streaming metrics for container" runtimeID="fd2e9f9d1e405bd64f17b9afb9c41dbf8a97f4ca8cf1ffe465b3f4652782f11b"
level=info time=2024-10-15T05:19:32Z msg="Writing response for v4 task metadata" tmdsEndpointContainerID="e1ab4a8e-3b73-4f70-9d72-85eba94ca012" taskARN="arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db"
level=info time=2024-10-15T05:19:32Z msg="Writing response for v4 container metadata" tmdsEndpointContainerID="e1ab4a8e-3b73-4f70-9d72-85eba94ca012" container="fd2e9f9d1e405bd64f17b9afb9c41dbf8a97f4ca8cf1ffe465b3f4652782f11b"
level=info time=2024-10-15T05:19:32Z msg="Writing response for v4 task metadata" tmdsEndpointContainerID="e1ab4a8e-3b73-4f70-9d72-85eba94ca012" taskARN="arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db"
level=info time=2024-10-15T05:19:50Z msg="Sending state change to ECS" eventType="TaskStateChange" taskArn="arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db" taskStatus="MANIFEST_PULLED" taskReason="" taskPullStartedAt="0001-01-01T00:00:00Z" taskPullStoppedAt="0001-01-01T00:00:00Z" taskKnownSentStatus="NONE" taskExecutionStoppedAt="0001-01-01T00:00:00Z" containerChange-0="containerName=ecs-service-connect-nconQf containerStatus=RUNNING containerKnownSentStatus=NONE containerRuntimeID=fd2e9f9d1e405bd64f17b9afb9c41dbf8a97f4ca8cf1ffe465b3f4652782f11b containerIsEssential=true"
level=info time=2024-10-15T05:21:12Z msg="TCS Websocket connection closed for a valid reason"
level=info time=2024-10-15T05:21:12Z msg="Using cached DiscoverPollEndpoint" endpoint="https://ecs-a-5.ap-northeast-1.amazonaws.com/" telemetryEndpoint="https://ecs-t-5.ap-northeast-1.amazonaws.com/" serviceConnectEndpoint="https://ecs-sc.ap-northeast-1.api.aws" containerInstanceARN="arn:aws:ecs:ap-northeast-1:845861764576:container-instance/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/3929aad2e3374aa79d0480c7eb68919d"
level=info time=2024-10-15T05:21:12Z msg="Establishing a Websocket connection" url="https://ecs-t-5.ap-northeast-1.amazonaws.com/ws?agentHash=cf8c7a6b&agentVersion=1.87.0&cluster=ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq&containerInstance=arn%3Aaws%3Aecs%3Aap-northeast-1%3A845861764576%3Acontainer-instance%2FServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq%2F3929aad2e3374aa79d0480c7eb68919d&dockerVersion=25.0.6"
level=info time=2024-10-15T05:21:12Z msg="Websocket connection established." URL="https://ecs-t-5.ap-northeast-1.amazonaws.com/ws?agentHash=cf8c7a6b&agentVersion=1.87.0&cluster=ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq&containerInstance=arn%3Aaws%3Aecs%3Aap-northeast-1%3A845861764576%3Acontainer-instance%2FServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq%2F3929aad2e3374aa79d0480c7eb68919d&dockerVersion=25.0.6" ConnectTime="2024-10-15 05:21:12" ExpectedDisconnectTime="2024-10-15 05:51:12"
level=info time=2024-10-15T05:21:12Z msg="Connected to TCS endpoint"

You can see abnormal connectivity between ECS and the agent on EC2:

level=info time=2024-10-15T05:19:50Z msg="Sending state change to ECS" eventType="TaskStateChange" taskArn="arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db" taskStatus="MANIFEST_PULLED" taskReason="" taskPullStartedAt="0001-01-01T00:00:00Z" taskPullStoppedAt="0001-01-01T00:00:00Z" taskKnownSentStatus="NONE" taskExecutionStoppedAt="0001-01-01T00:00:00Z" containerChange-0="containerName=ecs-service-connect-nconQf containerStatus=RUNNING containerKnownSentStatus=NONE containerRuntimeID=fd2e9f9d1e405bd64f17b9afb9c41dbf8a97f4ca8cf1ffe465b3f4652782f11b containerIsEssential=true"
level=info time=2024-10-15T05:21:12Z msg="TCS Websocket connection closed for a valid reason"

After updating the ECS service above to use Fargate, it works well.

        const service = new FargateService(this, 'DifyEcsClusterSandboxService', {
            cluster: this.cluster,
            taskDefinition: this.sandboxTaskDefinition,
            desiredCount: 1,
            serviceName: 'fargate-dify-sandbox',
            vpcSubnets: this.cluster.vpc.selectSubnets({ subnetType: SubnetType.PRIVATE_WITH_EGRESS }),
            securityGroups: [this.securityGroup],
        })

        service.enableServiceConnect({
            namespace: this.cluster.defaultCloudMapNamespace?.namespaceArn,
            services: [{
                portMappingName: this.sandboxTaskDefinition.defaultContainer?.portMappings[0].name || "serverless-dify-sandbox-8194-tcp",
                dnsName: "serverles-dify-sandbox",
                discoveryName: "serverless-dify-sandbox",
            }],
            logDriver: LogDriver.awsLogs({
                streamPrefix: 'serverless-dify/service-connect/',
                logGroup: new LogGroup(this, 'service-connect-on-fargate', {
                    retention: RetentionDays.ONE_WEEK,
                }),
            })
        })

Expected Behavior

The essential container should start.

Observed Behavior

The ECS agent does not start the essential container after the Service Connect container has started.

Environment Details

ECS agent: 1.87.0
EC2 AMI: amzn2-ami-ecs-hvm-2.0.20241010-x86_64-ebs

Supporting Log Snippets

See the description above.

mye956 commented 2 days ago

Hi @zxkane, thanks for opening up this issue.

You can see abnormal connectivity between ECS and the agent on EC2:

level=info time=2024-10-15T05:19:50Z msg="Sending state change to ECS" eventType="TaskStateChange" taskArn="arn:aws:ecs:ap-northeast-1:845861764576:task/ServerlessDifyStack-EcsClusterStackNestedStackEcsClusterStackNestedStackResourceC1F1FB-8S3K0YS6ISZ-EcsClusterStackAFF371BA-PIgAAAsN8xTq/2700c02483254aef95cbc43f497f16db" taskStatus="MANIFEST_PULLED" taskReason="" taskPullStartedAt="0001-01-01T00:00:00Z" taskPullStoppedAt="0001-01-01T00:00:00Z" taskKnownSentStatus="NONE" taskExecutionStoppedAt="0001-01-01T00:00:00Z" containerChange-0="containerName=ecs-service-connect-nconQf containerStatus=RUNNING containerKnownSentStatus=NONE containerRuntimeID=fd2e9f9d1e405bd64f17b9afb9c41dbf8a97f4ca8cf1ffe465b3f4652782f11b containerIsEssential=true"
level=info time=2024-10-15T05:21:12Z msg="TCS Websocket connection closed for a valid reason"

What you see here is actually expected behavior: the agent is disconnecting from the ECS telemetry (TCS) connection (see ref where this log statement is coming from). This is also not the ECS endpoint that state changes are sent over (that is the ACS endpoint); the TCS endpoint is where we send metrics. The agent periodically disconnects from the telemetry endpoint and then reconnects to it, so you should see a corresponding log statement shortly afterwards.
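To check the disconnect/reconnect pattern described above against a longer agent log, each "TCS Websocket connection closed" line can be paired with the next "Connected to TCS endpoint" line. The following is only a minimal sketch: the message strings are copied from the log excerpt in this issue, while the helper names (`parseLine`, `tcsReconnectGaps`) are hypothetical and not part of the agent.

```typescript
// Sketch: pair each TCS disconnect with the next TCS reconnect in agent logs.
// Message strings come from the excerpt in this issue; helper names are hypothetical.

interface LogEvent {
  time: Date;
  msg: string;
}

// Extract the time= and msg= fields from one agent log line, or null if absent.
function parseLine(line: string): LogEvent | null {
  const time = /\btime=(\S+)/.exec(line);
  const msg = /\bmsg="((?:[^"\\]|\\.)*)"/.exec(line);
  if (!time || !msg) return null;
  const parsed = new Date(time[1]);
  if (isNaN(parsed.getTime())) return null;
  return { time: parsed, msg: msg[1] };
}

// For each TCS disconnect, return the seconds until the next reconnect,
// or null if no reconnect follows within the given lines.
function tcsReconnectGaps(lines: string[]): (number | null)[] {
  const gaps: (number | null)[] = [];
  let pending: Date | null = null;
  for (const line of lines) {
    const ev = parseLine(line);
    if (!ev) continue;
    if (ev.msg.startsWith("TCS Websocket connection closed")) {
      if (pending) gaps.push(null); // earlier disconnect was never followed by a reconnect
      pending = ev.time;
    } else if (ev.msg === "Connected to TCS endpoint" && pending) {
      gaps.push((ev.time.getTime() - pending.getTime()) / 1000);
      pending = null;
    }
  }
  if (pending) gaps.push(null);
  return gaps;
}

// Two lines taken from the log excerpt in this issue.
const sample = [
  'level=info time=2024-10-15T05:21:12Z msg="TCS Websocket connection closed for a valid reason"',
  'level=info time=2024-10-15T05:21:12Z msg="Connected to TCS endpoint"',
];
console.log(tcsReconnectGaps(sample)); // one entry per disconnect, in seconds
```

A `null` entry would mean a disconnect with no reconnect in the captured lines, which would be worth a closer look. A small gap matches the periodic reconnect described above.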

Could you help clarify a bit more on what you mean by the essential container being stuck in pending? It looks like the container did transition to a running state.

level=info time=2024-10-15T05:19:31Z msg="Handling container change event" task="2700c02483254aef95cbc43f497f16db" container="~internal~ecs~pause" runtimeID="f92cdfa8241c3bfca21afb8eec25ab7258ec249e88399c17fd6808a63f8a5ca9" changeEventStatus="RESOURCES_PROVISIONED" knownStatus="RUNNING" desiredStatus="RESOURCES_PROVISIONED"
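One way to pin down which container is stuck is to tally the last change event the agent logged for each container. This is a rough sketch, assuming the `container` and `changeEventStatus` fields appear as they do in the "Handling container change event" lines in this issue. The function names are hypothetical, and the sample lines are taken from the excerpt with some fields trimmed for brevity.

```typescript
// Sketch: report the most recent changeEventStatus logged per container.
// Field names come from the agent log lines in this issue; function names are hypothetical.

// Read a key="value" field from a log line (key is assumed to be a plain identifier).
function field(line: string, key: string): string | undefined {
  const m = new RegExp(`\\b${key}="([^"]*)"`).exec(line);
  return m ? m[1] : undefined;
}

// Map each container name to the last change event status seen for it.
function lastStatusByContainer(lines: string[]): Map<string, string> {
  const latest = new Map<string, string>();
  for (const line of lines) {
    if (!line.includes('msg="Handling container change event"')) continue;
    const container = field(line, "container");
    const status = field(line, "changeEventStatus");
    if (container && status) latest.set(container, status);
  }
  return latest;
}

// Sample lines from the excerpt in this issue (runtimeID and task fields trimmed).
const changeEvents = [
  'level=info time=2024-10-15T05:19:31Z msg="Handling container change event" container="~internal~ecs~pause" changeEventStatus="RESOURCES_PROVISIONED" knownStatus="RUNNING" desiredStatus="RESOURCES_PROVISIONED"',
  'level=info time=2024-10-15T05:19:31Z msg="Handling container change event" container="sandbox" runtimeID="" changeEventStatus="MANIFEST_PULLED" knownStatus="NONE" desiredStatus="RUNNING"',
  'level=info time=2024-10-15T05:19:32Z msg="Handling container change event" knownStatus="CREATED" desiredStatus="RUNNING" container="ecs-service-connect-nconQf" changeEventStatus="RUNNING"',
];

for (const [name, status] of lastStatusByContainer(changeEvents)) {
  console.log(`${name}: ${status}`);
}
```

On these sample lines, the `sandbox` container's last logged event is `MANIFEST_PULLED`, while the pause and Service Connect containers progress further. That may help narrow down which container the report refers to.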

If possible, could you share a bit more about how the task definition is configured?