aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[ECS] [request]: Match the status retrieved from the task metadata endpoint with the actual task status #2012

Open masakihr opened 1 year ago

masakihr commented 1 year ago


Tell us about your request Please match the value of KnownStatus retrieved from the task metadata endpoint with the value of lastStatus in the DescribeTasks API response.

Which service(s) is this request for? ECS Agent

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? Currently, the KnownStatus retrieved from the task metadata endpoint differs from the lastStatus in the DescribeTasks API response.

Therefore, it is currently not possible to perform a graceful shutdown after detecting the DEACTIVATING status within the task.
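To illustrate the check this request would make possible, here is a minimal sketch, assuming the v4 task metadata endpoint (`ECS_CONTAINER_METADATA_URI_V4`) and its `/task` path. The helper names `fetch_task_metadata` and `shutdown_requested` are hypothetical, and the status set is taken from the ECS task lifecycle. With the behavior described in this issue, `KnownStatus` would not show DEACTIVATING, so the check would not fire today.

```python
import json
import os
import urllib.request

# Task lifecycle states that follow RUNNING when a task is being stopped.
SHUTDOWN_STATUSES = {"DEACTIVATING", "STOPPING", "DEPROVISIONING", "STOPPED"}


def fetch_task_metadata(base_url=None, timeout=2.0):
    """Read the task-level metadata JSON (hypothetical helper).

    Assumes the agent injects ECS_CONTAINER_METADATA_URI_V4 into the container.
    """
    base_url = base_url or os.environ["ECS_CONTAINER_METADATA_URI_V4"]
    with urllib.request.urlopen(f"{base_url}/task", timeout=timeout) as resp:
        return json.load(resp)


def shutdown_requested(metadata):
    """Return True if the metadata indicates the task is on its way down."""
    return (
        metadata.get("KnownStatus") in SHUTDOWN_STATUSES
        or metadata.get("DesiredStatus") == "STOPPED"
    )
```

Inside a task this would be called as `shutdown_requested(fetch_task_metadata())`. Keeping the status check separate from the HTTP call makes it easy to test against captured metadata documents.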

Are you currently working around this issue? Handling Task state change events is a workaround.
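A rough sketch of that workaround, assuming the events are delivered through EventBridge to some consumer (for example a Lambda function) that can then notify the task. The function name `task_is_shutting_down` is hypothetical; the `source`, `detail-type`, `taskArn`, `lastStatus`, and `desiredStatus` fields are those of an ECS Task State Change event.

```python
# Task lifecycle states that follow RUNNING when a task is being stopped.
SHUTDOWN_STATUSES = {"DEACTIVATING", "STOPPING", "DEPROVISIONING", "STOPPED"}


def task_is_shutting_down(event, task_arn):
    """Return True if `event` says the task `task_arn` is being stopped.

    `event` is the EventBridge event as a dict; anything that is not an
    ECS Task State Change event for this task is ignored.
    """
    if event.get("source") != "aws.ecs":
        return False
    if event.get("detail-type") != "ECS Task State Change":
        return False
    detail = event.get("detail", {})
    if detail.get("taskArn") != task_arn:
        return False
    return (
        detail.get("lastStatus") in SHUTDOWN_STATUSES
        or detail.get("desiredStatus") == "STOPPED"
    )
```

How the result gets back to the process inside the task (a queue, a flag in a shared store, and so on) is left open here, which is part of why this is only a workaround.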

Additional context ECS Agent 1.70.2

pattersam commented 10 months ago

In my case, I also experience this with the DesiredStatus field in the v4 ECS Fargate metadata endpoint JSON.

It remains as RUNNING even though the AWS ECS Console UI says that the desired status is STOPPED (caused by a scale-in event).

I would like to use the metadata endpoint to know when the task is in the 'deregistration delay' period of an ALB target group (where it is marked as stopped but not yet killed for a configurable amount of time, typically 5 minutes).

I would consider this a bug, because I would expect the data received from the metadata endpoint to reflect reality (especially a status field), at least after a short period that allows it to be synchronised.
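For reference, the kind of polling I have in mind looks roughly like the sketch below. It assumes `fetch` is any callable that returns the v4 `/task` metadata as a dict; the name `wait_until_stopping` and its parameters are hypothetical. It only returns True once `DesiredStatus` actually changes, which, as described above, does not currently happen.

```python
import time


def wait_until_stopping(fetch, interval=5.0, max_polls=None, sleep=time.sleep):
    """Poll task metadata until DesiredStatus becomes STOPPED.

    fetch:     callable returning the task metadata dict
    interval:  seconds to wait between polls
    max_polls: give up after this many polls (None means poll forever)
    sleep:     injectable sleep function, so the loop can be tested quickly
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        metadata = fetch()
        if metadata.get("DesiredStatus") == "STOPPED":
            return True
        polls += 1
        sleep(interval)
    return False
```

Once this returns True, the task could start draining in-flight work during the deregistration delay instead of waiting for the stop signal.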

mgoodings commented 4 months ago

Why has this not been fixed? The endpoint is not functioning correctly. We just built a feature around the AWS docs specification for this endpoint and it just flat out doesn't work as documented.

We need a way to know when the container is in a DEACTIVATING status so we can spin down background workers during a deployment.