Open digitlninja opened 4 years ago
@digitlninja as a first step, you can check the ECS service events in the AWS ECS web console to see whether your tasks are flapping (coming up and down) or whether some specific task (like becoming healthy in the load balancer) is taking a while to stabilize. If you have CloudWatch Logs enabled, you can also check the task logs in the ECS or CloudWatch console to see if specific containers are hanging on any particular startup step. Also, if your container image recently increased in size or changed location (e.g., to a different region), it might be taking longer to pull.
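To surface those service events without clicking through the console, a debugging step along these lines could be appended to the workflow (a sketch; the cluster/service env vars are placeholders and assume AWS credentials are already configured):

```yaml
# Hypothetical debugging step: prints the most recent ECS service events,
# which surface task flapping, failed health checks, and draining delays.
- name: Show recent ECS service events
  run: |
    aws ecs describe-services \
      --cluster "$ECS_CLUSTER" \
      --services "$ECS_SERVICE" \
      --query 'services[0].events[:10].[createdAt,message]' \
      --output table
```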
Sorry, this is rather generic advice, but I can't think of a specific reason this action would cause deployments to take longer. If it just recently started happening w/o any changes on your end, we could continue to investigate. LMK if any of the above is helpful.
Closing due to lack of response, but feel free to reopen if you find reason to believe this action is affecting your deployment times.
I can confirm this. The ECS task is running & the new updates have been applied, but the GitHub action is still loading. It takes up to 10 minutes to finish and sometimes more.
Same for me while deploying task definitions to ECS using GitHub Actions: it hung for up to 25 minutes before I had to stop it. I investigated this issue and verified from the Events tab that the old tasks were still not shutting down while the new ones were trying to deploy. I understand this is a safety feature of blue/green deployment, in case the new deployment is not good, but in my case the deployment didn't error out, it just hung. Is there a fix for this?
@JefferyHus can you tell me more about the status of your service when the GH action is hanging? Has a new revision successfully been deployed and stabilized, or has a rollback occurred? Any output you have from the service's events (visible in the Events tab in the ECS console or via `ecs:DescribeServices`) or errors from the GH action run itself would be helpful.
@SunnGHubX what you describe sounds similar to https://github.com/aws-actions/amazon-ecs-deploy-task-definition/issues/113#issuecomment-717465045 . Can you let me know if adjusting the deployment preferences fixes your issue?
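For reference, those deployment preferences can also be adjusted on the service from a workflow step (an illustrative sketch, not from the linked comment; the values and env vars are placeholders):

```yaml
# Illustrative step: let ECS stop old tasks before new ones are healthy
# (minimumHealthyPercent=0), so a small service can replace itself faster
# instead of waiting for the old task to drain first.
- name: Adjust ECS deployment preferences
  run: |
    aws ecs update-service \
      --cluster "$ECS_CLUSTER" \
      --service "$ECS_SERVICE" \
      --deployment-configuration minimumHealthyPercent=0,maximumPercent=200
```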
@allisaurus No errors whatsoever coming from either the ECS logs or GH Actions. The GH action just hangs there for minutes before moving to the next and final step; the build, however, is successful. The only thing I can think of is that the GH plugin is waiting for a status, and ECS doesn't return one until Fargate switches the old container for the new one.
I'm facing the same situation using GH actions. In my case, it happened like this:
I used ECS to deploy a Django image from ECR. Simple Fargate cluster and simple task definition, NOT using a load balancer. When I was messing around to test my deployment, I triggered my actions several times and the average time was 4 min (90% of the time was consumed by the "Deploy Amazon ECS task definition" step, shown in the image).
Due to my requirements, my implementation must have a static IP, so I did my research and restructured the whole thing: I added a Network Load Balancer that uses an Elastic IP to the ECS service. Then, while making multiple commits to test the result, the "Deploy Amazon ECS task definition" step was taking 12 minutes each time I ran the actions. I don't fully understand why adding a load balancer would make it take more time.
PS: Apart from the LB, the thing I did differently the second time was creating the ECS service from the "Task Definitions" tab in the ECS console, instead of going into a cluster and clicking "Create Service" or "Run Task", which was the way I did it on my first try.
I get timed out at 30 minutes. Everything deploys, but the GH Actions runs until failure.
Is there any update?
I am experiencing this as well. The container appears to be deployed and functioning but github actions just spins. I also use a load balancer for what it's worth.
My deploys also take a long time. How can I shorten them?
It takes around 12 mins for me to update my Fargate service 😢
Same problem, any update?
I think it's not an action issue, but ECS. I took a look at the ECS service console and found that the status "2/1 Tasks running" lasts for a long time. The problem is the `deregistration_delay` parameter, whose default value is 300 seconds. I set it to 5 seconds, and now deployment of the task definition takes about 2 minutes.
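If you prefer to make that change from a workflow rather than the console, a sketch (the target group ARN env var is a placeholder; 5 seconds is the value from the comment above, and note that a short delay gives in-flight requests less time to drain):

```yaml
# Sketch: lower the target group's deregistration delay so old tasks
# are deregistered quickly instead of draining for the 300s default.
- name: Lower target group deregistration delay
  run: |
    aws elbv2 modify-target-group-attributes \
      --target-group-arn "$TARGET_GROUP_ARN" \
      --attributes Key=deregistration_delay.timeout_seconds,Value=5
```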
I found that if you disable the task stability check then it takes only a few seconds/minutes to deploy
```yaml
- name: Deploy Amazon ECS task definition
  uses: aws-actions/amazon-ecs-deploy-task-definition@de0132cf8cdedb79975c6d42b77eb7ea193cf28e
  with:
    task-definition: ${{ steps.task-def.outputs.task-definition }}
    service: ${{ env.ECS_SERVICE }}
    cluster: ${{ env.ECS_CLUSTER }}
    wait-for-service-stability: false # <--- default is true
```
The same task, unchanged, will suddenly take 20-30 minutes, or fail altogether. Seems like an issue worth checking out.
Facing the same issue, is there any update?
Same issue: the same task from two months ago now takes forever. I thought it was about RAM/CPU, but hey, I am using Elastic Container Service...
Same for me. What surprised me is that a lot of ECS tasks fail to start and are killed until one finally starts successfully. It is likely due to AWS ECS and not this action.
STOPPED (ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.us-west-2.amazonaws.com/": dial tcp 52.119.174.83:443: i/o timeout)
I had the same experience as @0Lucifer0 — would be nice if the github action could fail loudly or provide more insight into why the task isn't getting deployed correctly
Also hitting the same problem. Seems random - sometimes finishes in 2 mins, sometimes in 20. Deploying using aws-actions/amazon-ecs-deploy-task-definition@v1 as everyone else here.
This is a major, major pain and eats up billable minutes in github actions. Please take a look, AWS.
I can confirm that the hanging is during the stability check. Not a fan of turning it off, but in order to save on billable minutes I am. You can use other automated health checks to monitor the service and keep this action limited to build and deploy.
Excerpt of debug info:
@aencalado's solution of turning off `wait-for-service-stability` worked for me.
The solution given by @grommir worked for me.
@ddaniel27 can you give more details on how to achieve this? How/where did you change the `deregistration_delay` property?
@0Lucifer0 If you are in the AWS console, go to EC2 > Target Groups > your target group > Attributes > Edit. There you just need to change the 300 to something like 5 seconds, and that's all. I don't know how this might affect other ECS behavior, but for this issue it works.
I guess that won't work for me as for some reason there is no target groups 😢
I'm having the same issue with a `deregistration_delay` of 10 sec.
I can confirm this is still an issue. When a deployment is failing, it seems AWS does not return anything, which leads to an extremely long deployment time that ultimately ends in a timeout.
For what it's worth, my long deployment time was due to using the wrong VPC somewhere, which caused a delay and then a timeout.
Is the issue still here? Deployment failed and the action hung.
Guys, it is not just deploying the task definition; it is also waiting for service stability. Service stability means # of required tasks == number of running tasks, plus health checks passing. If it takes 30 minutes (the default), it's because it waited 30 minutes for health checks to pass and they didn't. You can set the timeout to a value lower than 30 minutes.
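For example, the action exposes a `wait-for-minutes` input that caps the stability wait (a sketch; 5 minutes is an arbitrary value, pick one that matches your service's normal stabilization time):

```yaml
- name: Deploy Amazon ECS task definition
  uses: aws-actions/amazon-ecs-deploy-task-definition@v1
  with:
    task-definition: ${{ steps.task-def.outputs.task-definition }}
    service: ${{ env.ECS_SERVICE }}
    cluster: ${{ env.ECS_CLUSTER }}
    wait-for-service-stability: true
    wait-for-minutes: 5  # fail fast instead of waiting the 30-minute default
```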
> Guys because it is not just deploying the task definition, it is also waiting for service stability. Service stability is # of required tasks == number of active tasks + Health checks passing. If it takes 30 minutes (default) then it's because it was waiting 30 minutes for healthchecks to pass and they didn't. You can set a timeout to a lower value than 30 minutes.
The issue is not really that it times out after 30 min; it is that it takes a long time (25 min is quite long even when it does not time out) to successfully start in some cases. Sadly, the error seems more like an underlying issue with AWS than with this specific GitHub action.
https://github.com/aws-actions/amazon-ecs-deploy-task-definition/issues/102#issuecomment-1266750550
Has anybody figured out a solution for this? I face it every once in a while, and it is very annoying.
We have a staging environment where we have set desired count to 1. When deployment gets stuck, we face service downtime.
I can see the task logs and see the service has started, but for some reason the ECS deployment still shows deploying and the ELB won't route to the new task instance.
Same here :(
I looked into the configuration of the wait timer and my thoughts are as follows:

- `wait-for-minutes` equates to the waiter's `maxWaitTime`.
- The delay between checks starts from `const WAIT_DEFAULT_DELAY_SEC = 15;` and backs off toward `maxDelay`.

As a result, stability is checked early, and the delay between checks will almost always back off to the `maxDelay` default of 120 seconds, since the early attempts fail. That means once stability is reached (depending on your AWS settings for the target group / ECS health check delays, etc.), an additional delay of up to 120 seconds seems to be tacked on top of the stabilization time.

Solution to fix: expose `maxDelay` and `minDelay` as available options for the stability check, e.g. as `wait-for-min-delay-seconds` and `wait-for-max-delay-seconds` inputs.
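That proposal might look like this in a workflow. Note that `wait-for-min-delay-seconds` and `wait-for-max-delay-seconds` are the proposed inputs only, not existing parameters of this action, and the values are illustrative:

```yaml
# PROPOSED (hypothetical) inputs, not part of the action today:
# they would bound the waiter's polling interval so the stability
# check re-polls quickly instead of backing off to a 120s delay.
- name: Deploy Amazon ECS task definition
  uses: aws-actions/amazon-ecs-deploy-task-definition@v1
  with:
    task-definition: ${{ steps.task-def.outputs.task-definition }}
    service: ${{ env.ECS_SERVICE }}
    cluster: ${{ env.ECS_CLUSTER }}
    wait-for-service-stability: true
    wait-for-min-delay-seconds: 5   # proposed, does not exist yet
    wait-for-max-delay-seconds: 30  # proposed, does not exist yet
```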
I faced the same issue. Usually, it takes about 4-5 mins.
Same issue with python 3.11 poetry environment.
> I found that if you disable the task stability check then it takes only a few seconds/minutes to deploy

```yaml
- name: Deploy Amazon ECS task definition
  uses: aws-actions/amazon-ecs-deploy-task-definition@de0132cf8cdedb79975c6d42b77eb7ea193cf28e
  with:
    task-definition: ${{ steps.task-def.outputs.task-definition }}
    service: ${{ env.ECS_SERVICE }}
    cluster: ${{ env.ECS_CLUSTER }}
    wait-for-service-stability: false # <--- default is true
```
this solved mine
How can I troubleshoot why the deploy is taking incredibly long all of a sudden?
my task definition