aws / amazon-ecs-agent

Amazon Elastic Container Service Agent
http://aws.amazon.com/ecs/
Apache License 2.0

Nodes periodically disconnect from the ECS agent #219

Closed nickyhof closed 8 years ago

nickyhof commented 8 years ago

We have many ECS instances that seem to disconnect from the ECS agent. This causes us problems when redeploying containers, determining task status, etc. In the web console, under the "ECS Instances" tab, a few instances show "Agent Connected" as false, and they don't seem to reconnect. What could be the cause of this? Is this a known issue?

samuelkarp commented 8 years ago

Brief, periodic disconnections are expected. However, the Agent should reconnect quickly after any disconnection. If you're seeing the Agent stay disconnected for extended periods of time, I'd be very interested in seeing the logs (especially with ECS_LOGLEVEL=debug).

samuelkarp commented 8 years ago

@nickyhof I'm going to close this for now, but if you have the chance to grab logs I'd definitely like to take a look at them.

buley commented 8 years ago

I'm trying to replatform us onto ECS and am getting disconnects all the time. They increase when I deploy frequently. When I don't catch them, they can take out all containers. Is there any way to detect an ECS disconnection so I can add a new instance when this happens?
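One possible way to detect this, offered as a minimal sketch and not something confirmed in this thread: the ECS `DescribeContainerInstances` API returns an `agentConnected` flag for each container instance, which appears to be the same value the console shows as "Agent Connected". The helper name `disconnected_instances` and the cluster name passed to it are hypothetical, and the client is assumed to be a boto3 ECS client.

```python
def disconnected_instances(ecs_client, cluster):
    """Return EC2 instance IDs whose ECS agent reports agentConnected == False.

    ecs_client is assumed to be a boto3 ECS client (boto3.client("ecs"));
    the function name and the cluster argument are illustrative only.
    """
    disconnected = []
    # list_container_instances is paginated, so walk every page.
    paginator = ecs_client.get_paginator("list_container_instances")
    for page in paginator.paginate(cluster=cluster):
        arns = page["containerInstanceArns"]
        if not arns:
            continue  # describe_container_instances rejects an empty list
        resp = ecs_client.describe_container_instances(
            cluster=cluster, containerInstances=arns
        )
        for instance in resp["containerInstances"]:
            # Treat a missing flag as disconnected so it is never silently skipped.
            if not instance.get("agentConnected", False):
                disconnected.append(instance["ec2InstanceId"])
    return disconnected
```

Called periodically with `boto3.client("ecs")` and your cluster name, the returned instance IDs could feed an alarm or a replacement step. Since brief disconnections are expected, it is probably worth acting only when an instance stays disconnected across several consecutive checks.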


nickyhof commented 8 years ago

@buley after upgrading our EC2 agents to the latest (1.6.0), we haven't experienced any more disconnections.

nikhilo commented 8 years ago

We noticed this after performing a few deployments back to back, with ECS agent v1.8.0 running on custom CentOS 7.2.

A temporary workaround is to restart the ECS agent before every deployment, but I don't think that's ideal.

ChrisRut commented 8 years ago

@nikhilo, you are likely experiencing this regression in v1.8.0 of the agent: https://github.com/aws/amazon-ecs-agent/issues/313