gliderlabs / docker-consul

Dockerized Consul
MIT License

leave-on-terminate not working on host shutdown #104

Open jondunning opened 9 years ago

jondunning commented 9 years ago

Hi,

I'm using this Docker image on multiple AWS EC2 instances, with the following consul.json file to pass parameters to the container.

{
    "leave_on_terminate": true,
    "recursors": [
        "10.1.2.2"
     ]
}

The docker run command is below:

docker run -d --restart=always -p 8301:8301 -p 8301:8301/udp -p 8400:8400 -p 8500:8500 \
-p 53:53/udp -v /opt/consul:/data -v /var/run/docker.sock:/var/run/docker.sock \
-v /etc/consul:/etc/consul \
-h $(curl -s http://169.254.169.254/latest/meta-data/instance-id) \
--name consul-agent progrium/consul \
-advertise $(curl -s http://169.254.169.254/latest/meta-data/local-ipv4) \
-dc eu-west-1 -atlas={ATLAS_INFRASTRUCTURE} -atlas-join -atlas-token="{ATLAS_TOKEN}" \
-config-file /etc/consul/consul.json

If I run "docker stop {CONTAINER_ID}", the agent logs the following and successfully leaves the cluster before the container terminates.

==> Caught signal: terminated
==> Gracefully shutting down agent...
    2015/08/04 07:42:30 [INFO] consul: client starting leave
    2015/08/04 07:42:30 [INFO] serf: EventMemberLeave: i-99424134 10.1.2.64
    2015/08/04 07:42:31 [INFO] agent: requesting shutdown
    2015/08/04 07:42:31 [INFO] consul: shutting down client
    2015/08/04 07:42:31 [INFO] agent: shutdown complete

However, if I do a Linux shutdown on the host machine, the same information is logged but the Consul agent is not removed from the cluster.

==> Caught signal: terminated
==> Gracefully shutting down agent...
    2015/08/04 07:45:47 [INFO] consul: client starting leave
    2015/08/04 07:45:48 [INFO] serf: EventMemberLeave: i-99424134 10.1.2.64
    2015/08/04 07:45:48 [INFO] agent: requesting shutdown
    2015/08/04 07:45:48 [INFO] consul: shutting down client
    2015/08/04 07:45:48 [INFO] agent: shutdown complete

I'm not sure what could be causing this. Maybe the network interface has already been shut down, so the leave request doesn't get through to the cluster?
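For anyone trying to reproduce this, one way to tell the two cases apart is to check the node's status from another cluster member after the host goes down. The snippet below is only a rough sketch: `member_status` is a hypothetical helper name, and it assumes the table printed by `consul members` has the node name in the first column and the status in the third.

```shell
# Hypothetical helper: print the Status column for one node.
# Reads `consul members` output on stdin, assuming columns are
# Node, Address, Status, ... in that order.
member_status() {
  awk -v node="$1" '$1 == node { print $3 }'
}

# Example usage from another node in the cluster (node name taken from the logs above):
#   consul members | member_status i-99424134
```

A status of "left" would mean the graceful leave reached the cluster, while "failed" would suggest the other members never saw it.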

Any thoughts?

Cheers,

Jon

jondunning commented 9 years ago

Any thoughts on this issue?

progrium commented 9 years ago

Nope. If you could try running Consul on the host without a container and see whether it behaves the same, that would tell you if it's a Docker Consul issue or a Consul issue.

jondunning commented 9 years ago

I ran Consul on a standard AWS Linux VM using upstart. Consul leaves the cluster fine on terminate, so this looks to be a Docker issue specifically.

As you can see from my comments above, the Consul agent inside the Docker container does log the "EventMemberLeave" event, but for some reason the leave never makes it to the cluster.

jondunning commented 9 years ago

Anything else you would like me to try?

progrium commented 9 years ago

Hmm, okay well perhaps it's a Docker issue. Poke around open issues over there and see if you find anything.

mboudreau commented 9 years ago

Any updates on this? I'm getting the same issue and it's annoying :/

spanktar commented 8 years ago

Bump.