docker-archive / classicswarm

Swarm Classic: a container clustering system. Not to be confused with Docker Swarm which is at https://github.com/docker/swarmkit
Apache License 2.0
5.75k stars 1.08k forks

Feedback about global rescheduling #1651

Closed schmunk42 closed 4 years ago

schmunk42 commented 8 years ago

I heard about the new rescheduling feature from @mgoelzer

Since this has been long awaited on my side, I gave it a try today.

Maybe the first thing to note, after starting some containers on a new swarm: I found the docs at https://github.com/docker/swarm/blob/master/experimental/rescheduling.md, and you should add a link there to https://github.com/docker/swarm/tree/master/experimental#enable-experimental-features

My next question would be: do I have to enable this on the master, on the agent, or on both, or doesn't it matter on which one I enable it? Can I enable it per node?

About the syntax --experimental: we need to put this in front of the swarm command, always, right? I would have expected swarm manage --x-rescheduling, like docker-compose --x-networking, but that's not a big deal.

But, more importantly, it should be possible to set this flag from docker-machine. Is this somehow possible at the moment? Or can it be enabled after swarm is running? It's a bit cumbersome to rebuild the swarm manage command.
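For reference, the global flag goes before the subcommand when starting the manager. Rescheduling decisions are made by the manager, so that is where the flag matters; the discovery URL and port mapping below are placeholders:

```shell
# Sketch: run the Swarm manager with experimental features enabled.
# The --experimental flag is global, so it precedes "manage".
docker run -d -p 3376:2375 swarm --experimental manage consul://consul.example.com:8500/swarm
```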

schmunk42 commented 8 years ago

Some more feedback...

I tried to use it via docker-compose

appcli:
  environment:
    - 'reschedule:on-node-failure'

But got the error:

ERROR: invalid reschedule policy: on-node-failure=

I also tried

appcli:
  labels:
    - 'com.docker.swarm.reschedule-policy=["on-node-failure"]'

What basically works is:

appcli:
  labels:
    com.docker.swarm.reschedule-policy: "on-node-failure"

I couldn't see rescheduling yet, because I ran into

time="2016-01-18T22:53:30Z" level=error msg="HTTP error: Engine no longer exists" status=500 

I just stopped and started the engine; I don't know why this happens (yet).

schmunk42 commented 8 years ago

Got it working in docker-compose like so:

  labels:
    com.docker.swarm.reschedule-policies: "[\"on-node-failure\"]"

The syntax looks a bit quirky.
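For reference, a complete service entry with this working label syntax might look like the following (the image name is a placeholder); the value is a JSON-encoded list of policies, which explains the escaped quotes:

```yaml
appcli:
  image: busybox
  labels:
    com.docker.swarm.reschedule-policies: "[\"on-node-failure\"]"
```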

One more thing is that swarm fails to reschedule containers with volumes_from.

time="2016-01-19T07:31:36Z" level=error msg="Failed to reschedule container f232deb5d1316b1ec347539b5124bd6a275c11980053fd07bbd584f427574a0f (Swarm ID: 2bc0c0732835775a81ce3bf55d603b3a4240b5b45ddde674b5fadae1901780df): unable to find a node that satisfies container==cafb011e356e7955bf9e07b1373d2b20c04eb21b2307c04cd29e1b77d3c0cc0d" 
time="2016-01-19T07:31:36Z" level=error msg="Failed to reschedule container 452caca64d10af7efd464a86641f9840a588407ad5989e79e3d2d0152cb6416a (Swarm ID: 94368fb15ad043b94231700f12260568a0306c9d5f992735aef11c33a378a9bc): unable to find a node that satisfies container==a542008945b11e4c973a07010c7ddd2f52497d7c713e624e42b5ed02ad724b7d"

Looks like it sets up a reference to the original container based on volumes_from, which does not really make sense, since it's very likely that both containers were on the same host.

schmunk42 commented 8 years ago

After playing around a bit more, I think the rescheduling problems may also occur because of links. I know that they are deprecated, but it would be interesting to get a statement about support for rescheduling for docker-compose stacks, concerning links, volumes, volumes_from, networking etc...

A few weeks ago, I played around with rescheduling for docker-compose and basically ended up doing kill && rm -fv on the stack and bringing it up again, since I often ran into problems with constraints, which seemed to be related to unreachable containers on failed nodes that were still taken into consideration by swarm.

vieux commented 8 years ago

@schmunk42 as you said, links are deprecated; they were replaced by the new networking model, where containers can talk to each other whether they are on the same host or not.

Regarding volumes_from you are right, we should improve the rescheduling to take this into account (reschedule containers in a particular order).

schmunk42 commented 8 years ago

reschedule containers on a particular order

I changed my stack for testing to use neither links nor volumes... with a docker-compose setup.

I noticed that when I was working without links, I had much less control over the startup process, since with links there was a kind of ordered startup of the containers.

This causes problems with e.g. nginx, which now needs to detect when PHP-FPM is fully up and running.

But I'm not 100% sure about this; it could also be a side effect of other things while testing.

schmunk42 commented 8 years ago

PS :smile: The rescheduling of single containers works without problems so far - that's great :+1:

MBuffenoir commented 8 years ago

Is there a way to create a swarm with experimental features enabled using docker-machine yet? I tried --swarm-opt, but it does not seem to be made for that purpose :-/

schmunk42 commented 8 years ago

@MBuffenoir Hack ahead https://github.com/schmunk42/machine/commit/b741f603c70c306b068cbf1591b5436f5718e588 - I built from this revision, which was pretty easy since it runs in a Docker container :)

MBuffenoir commented 8 years ago

True, surprisingly easy to build... Sadly, the driver I use is not stable in this fork :-( I can't control the created swarm. I will wait until something more official is out to try rescheduling, as this is a mandatory feature for sure!

jpetazzo commented 8 years ago

@MBuffenoir: I use --swarm-image jpetazzo/swarm:experimental (it is a 2-line Dockerfile with FROM swarm:latest and ENTRYPOINT ["/swarm", "--experimental"]; it works like a charm!)
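For anyone who wants to replicate this, the 2-line Dockerfile described above is simply:

```dockerfile
# Wrap the stock swarm image so every subcommand runs with
# experimental features enabled.
FROM swarm:latest
ENTRYPOINT ["/swarm", "--experimental"]
```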

@vieux: I played with the rescheduling, and I like it! I have one problem though; when a failed node recovers, there is a name clash. I opened #1810 to discuss it. But other than that, I like it! :+1:

wallnerryan commented 8 years ago

@jpetazzo @schmunk42 in your journeys with compose, swarm, and rescheduling, has this been proven to work yet? Working with master today, I can't seem to get this to work with Compose 1.6.2 using reschedule: "on-node-failure".

Will try one-container examples tomorrow.

schmunk42 commented 8 years ago

@wallnerryan For very simple (actually single container) stacks it was working for me.

More complex compose setups were not really working for me, e.g. with volumes_from.

As an example: let's say you have a container A which has a data volume, and a container B which uses volumes from A. When A dies, B may die as well; swarm tries to reschedule A and B, but B is still "linked" (via volumes_from) to the ID of the old A.

I think you should avoid links and volumes, when playing with rescheduling.

btw: Do you use v2 syntax?

wallnerryan commented 8 years ago

Hmm, okay, I am using links, so I can try to eliminate that.

Yes I am using v2.

Btw, how are you testing the failover? (I'm just doing a shutdown -h now to fail a node.) I didn't see any log output to do with rescheduling; the node was just deregistered and marked unhealthy in the manager logs. But since I'm using a more complex multi-service v2 compose file, I'm wondering whether it may not be working correctly.

schmunk42 commented 8 years ago

Btw how are you testing the failover

Run docker events on your swarm.
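A sketch of that, assuming the Docker client points at the Swarm manager (the address is a placeholder):

```shell
# Stream cluster-wide events from the Swarm manager; during a failover
# the rescheduled container should show up as new create/start events.
docker -H tcp://swarm-manager:3376 events --filter event=create --filter event=start
```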

schmunk42 commented 8 years ago

Links are marked deprecated and will be removed in 1.12, AFAIR. With v2 syntax I was able to remove links, which I had needed for networking.

jpetazzo commented 8 years ago

I couldn't get rescheduling to work reliably so far. I'm testing with a stand-alone nginx container (no links, no nothing), and I test by rebooting the node hosting it. It never gets rescheduled correctly (I almost always end up with 2 new containers instead, and then they fail to start). I test on a 5-node cluster where each node is a Swarm manager.

wallnerryan commented 8 years ago

Thanks for the info @jpetazzo and @schmunk42. Removing links isn't an issue; I just hadn't set up networking yet when I tested. Will try a few things out, and at least I know where I may run into issues. @schmunk42 did you only get rescheduling with swarm to work with labels

  labels:
    com.docker.swarm.reschedule-policies: "[\"on-node-failure\"]"

or were you able to get it working with

  environment:
    reschedule: "on-node-failure"

schmunk42 commented 8 years ago

Last time I tried, I remember only labels with the "bit quirky" syntax were working, but compose 1.6 (stable) wasn't released back then.

wallnerryan commented 8 years ago

OK, so running docker run -d --restart=always -e reschedule:on-node-failure redis and then doing a shutdown -h now on one node, I see the container get rescheduled and Created, but not started.

From the logs I see it go down, then I see the create:

2016-03-08T16:07:18.253286247Z container create 8c980b35a6d5a16e4804457e8d273006fc07b196d40d0e08a6affe87a6558c05 (com.docker.swarm.id=8b5354770bc62bfd2be7b3998df4a4db26bbf6444d15c3a607b770d586410b5a, com.docker.swarm.reschedule-policies=["on-node-failure"], image=redis, name=suspicious_varahamihira, node.addr=10.0.195.84:2375, node.id=DWJS:ES2T:EH6C:TLMP:VQMU:4UHP:IBEX:WTVE:VTFO:E5IZ:UBVJ:ITWW, node.ip=10.0.195.84, node.name=ip-10-0-195-84)

Then I see the container, but not started:

CONTAINER ID        IMAGE   COMMAND                  CREATED              STATUS              PORTS                                  NAMES
8c980b35a6d5        redis    "/entrypoint.sh redis"   About a minute ago                                                              ip-10-0-195-84/suspicious_varahamihira
db6e224a2ebd        swarm    "swarm --experimental"   5 minutes ago        Host Down

I presume that if I had restarted the node instead of halting it, I would have 2 containers, as @jpetazzo was seeing, which seems related to https://github.com/docker/swarm/issues/1846

mrapczynski commented 8 years ago

Just chiming in that I am also getting the same issue as reported above if I try to specify a rescheduling policy as an environment variable in a Compose file. Here is the raw error output from our CI:

22-Mar-2016 11:02:48    Creating bsisbtd0job1_bsis_development_1
22-Mar-2016 11:02:48    invalid reschedule policy: on-node-failure=
22-Mar-2016 11:02:48    The "bsis_development" service specifies a port on the host. If multiple containers for this service are created on a single host, the port will clash.
22-Mar-2016 11:02:48    Creating and starting 1 ... 
22-Mar-2016 11:02:48    Creating and starting 2 ... 
22-Mar-2016 11:02:48    
22-Mar-2016 11:02:48    Creating and starting 1 ... error
22-Mar-2016 11:02:48    
22-Mar-2016 11:02:48    Creating and starting 2 ... error
22-Mar-2016 11:02:48    
22-Mar-2016 11:02:48    ERROR: for 1  invalid reschedule policy: on-node-failure= 
22-Mar-2016 11:02:48    ERROR: for 2  invalid reschedule policy: on-node-failure= 

This is with Swarm 1.1.3, experimental turned on, and Compose 1.6.0.

vieux commented 8 years ago

@mrapczynski Regarding ERROR: for 1 invalid reschedule policy: on-node-failure=: can you try with Compose 1.6.2, and if it still doesn't work, paste your compose file here?

mrapczynski commented 8 years ago

@vieux OK upgrading to Compose 1.6.2 seems to have done it. Thanks.

jpetazzo commented 8 years ago

My problem was linked to issue #1810 (bad interaction between Swarm manager failover and container rescheduling). Now that this one is fixed, I have the following feedback.

Network partitions

In case of a network partition, containers are rescheduled correctly (cool!), but when the partition resolves, you end up with multiple copies of the container. This is not a big problem, except that it makes the container name unresolvable because of the conflict.

Here are a few ideas to work around this:

Meanwhile, the user can work around the issue by using docker ps --filter name=highlander to check if there are duplicates.
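A sketch of that check, with "highlander" as the example container name from above and a placeholder manager address:

```shell
# List containers matching the name; more than one result means the
# resolved partition left duplicates behind.
docker -H tcp://swarm-manager:3376 ps --filter name=highlander
```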

Engine shutdown

When shutting down the Docker engine, containers are stopped, and this interferes with rescheduling: containers are rescheduled, but not started. I'm considering the following scenarios:

If I have an "important" container, I don't see how to reliably make sure that it will stay running somewhere else, other than maintaining multiple containers watching over each other.

Maybe, when a container is rescheduled, the restart policy should be honored? I think this might be what @wallnerryan suggested above. I just tried with Swarm 1.2.0-rc1, and stopped containers are rescheduled, but not started, even if they have a restart policy.

Other idea: maybe there could be a --restart=reschedule or something like that, specifically for this scenario?

wallnerryan commented 8 years ago

@jpetazzo this is what I was hoping --restart=always would cover, since something such as terminating a VM on AWS still "shuts down" the node, so containers do not come back up Started.

(with overlay networking) Also, sending a halt -f to a node causes the container to be rescheduled but not started, except in this case I cannot call docker start or docker restart on the Created container without getting the error below:

Error response from daemon: 500 Internal Server Error: service endpoint with name <container-name> already exists

(without multi-host networking) Running halt -f reschedules and starts the container as expected.

cnoffsin commented 8 years ago

This suggestion here is a good one:

Other idea: maybe there could be a --restart=reschedule or something like that, specifically for this scenario?

I had been messing with the rescheduling and thought it wasn't working. It turns out that because I was gracefully shutting down the node, the container would get rescheduled in a "created" state and not start, even though I had tried the --restart=always flag.

In real HA terms, we would want the container to get fired up again no matter what.

allamand commented 8 years ago

On my side, it looks like swarm can't recreate a container if it was using an overlay network, because the name already exists in the network (docker network inspect my_net still shows the previous container). It seems that my swarm backend (consul) doesn't get the info about the missing container.

I have similar issues with external volumes: on reschedule, the external volume is still attached to the previous container, and swarm refuses to launch a new container using those volumes (I use rexray with OpenStack volumes).

wallnerryan commented 8 years ago

@sebmoule ditto on the network issue. I've seen the reschedule not work due to the name already existing in the overlay. I think there was an issue or comment that this would need to be handled by libnetwork cleanup. @mavenugo might know.

With volumes I haven't seen this issue, though I've tested the rescheduling with Flocker. An asciinema of it working is here: https://asciinema.org/a/44008?t=18:44

allamand commented 8 years ago

Hi all,

Any news on the rescheduling not working due to the name already existing in overlay networks?

@wallnerryan for the volume, my problem comes from my storage (OpenStack Cinder) not allowing force-detaching a volume attached to a host that goes down, so this problem is out of scope here.

schmunk42 commented 8 years ago

I just tried rescheduling with swarm 1.2.5 and docker 1.12.1 - still no chance of getting a stack rescheduled if services have a volumes_from definition.

hutchic commented 8 years ago

Unable to reschedule a container, with or without a volume, with or without an overlay network.

Client:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.3
 Git commit:   b9f10c9/1.11.2
 Built:        
 OS/Arch:      linux/amd64

Server:
 Version:      swarm/1.2.5
 API version:  1.22
 Go version:   go1.5.4
 Git commit:   27968ed
 Built:        Thu Aug 18 23:10:29 UTC 2016
 OS/Arch:      linux/amd64
 Experimental: true

Using an etcd cluster as the K/V store backend. The container gets created but never started.