coreos / fleet

fleet ties together systemd and etcd into a distributed init system
Apache License 2.0

No support for "rolling upgrade" of global unit #960

Open bcwaldon opened 10 years ago

bcwaldon commented 10 years ago

When a global unit is launched in a cluster, all possible fleet agents start up a local instance, assuming the MachineMetadata condition is met. Now imagine that you want to upgrade that unit, maybe to launch a new version of a docker container, or just to tweak some of the systemd options. You have to destroy it, which causes the unit to stop on every node that was previously running it, and launch it all over again. This "full down, full up" can clearly be disruptive.

What can we do about this?

  1. Document approaches to large-scale rolling upgrades with fleet (specifically with global units)
  2. Provide built-in rolling upgrade semantics for global units
rosskukulinski commented 10 years ago

Documentation for large-scale rolling upgrades would be helpful (global AND non-global). We're experimenting with automatically generating unit files like myservice-@.service and setting the docker image tag to :. Curious what other people are doing.
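A versioned template unit along those lines might look like the following. This is a hypothetical sketch: the unit name, image, and tag are placeholders standing in for the elided values above, not something from an actual deployment.

```ini
# myservice-1.0.2@.service -- regenerated per release with the image tag baked in
[Unit]
Description=myservice instance %i (release 1.0.2)

[Service]
ExecStartPre=-/usr/bin/docker rm -f myservice-%i
ExecStart=/usr/bin/docker run --name myservice-%i example/myservice:1.0.2
ExecStop=/usr/bin/docker stop myservice-%i

[X-Fleet]
Conflicts=myservice-*@%i.service
```

Each release regenerates the file with a new tag; instances of the new template are started before the old ones are destroyed.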

xh3b4sd commented 9 years ago

+1 for rolling upgrades

rlister commented 9 years ago

We use global units running on instances in autoscaling groups, as determined by fleet metadata. Since the number of instances fluctuates, it is not feasible to use myservice@X, since values of X are not known.

It is therefore a major problem for us that a unit update requires downtime on the entire group.

I feel a reasonable intermediate step would be for updates to [System] to, optionally, just update the unit files on each instance; we could then trigger systemctl restart in a rolling fashion as necessary. Perhaps in the future fleet could handle that part as well.

Changes to [X-Fleet] will have to happen in a different manner, probably with immediate effect.

scatterbrain commented 9 years ago

We're facing a similar issue where we use global units in autoscaling groups and need a way to issue rolling upgrades.

cdwertmann commented 9 years ago

+1 for this feature. As things currently stand, we cannot restart our global units without downtime.

xh3b4sd commented 9 years ago

We use a workaround for this. I am sure it is not sufficient in all cases, but you can also scale global units and run multiple sets of them at the same time. So when updating one slice, the other ones are still there. That works quite well with e.g. haproxy, because it can share the same ports.
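As a sketch of that slicing workaround (unit names are hypothetical): run two identical global copies of the service side by side and cycle them one at a time, so one slice keeps serving while the other is upgraded:

```shell
# haproxy-1.service and haproxy-2.service are identical global units.
fleetctl destroy haproxy-1.service      # slice 1 goes down everywhere
fleetctl start haproxy-1.service        # slice 1 comes back with the new unit file
# verify slice 1 is healthy, then repeat the same two steps for haproxy-2.service
```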

akaspin commented 9 years ago

We use vulture for upgrades. It only works with units that run docker containers.

cdwertmann commented 9 years ago

@akaspin Could you improve the docs for vulture a bit? It's difficult to figure out how to set it up and how to use it. Does it require SkyDNS?

cdwertmann commented 9 years ago

@zyndiecate I followed your advice and split my application into two global services, which I can upgrade independently without causing downtime for the entire app. Still I think it would be fantastic if CoreOS/systemd had built-in support for such a common operation as rolling upgrades.

xh3b4sd commented 9 years ago

> @zyndiecate I followed your advice and split my application into two global services, which I can upgrade independently without causing downtime for the entire app.

Thanks for letting me know. I am glad it worked out for you.

> Still I think it would be fantastic if CoreOS/systemd had built-in support for such a common operation as rolling upgrades.

Not sure. AFAIK this is not going to happen anyway, but lets see.

shadoi commented 9 years ago

I have been wanting the ability to "symlink" records in etcd. I generally picture the etcd hierarchy as a filesystem, so having this capability seems like a natural fit. It would make it very simple to mark groups of units in etcd as "in production" or "upgrade candidate" without modifying their current state.

I suppose this could be implemented currently by just copying records, or making collections of pointer objects in a separate hierarchy, but that seems pretty kludgy and messy to keep synchronized. I'm not sure how feasible it is to natively support symbolic links in etcd, but I think it might make for a clean implementation here.

albertsun commented 8 years ago

Hoping for this feature as well. Without it, is there any way at all to update the systemd options passed to a unit without stopping all of them and taking the service down?

xh3b4sd commented 8 years ago

I only know of this: https://www.freedesktop.org/software/systemd/man/systemctl.html#set-property%20NAME%20ASSIGNMENT...
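For reference, `systemctl set-property` can change some resource-control properties of a running unit in place, but only systemd-managed properties, not ExecStart or the container image (the unit name below is hypothetical):

```shell
# Adjust cgroup properties of a running unit without restarting it;
# --runtime makes the change non-persistent across reboots.
sudo systemctl set-property --runtime myservice.service CPUQuota=20%
```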

johan-adriaans commented 8 years ago

For docker image updates I use systemctl over ssh to do a rolling update. It feels a bit wrong to restart a fleetd service using systemd but it does work. I'm not sure about the implications of doing this. Are there any fleet admins that can tell me if this approach is right or [very] wrong?

It uses a grep string to match units, connects to the server using ssh and restarts the unit using systemctl. For this to work you would need public/private key based ssh access. If you run this on the cluster itself you will need to run and forward your ssh-agent.

My script:

#!/bin/sh
# Rolling restart of fleet units matching a regex, one machine at a time.

if [ "$#" -ne 1 ]; then
  echo "Usage: $0 [regex] restart a list of units one by one" >&2
  exit 1
fi

regex="$1"
# Build "unit:ip" pairs from `fleetctl list-units` (the MACHINE column is "id/ip").
result=$(fleetctl list-units | grep "$regex" | awk '{split($2,a,"/"); print $1 ":" a[2]}')
is_running=-1

if [ -z "$result" ]; then
  echo "Could not find any units matching $regex" >&2
  exit 1
fi

echo "This command will restart the following units:"
for word in $result; do
  echo "$word"
done
printf "Are you sure? Type y to continue: "
read answer
if [ "$answer" != "y" ]; then
  echo "User quit"
  exit
fi

for line in $result; do
  unit=$(echo "$line" | cut -d':' -f1)
  machine=$(echo "$line" | cut -d':' -f2)

  # Pause between units so the service is never fully down.
  if [ "$is_running" -ne -1 ]; then
    printf "waiting 5 seconds before stopping next unit\n"
    sleep 5
  fi

  printf "stopping %s:%s\n" "$unit" "$machine"
  ssh -o "StrictHostKeyChecking=no" "$machine" "sudo systemctl stop $unit"

  printf "waiting for %s:%s to stop " "$unit" "$machine"
  is_running=1
  while [ "$is_running" -ne 0 ]; do
    is_running=$(fleetctl list-units | grep "$unit" | grep running | grep "$machine" | wc -l)
    sleep 1
    printf "."
  done
  printf "\n"

  printf "starting %s:%s\n" "$unit" "$machine"
  ssh -o "StrictHostKeyChecking=no" "$machine" "sudo systemctl start $unit"

  printf "waiting for %s:%s to start " "$unit" "$machine"
  while [ "$is_running" -eq 0 ]; do
    is_running=$(fleetctl list-units | grep "$unit" | grep running | grep "$machine" | wc -l)
    sleep 1
    printf "."
  done
  printf "\n"
done

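The awk step at the top can be checked in isolation against a sample `fleetctl list-units` line (the machine ID and IP below are made up):

```shell
# fleetctl list-units columns: UNIT  MACHINE(id/ip)  ACTIVE  SUB
sample='myservice.service 2a0f2f65.../10.0.0.12 active running'
echo "$sample" | awk '{split($2,a,"/"); print $1 ":" a[2]}'
# prints: myservice.service:10.0.0.12
```
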
simonvanderveldt commented 8 years ago

> We use a workaround for this. I am sure it is not sufficient in all cases, but you can also scale global units and run multiple sets of them at the same time. So when updating one slice, the other ones are still there. That works quite well with e.g. haproxy, because it can share the same ports.

@xh3b4sd How did you do this? Are you running multiple HAProxy instances that are listening on the same port on the host?

xh3b4sd commented 8 years ago

@simonvanderveldt Correct. When HAProxy runs in Docker containers you will need --net=host; all instances bind to the same port and HAProxy sorts out the rest. As already mentioned: when you have two sets of global units, e.g. haproxy-1.service and haproxy-2.service, you can upgrade one global set without interrupting the other.
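A sketch of what each slice's container start might look like (image tag and config path are illustrative assumptions, not from the thread):

```shell
# --net=host lets the HAProxy container bind the host's ports directly,
# so two slices can serve the same frontend port side by side.
docker run --name haproxy-1 --net=host \
  -v /etc/haproxy:/usr/local/etc/haproxy:ro haproxy:1.5
```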