coreos / fleet

fleet ties together systemd and etcd into a distributed init system
Apache License 2.0

Proposal: add ExecUnload directive to unit file #1323

Open acbodine opened 9 years ago

acbodine commented 9 years ago

Would it be a bad idea to add an ExecUnload directive to unit files, so that a user can specify cleanup actions to run when a unit is unscheduled from the cluster?

I searched around and found someone else who wants the same behavior: https://github.com/coreos/fleet/issues/612

The suggested combinations of unit file directives (ExecStartPre, ExecStart, ExecStop) for cleaning up unit artifacts (e.g. Docker containers) don't really solve the problem: after I run fleetctl stop && fleetctl unload on a unit, there is still a Docker container (hopefully stopped) left on the host where the unit was running.

For example:

Let's say we want to have multiple etcd instances running on a single host. We don't have to do this in Docker containers, but for the sake of example let's say we do, such that:

$ docker images | grep etcd
etcd          2.1.1          d90a4ab8affa        7 days ago          179.1 MB

And we have a unit file etcd-container@.service like so:

[Unit]
Description=Etcd Cluster
After=docker.service
Requires=docker.service

[Service]
Restart=always
RestartSec=50s
TimeoutStartSec=0
EnvironmentFile=/etc/sysconfig/etcd
ExecStart=/bin/bash -c "\
        docker start %p_%i && docker logs -f %p_%i || \
        eval $(etcdctl --peers=${IP}:2379 member add %p_%i http://${IP}:$PEER_PORT | tail -n 3 | tr -d '\"' ) ; \
        docker run -p 2379 -p $PEER_PORT:$PEER_PORT --name %p_%i \
            -e ETCD_ADVERTISE_CLIENT_URLS=\"http://${IP}:${CLIENT_PORT}\" \
            -e ETCD_INITIAL_ADVERTISE_PEER_URLS=\"http://${IP}:$PEER_PORT\" \
            -e ETCD_INITIAL_CLUSTER=\"$ETCD_INITIAL_CLUSTER\" \
            -e ETCD_INITIAL_CLUSTER_STATE=\"$ETCD_INITIAL_CLUSTER_STATE\" \
            -e ETCD_NAME=$ETCD_NAME \
            -e ETCD_LISTEN_PEER_URLS=\"http://0.0.0.0:$PEER_PORT\" \
            -e ETCD_LISTEN_CLIENT_URLS=\"http://0.0.0.0:2379\" \
            etcd:2.1.1"
ExecStop=/usr/bin/docker stop %p_%i

[X-Fleet]
MachineID=%i

As you can see, when we run fleetctl start etcd-container@$(cat /etc/machine-id).service, the unit first tries to start an existing etcd container named %p_%i on the host. If such a container exists, this should let it (whether it was stopped or still running) resume talking to the etcd cluster. Otherwise, the unit adds a new member to the etcd cluster and starts a new Docker container with the relevant port bindings.
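To make the member-add branch concrete, here is a small shell illustration of how the `eval $(etcdctl ... member add ... | tail -n 3 | tr -d '"')` step turns etcdctl's output into the variables that `docker run` consumes. SAMPLE_OUTPUT is a made-up example in the general shape etcd 2.x prints for `member add` (a status line, a blank line, then three ETCD_* assignments); the member name, ID and URLs in it are hypothetical, not captured from a real cluster.

```shell
#!/usr/bin/env bash
# Illustration only: SAMPLE_OUTPUT is hypothetical text in the shape of
# `etcdctl member add` output from etcd 2.x, not real cluster output.
SAMPLE_OUTPUT='Added member named etcd-container_abc123 with ID 9bf1b35fc7761a23 to cluster

ETCD_NAME="etcd-container_abc123"
ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,etcd-container_abc123=http://10.0.1.13:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"'

# Same pipeline as the unit's ExecStart: keep the last three lines, strip the
# double quotes, then eval the words as plain variable assignments in this shell.
eval $(printf '%s\n' "$SAMPLE_OUTPUT" | tail -n 3 | tr -d '"')

printf 'ETCD_NAME=%s\n' "$ETCD_NAME"
printf 'ETCD_INITIAL_CLUSTER=%s\n' "$ETCD_INITIAL_CLUSTER"
printf 'ETCD_INITIAL_CLUSTER_STATE=%s\n' "$ETCD_INITIAL_CLUSTER_STATE"
```

Because the pipeline keeps only the last three lines, it relies on etcdctl printing exactly those three assignments at the end of its output, with no spaces inside the values.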

Now consider the case where we have an existing etcd cluster deployed across multiple hosts with the above unit file. If we need to redeploy that cluster completely from scratch (maybe because etcd is in a bad state), I would expect that running fleetctl stop && fleetctl unload for each etcd unit would remove the etcd cluster and its artifacts (i.e. the Docker containers that hold etcd snapshots). Then we could simply fleetctl load && fleetctl start a new etcd cluster.

In practice this doesn't work unless we first remove all of the etcd Docker containers across the fleet machines, because otherwise the new units simply start the existing etcd containers from the previous cluster. If instead I could tell fleet to remove the Docker container whenever an etcd-container@.service unit is unloaded, the whole exercise of redeploying the etcd cluster could be managed by fleet.
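Until something like ExecUnload exists, that cleanup has to happen outside fleet. Below is a rough shell sketch of the manual step, assuming units are named etcd-container@<machine-id>.service (as the [X-Fleet] MachineID=%i line implies) and that each container is named %p_%i. The function names are hypothetical, and the fleetctl invocations (`list-unit-files --no-legend --fields=unit`, `ssh <machine> <command>`) reflect my understanding of the fleetctl CLI, so check them against `fleetctl help` before relying on this.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: do by hand the cleanup a proposed ExecUnload
# directive would let fleet run automatically when a unit is unloaded.
# Assumes units are named etcd-container@<machine-id>.service and that
# each unit's container is named %p_%i (e.g. etcd-container_<machine-id>).

# Instance part of a template unit name, i.e. systemd's %i.
unit_instance() {
  local unit="${1%.service}"      # strip the .service suffix
  printf '%s\n' "${unit#*@}"      # everything after the first '@'
}

# Container name used by the unit's ExecStart: %p_%i.
container_name() {
  local unit="${1%.service}"
  printf '%s_%s\n' "${unit%%@*}" "${unit#*@}"   # prefix before '@', instance after it
}

# Stop and unload every etcd-container@ unit, then remove its leftover
# container on the machine named by the unit's instance (MachineID=%i).
cleanup_etcd_units() {
  local unit machine
  for unit in $(fleetctl list-unit-files --no-legend --fields=unit | grep '^etcd-container@'); do
    machine="$(unit_instance "$unit")"
    fleetctl stop "$unit" || return 1
    fleetctl unload "$unit" || return 1
    # docker rm -f also removes the container if it is somehow still running.
    fleetctl ssh "$machine" docker rm -f "$(container_name "$unit")" || return 1
  done
}
```

Source the file and call cleanup_etcd_units before running fleetctl load && fleetctl start for the new cluster. The final `fleetctl ssh ... docker rm -f` line is the action an ExecUnload directive would let the unit file declare once, instead of scripting it separately.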

01101101 commented 9 years ago

+1

kaungst commented 9 years ago

+1