docker / cli

The Docker CLI
Apache License 2.0

Stack deployments should not override current service scale #2235

Open arseniybanayev opened 4 years ago

arseniybanayev commented 4 years ago

When re-deploying an existing stack in order to update services inside it (with docker stack deploy), the stack file's service.deploy.replicas value overrides the current scale of each service. For example, if I have a stack called my-stack with

services:
  my-service:
    deploy:
      replicas: 3

but because of increased load, at some point I scaled the service up to 9 replicas using:

docker service scale my-stack_my-service=9

then running the following command to, e.g., get new images will reduce the scale back to 3:

docker stack deploy -c my-stack.yml my-stack

The service.deploy.replicas key, in a stack/compose file, is unique because it represents a starting condition for a service in a swarm cluster rather than a desired condition. A major reason for deploying services or stacks to a swarm cluster is to be able to trivially scale them up or down in response to changing conditions, load, number of users, etc., so running docker service scale is valid even with stack deployments.

Because it's a starting condition, docker stack deploy should not alter the current scale (or replica count) of any services within the stack when re-deploying. Otherwise it can easily regress a swarm cluster to a scale that can't handle the current load (causing downtime and requiring manual re-scaling after every deploy), and it can be destructive for services that are deployed with replicas: 0 because some manual step in their deployment is still pending.

thaJeztah commented 4 years ago

I understand where you're coming from, but I don't think this would be possible currently. The docker stack commands are (currently) fully implemented client-side, and other than some metadata on objects that were created (labels that are set on services, volumes, networks etc), no "state" is preserved, and as such, the compose-file itself is treated as the "source of truth" (the desired state of the stack). The docker stack deploy command is designed to either deploy or update the stack, and in all cases make the stack's definition match the definition that's described in the compose file.

Because of the above, when docker stack deploy is run and (in your example) the number of replicas for the service doesn't match the number of replicas in the compose file, it doesn't know whether that is because the compose file was updated, or because the number of replicas was manually updated.

For example, it won't be able to distinguish the situation you described from a situation where you originally deployed your stack from this compose file:

version: "3.7"
services:
  my-service:
    image: nginx:alpine
    deploy:
      replicas: 3

Then, updated the compose file to have one replica, and re-deployed (docker stack deploy ..):

version: "3.7"
services:
  my-service:
    image: nginx:alpine
    deploy:
      replicas: 1

While assuming the current number of replicas is the desired number (i.e., assuming it was set manually, and ignoring what's in the compose file) would solve your use-case, it would be a breaking change for users who want the compose file to be the source of truth. In addition, it would be difficult to draw a line: which options should be ignored, and which should be updated? Perhaps someone manually updated the service's memory limit because it temporarily needed more.

What would be a solution? I think in an ideal situation, the docker-compose file used to (re)deploy a stack would be preserved somewhere. Ideally that would be daemon-side (somewhere in the swarm or kubernetes cluster), so that the information is available regardless of which client is used to re-deploy the stack, which could be a different machine than the one it was originally deployed from.

Having that information preserved would allow docker stack deploy to check for differences between the current version of the compose file and the version that was last used to deploy (or update) the stack.

Without a server-side component for stacks, this could be a bit difficult, but if someone wants to experiment with the approach above, it could be implemented as a CLI plugin that stores a copy of the compose file in the project directory (e.g. ./.docker/docker-compose.last.yml). When re-deploying the stack, the diff between that file and the current file could be calculated and used to update only those properties that changed. Alternatively, the compose file itself could be serialized into a string (could be a JSON string) and stored as a label on a service (?) - potentially risky if environment-variable substitution is used in the compose file (should those values be stored?)
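The core of that plugin idea can be sketched in a few lines of shell. This is a hypothetical helper, not an existing plugin: the function name and saved-copy path are illustrative, and only the ./.docker/docker-compose.last.yml location comes from the suggestion above.

```shell
# Hypothetical helper, not an existing plugin: compares the current compose
# file against the copy saved on the previous deploy, prints "changed" or
# "unchanged", and refreshes the saved copy. A wrapper around
# `docker stack deploy` could honor the file's replica counts only when this
# prints "changed", and preserve the live scale otherwise.
check_compose_changed() {
    last=$1      # e.g. ./.docker/docker-compose.last.yml
    current=$2   # the compose file about to be deployed
    if [ -f "$last" ] && diff -q "$last" "$current" >/dev/null 2>&1; then
        echo "unchanged"
    else
        echo "changed"
    fi
    mkdir -p "$(dirname "$last")"
    cp "$current" "$last"
}
```

A real plugin would still need to decide which changed keys to apply, but a whole-file diff like this is enough to distinguish "the file was edited" from "only the cluster was scaled".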

varelaz commented 1 year ago

Faced the same issue. My suggestion is to add a flag to docker stack deploy that ignores replica changes (e.g. --no-replication-change). With that flag, docker stack deploy would know the run is an update of the stack's definition, not a re-scaling.

varelaz commented 1 year ago

Here's what I've done as a workaround for the stack deploy:

export COMPOSE_FILE=$1
export STACK=$2

eval "$(docker stack services "$STACK" --format='export SERVICE{{.Name}}={{with split .Replicas "/"}}{{index . 1}}{{end}}' | sed "s/$STACK//g")"

docker stack deploy --compose-file $COMPOSE_FILE $STACK

Compose files allow using variables for replicas. The script above creates an environment variable SERVICE_[NAME]=[replicas] for every service, which can then be used as replicas: ${SERVICE_worker:-3} in the compose file, where 3 is the initial number of replicas and worker is the name of the service.
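For reference, a compose fragment using that pattern might look like this (the worker service name and the default of 3 are just the example values from above):

```yaml
services:
  worker:
    deploy:
      # Falls back to 3 replicas on the first deploy, before the
      # SERVICE_worker variable exists; later deploys reuse the live count
      # exported by the script above.
      replicas: ${SERVICE_worker:-3}
```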

danschmidt5189 commented 4 months ago

My team would also find it really helpful to support "keep the same replica count" for stack deployments. Our immediate use-case is to handle cronjobs run by crazymax/swarm-cronjob. They're initially deployed with replicas=0 and restart_policy.condition=none; swarm-cronjob later ticks up to a configurable positive number of replicas. If/when we redeploy, we don't want to mess with the replica count or kick off new runs.

I haven't looked into exactly how it's implemented, but Ansible's community.docker.docker_swarm_service module implements the requisite feature via replicas: -1. I think that's a viable option for us, and we're okay with two-stage deployments (i.e. first with replicas=0, then immediately after with replicas=-1 or whatever other magic value is needed).
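For anyone exploring that route, a task using that module might look roughly like this (the service name and image are placeholders; per the module's documentation as described above, replicas: -1 leaves an existing service's replica count unchanged):

```yaml
- name: Update image without touching the current scale
  community.docker.docker_swarm_service:
    name: my-stack_my-service
    image: nginx:alpine
    replicas: -1
```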


@varelaz Thanks for that — I'd been tiptoeing around a similar workaround and found that very helpful! Note that it can be shortened a bit:

# Index 0 = current replicas; index 1 = desired replicas
docker stack services test --format 'SERVICE_{{.Name}}={{index (split .Replicas "/") 0}}'

That said, I'm leery of adding the additional complexity. It means that whoever (or whatever) runs deployments effectively also needs to run the wrapper script before executing commands, which seems error-prone to me.