docker / cli

The Docker CLI
Apache License 2.0

CLI hangs when creating a failing replicated-job with `--restart-condition=none` #2979

Open ben-davis opened 3 years ago

ben-davis commented 3 years ago

Description: When creating a replicated-job service, if the task fails with --restart-condition=none, the CLI hangs once TotalCompletions has been reached. As neither update_config nor rollback_config can be set, it doesn't seem possible to control what Docker should do.

I'm using a replicated-job to run a migration during a deploy pipeline. Ideally, when all tasks have failed to converge, docker service create|update would exit with an error.

Steps to reproduce the issue:

  1. docker service create --mode=replicated-job --replicas=1 --restart-condition=none bash "exit 1"

Describe the results you received: The CLI command hangs indefinitely.

Describe the results you expected: The CLI command should exit, as it has reached its limit on restart attempts for the tasks.
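
A minimal sketch of how a deploy script might approximate this expected behaviour today, assuming the service is created with --detach so the CLI returns immediately, and assuming a single-task job as in the reproduction above. wait_for_job is a hypothetical helper name, and the state strings it matches ("Complete", "Failed", "Rejected") are an assumption based on typical docker service ps output, not a documented contract.

```shell
# Sketch only: poll task state instead of relying on the CLI progress display.
# wait_for_job is a hypothetical helper, not part of Docker.
wait_for_job() {
  local service="$1"
  local attempts="${2:-60}"   # how many times to poll
  local delay="${3:-2}"       # seconds between polls
  local states i=0
  while [ "$i" -lt "$attempts" ]; do
    # One line per task, e.g. "Failed 3 seconds ago" (assumed wording).
    states="$(docker service ps --format '{{.CurrentState}}' "$service")"
    case "$states" in
      *Failed*|*Rejected*) echo "job $service failed" >&2; return 1 ;;
      *Complete*)          return 0 ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up waiting for $service" >&2
  return 2
}
```

Usage might look like `docker service create --detach --mode=replicated-job --replicas=1 --restart-condition=none --name=migrate IMAGE` followed by `wait_for_job migrate`. With more than one replica the Complete match would need to count tasks rather than match any single line.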

Output of docker version:

Docker version 20.10.3, build 48d30b5

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)

Server:
 Containers: 74
  Running: 6
  Paused: 0
  Stopped: 68
 Images: 7
 Server Version: 20.10.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: s6zwbsect4msdzs0fwv04nnq3
  Is Manager: true
  ClusterID: 570e06kxskw39x8d4fxetxpmc
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 143.198.18.91
  Manager Addresses:
   143.198.18.91:2377
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.4.0-193-generic
 Operating System: Ubuntu 16.04.7 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 992.1MiB
 Name: adventure-text-01
 ID: 2KSA:3PPX:RJOD:YTJ6:AYLM:3NVB:QQXA:CIQN:ER3M:PTKX:OSFT:B7ED
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
  provider=digitalocean
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

kevinsmith commented 1 year ago

We're running into the same problem. Given that this seems like it's the only proper way to run DB migrations, it's surprising to see that no one else has chimed in after a year and a half.

@ben-davis Did you ever land on a solution or workaround?

nblazincic commented 1 year ago

I also have a problem with db-migrations. Unfortunately, you cannot be sure that swarm will execute only one task. See: https://github.com/moby/moby/issues/42789

https://github.com/moby/moby/issues/42741#issuecomment-1229449280

https://github.com/moby/moby/issues/42742#issuecomment-1229449246

Vanav commented 1 year ago

I can confirm the CLI hangs in Docker 24:

# docker service create --mode=replicated-job --restart-max-attempts=0 --replicas=1 --restart-condition=none --name=test bash "exit 1"

kj6pwo0rverbvq2plz9mt5ef4
job progress: 0 out of 1 complete [>                                                  ] 
active tasks: 0 out of 1 tasks 
1/1: task: non-zero exit (127) 
...

dazinator commented 10 months ago

Can confirm the same issue

Docker version 24.0.5, build ced0996

wpdmitry commented 7 months ago

> We're running into the same problem. Given that this seems like it's the only proper way to run DB migrations, it's surprising to see that no one else has chimed in after a year and a half.
>
> @ben-davis Did you ever land on a solution or workaround?

I also have the same problem. As a workaround, I use the timeout command or an equivalent:

timeout 60s docker service create [OPTIONS] IMAGE [COMMAND] [ARG...]
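
The timeout workaround above could be wrapped so a pipeline can tell a deadline kill apart from an ordinary failure. This is a sketch under assumptions: it relies on GNU timeout, which exits with status 124 when it has to kill the command, and run_with_deadline is a hypothetical helper name.

```shell
# Sketch only: bound a possibly-hanging command and report why it ended.
# run_with_deadline is a hypothetical helper, not part of Docker.
run_with_deadline() {
  local limit="$1"
  shift
  local rc=0
  timeout "$limit" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    # GNU timeout uses 124 to signal that the deadline was hit.
    echo "timed out after $limit: $*" >&2
  fi
  return "$rc"
}
```

For example, `run_with_deadline 60s docker service create [OPTIONS] IMAGE [COMMAND] [ARG...]`. A 124 here only says the CLI did not return in time. It does not by itself show whether the job failed or was still running, and the service is likely still present afterwards, so checking `docker service ps` for the task state is probably still needed.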