docker / cli

The Docker CLI
Apache License 2.0
4.88k stars 1.91k forks source link

plugin sigterm handling behavior is odd #4402

Open nicks opened 1 year ago

nicks commented 1 year ago

Description

Docker plugins have some really odd behavior when the main process gets a SIGTERM signal:

ERRO[0013] got 3 SIGTERM/SIGINTs, forcing shutdown

Reproduce

Example code: https://github.com/nicks/docker-sleep/blob/31f7ba2ffd0d48f7c1ba99ef4def3bd1c2429bcb/main.go#L1

1) Install my sample plugin, docker sleep

git clone git@github.com:nicks/docker-sleep
cd docker-sleep
go build -o ~/.docker/cli-plugins/docker-sleep .
docker sleep

2) Run docker sleep

3) In a separate terminal, run kill 3 times on the main docker process

Expected behavior

The ideal behavior would be for the main docker cli process to forward the SIGTERM to the plugin, then exit when the plugin exits. But I think other behaviors are arguable?

It seems bizarre that it ignores the first signal, swallows it without telling the plugin, then dies on the third signal.

the error message is also incorrect - it doesn't shutdown anything, it just leaves the subprocess running in the background

docker version

Client: Docker Engine - Community
 Cloud integration: v1.0.35
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:51:00 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:51:00 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    24.0.2
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.0-cloud-driver+002
    Path:     /home/nick/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.19.0-cloud
    Path:     /home/nick/.docker/cli-plugins/docker-compose
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.0
    Path:     /usr/lib/docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.20
    Path:     /usr/lib/docker/cli-plugins/docker-extension
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v0.1.0-beta.6
    Path:     /usr/lib/docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /usr/lib/docker/cli-plugins/docker-sbom
  scan: Docker Scan (Docker Inc.)
    Version:  v0.26.0
    Path:     /usr/lib/docker/cli-plugins/docker-scan
  scout: Command line tool for Docker Scout (Docker Inc.)
    Version:  0.16.1
    Path:     /usr/lib/docker/cli-plugins/docker-scout
  sleep:  (Docker Inc.)
    Version:  0.1.0
    Path:     /home/nick/.docker/cli-plugins/docker-sleep
WARNING: Plugin "/usr/lib/docker/cli-plugins/docker-compose.14.backup" is not valid: plugin candidate "compose.14.backup" did not match "^[a-z][a-z0-9]*$"

Server:
 Containers: 26
  Running: 4
  Paused: 0
  Stopped: 22
 Images: 57
 Server Version: 24.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.0-75-generic
 Operating System: Linux Mint 21.1
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 15.25GiB
 Name: grumpy
 ID: 9cba2c25-6ba4-4f6c-995c-53aacc5cc77e
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info

related issue: https://github.com/docker/cli/issues/4332

related pr: https://github.com/docker/cli/pull/2799

cc @neersighted (this came from a slack convo i'm having with them)

nicks commented 1 year ago

(as a side note, i was able to reproduce the same behavior with both the Cloud Integration cli and the vanilla com.docker.cli one)

thaJeztah commented 1 year ago
thaJeztah commented 1 year ago

I just ran into a docker buildx imagetools inspect hanging, and had to triple-signal to kill the CLI;

docker buildx imagetools inspect nginx

^C^CERRO[0113] got 3 SIGTERM/SIGINTs, forcing shutdown

I went looking for the "SIGTERM/SIGINTs, forcing shutdown", error, and it looks like that is implemented in the "appcontext" utility in BuildKit; https://github.com/docker/cli/blob/dc2eb3bf7c835ff820e0a38de123a846b9e887e9/vendor/github.com/moby/buildkit/util/appcontext/appcontext.go#L33-L43

Originally, that code was pulled into the CLI for BuildKit builds only; https://github.com/docker/cli/commit/656fe85c740110eb726771264ba5b3f29430e97c

So, I guess the intention was for the CLI to forward signals to the plugin, and wait for the plugin to shutdown, but providing a "force stop" of the CLI (itself?) in situations where the plugin refused to shut down. I would expect in that case for the plugin to be force-killed though (and not kept running).

It's possible some "unrelated" change broke that signal forwarding, or maybe it never was there, because plugins were of course added later, and it's possible that existing code only considered forwarding signals to a container (if attached), but not to processes running locally (i.e., "plugins").

jahudka commented 1 year ago

Hey folks, coming over from docker/compose#10898. I think that when the CLI receives a signal which is customarily used to ask processes to gracefully terminate, the CLI should oblige; but it's been pointed out that if the CLI were to forward signals to its children (which would be the easiest solution), then plugins would receive two signals in cases where the signal is sent to the entire process group (such as when pressing Ctrl+C in the terminal) - which would break the existing and, as I understand it, desirable behaviour of plugins (shutdown gracefully on first signal, shutdown fast on second).

So, trying to come up with another solution which would achieve the desired behaviour without breaking existing functionality, I had another idea: if plugins were to terminate gracefully when their standard input is closed, then the CLI could use that to signal to the plugin that it should politely bow out. That is, the CLI's signal handler would basically be (in pseudo-code) something like:

def signal_handler($signal):
  uninstall signal_handler
  close plugin's STDIN
  wait for plugin to terminate
  send $signal to self

What do you think? I'm not an expert on POSIX, but as far as I understand, this is how one should cleanly react to a signal - uninstall the signal handler, perform any necessary shutdown and cleanup operations, and then self-kill using the same signal so that the appropriate termination signal is registered by the kernel (or something like that, the details are murky). Obviously the same probably wouldn't work on Windows; I don't have the faintest idea of how things work there.

AlexSkrypnyk commented 12 months ago

@thaJeztah We are using a lot of tools from inside of the containers when running locally and, since switching from Docker Compose 1, this has became a real pain.

Is there a way to prioritise this fix please? I'm happy to sponsor this work to have this resolved. Thank you

laurazard commented 8 months ago

fyi @jahudka @AlexSkrypnyk the above issue is fixed w/ the latest CLI and Compose releases, which should also get shipped in Desktop soon.