docker / compose

Define and run multi-container applications with Docker
https://docs.docker.com/compose/
Apache License 2.0

[BUG] docker-compose does not close unix socket with dockerd after container exits and restarts #12023

Closed: Yinette closed this issue 5 days ago

Yinette commented 1 month ago

Description

While troubleshooting an issue that resulted in a dockerd crash, I have found that docker-compose does not close a unix socket with the docker daemon when a container exits and restarts.

This means that with each container restart, another socket is left open with dockerd, which spins up a new thread for every connection it accepts; the count keeps growing until dockerd hits the kernel's max-threads limit and crashes.

In our case, the production workload runs about 10 containers in an on-demand style: not all 10 need to be up at once, depending on the pool of "devices" upstream that need data processed. Containers that aren't in use restart and re-query the database for a new endpoint to process input from.

While this approach of restarting containers is probably not the greatest, at the very least the unix sockets should be closed when finished.

From looking into this on the dockerd side, it appears that dockerd sends a notifyClosed via the socket when it exits; does docker compose handle that?
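
As a side note, the thread-exhaustion mechanism described above can be watched directly from procfs. The following is a minimal Go sketch (illustrative only, not part of compose) that compares dockerd's current thread count against the kernel's global threads-max limit, mirroring step 5 of the reproduction below:

```go
// Minimal sketch: compare dockerd's thread count with the kernel's global
// thread limit, both read from procfs. Pass dockerd's pid as the only argument.
package main

import (
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	if len(os.Args) != 2 {
		log.Fatalf("usage: %s <dockerd-pid>", os.Args[0])
	}

	// Each entry under /proc/<pid>/task is one thread of that process.
	tasks, err := os.ReadDir("/proc/" + os.Args[1] + "/task")
	if err != nil {
		log.Fatal(err)
	}

	// System-wide ceiling on the total number of threads.
	limit, err := os.ReadFile("/proc/sys/kernel/threads-max")
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("dockerd threads: %d (kernel threads-max: %s)\n",
		len(tasks), strings.TrimSpace(string(limit)))
}
```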

Steps To Reproduce

1. Create `docker-compose.yaml`:

```yaml
---
services:
  occupied:
    image: container-used:latest
    logging:
      driver: "json-file"
      options:
        max-file: "5"
        max-size: "30m"

  surplus:
    restart: unless-stopped
    image: container-surplus:latest
    logging:
      driver: "json-file"
      options:
        max-file: "5"
        max-size: "30m"
```

2. Create `Dockerfile`:

```dockerfile
FROM alpine:latest AS occupied

CMD ["sleep", "3600"]

FROM alpine:latest AS surplus

CMD ["sleep", "10"]
```

3. Build the images:

```shell
docker build --target occupied -t container-used:latest .
docker build --target surplus -t container-surplus:latest .
```

4. Start the project:

```shell
docker compose -f docker-compose.yaml up --scale occupied=8 --scale surplus=2
```

5. Concurrently with step 4, observe the total number of tasks/threads under dockerd:

```shell
watch -n1 "ls /proc/`pgrep dockerd`/task | wc -l"
```

6. Concurrently with step 4, observe the unix socket fds under the docker-compose pid:

```shell
watch -n1 "lsof -p `pgrep docker-compose` | grep unix"
```

You will observe the dockerd task/thread count and the number of unix socket fds held by docker-compose climbing each time the surplus containers restart.

Compose Version

Docker Compose version v2.29.1

Docker Environment

Client: Docker Engine - Community
 Version:    27.1.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 11
  Running: 0
  Paused: 0
  Stopped: 11
 Images: 7
 Server Version: 27.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 2bf793ef6dc9a18e00cb12efb64355c2c9d5eb41
 runc version: v1.1.13-0-g58aa920
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
 Kernel Version: 5.4.0-190-generic
 Operating System: Ubuntu 20.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.74GiB
 Name: moby48236
 ID: f5179277-2595-4df6-9ec6-44fc55c19f74
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Anything else?

I initially thought this was a regression of an existing dockerd bug that was fixed in 27.0.1, and opened an issue there: https://github.com/moby/moby/issues/48236

However, with further observation and feedback from the devs, I discovered this to be an issue with the compose plugin. So here I am :)

thaJeztah commented 1 month ago

cc @ndeloof @glours (see https://github.com/moby/moby/issues/48236 for more details)

ndeloof commented 1 month ago

Not sure I understand how compose is involved here. Compose creates containers with the restartPolicy declared in the compose file, but it is the engine that restarts them. Compose is stateless and client-side, so once the containers have been created it has no impact on what happens there.
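
A minimal sketch of that split with the Docker Go SDK (assuming a recent SDK, v25 or later; the container name is hypothetical, the image name mirrors the reproduction above): the restartPolicy is part of the create request, and once the container exists the engine restarts it on its own, with no client attached.

```go
// Minimal sketch (Docker Go SDK v25+): the restart policy is set at create
// time, so restarting is handled entirely by the engine afterwards.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

func main() {
	ctx := context.Background()
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// The restartPolicy travels with the create request; no client needs to
	// stay connected for the engine to enforce it.
	created, err := cli.ContainerCreate(ctx,
		&container.Config{Image: "container-surplus:latest"},
		&container.HostConfig{RestartPolicy: container.RestartPolicy{Name: "unless-stopped"}},
		nil, nil, "surplus-demo") // "surplus-demo" is a hypothetical name
	if err != nil {
		log.Fatal(err)
	}

	if err := cli.ContainerStart(ctx, created.ID, container.StartOptions{}); err != nil {
		log.Fatal(err)
	}
	fmt.Println("created and started", created.ID)
}
```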

Yinette commented 1 month ago

From comments in the linked moby issue, it appears docker-compose only closes file descriptors on its own exit; until then, anything open with the daemon remains open, especially when compose runs in the foreground and streams the logs of the running containers.

From what I can see on the daemon side, dockerd reaches a notifyClosed routine before continuing to lstat() the socket, so I believe the daemon is waiting for docker-compose to release that fd before releasing its own end. Both ends are effectively stuck waiting on one another, if that makes sense.

ndeloof commented 1 month ago

docker-compose accesses the engine through the HTTP API. Maybe it does not correctly manage the many concurrent connections, but this should not have an impact on the file descriptors used by the engine.
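
For reference, a minimal sketch with the same Go SDK (github.com/docker/docker/client, not Compose's code): by default each of those API calls is an HTTP request carried over the daemon's unix socket.

```go
// Minimal sketch (Docker Go SDK): API calls are plain HTTP requests carried
// over the daemon's unix socket by default.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(
		client.WithHost("unix:///var/run/docker.sock"), // the default endpoint on Linux
		client.WithAPIVersionNegotiation(),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// GET /_ping over the unix socket.
	ping, err := cli.Ping(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("engine API version:", ping.APIVersion)
}
```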

Yinette commented 1 month ago

Please correct me if I'm wrong, but don't those HTTP API connections pass through unix sockets? Those sockets are at least being held open after the engine stops the container they were responsible for.

Yinette commented 1 week ago

Still seems to be an issue... I've confirmed that using the --scale option in docker compose seems to trigger, or at least accelerate, the issue, even when the containers have init: true.

We can even make it crash faster: --scale surplus=9001

ndeloof commented 1 week ago

If you run thousands of replicas, compose will open the same number of long-running ContainerAttach API calls, which may indeed hit some limits. Still, I hardly see how Compose would be responsible for this issue, as it is just a Docker API client (using the same SDK as the docker CLI).

Yinette commented 1 week ago

Yeah, after playing around with this more in an isolated environment, I don't see how docker-compose could be involved if, as you say, it's just using the upstream docker client API. I'll take this back to Moby.

Yinette commented 1 week ago

Though, I did find the logic for attaching containers here: https://github.com/docker/compose/blob/main/pkg/compose/attach.go, but is there logic to detach/close that connection after the container stops?

```
goroutine 218 [chan receive]:
runtime.gopark(0x6e3a22746e696f70?, 0x426e4f222c6c6c75?, 0x75?, 0x69?, 0x6562614c222c6c6c?)
    runtime/proc.go:398 +0xce fp=0xc000a18f00 sp=0xc000a18ee0 pc=0x44002e
runtime.chanrecv(0xc000540300, 0x0, 0x1)
    runtime/chan.go:583 +0x3cd fp=0xc000a18f78 sp=0xc000a18f00 pc=0x40baed
runtime.chanrecv1(0x6e2d72656e696174?, 0x223a227265626d75?)
    runtime/chan.go:442 +0x12 fp=0xc000a18fa0 sp=0xc000a18f78 pc=0x40b6f2
github.com/docker/compose/v2/pkg/compose.(*composeService).attachContainerStreams.func3()
    github.com/docker/compose/v2/pkg/compose/attach.go:127 +0x49 fp=0xc000a18fe0 sp=0xc000a18fa0 pc=0x1cb6969
runtime.goexit()
    runtime/asm_amd64.s:1650 +0x1 fp=0xc000a18fe8 sp=0xc000a18fe0 pc=0x4704e1
created by github.com/docker/compose/v2/pkg/compose.(*composeService).attachContainerStreams in goroutine 81
    github.com/docker/compose/v2/pkg/compose/attach.go:126 +0x205
```

I did a SIGKILL dump and, using the reproduction steps above, still saw many sockets held open with dockerd by compose; attach.go accounts for most of them:

```shell
grep -c "github.com/docker/compose/v2/pkg/compose/attach.go:126" compose-trace-2.log
92
```

Is there anything else I can run or experiment with to provide more information on this issue?

ndeloof commented 1 week ago

There's no Detach API. ContainerAttach is a long-running HTTP-hijack call which upgrades the protocol so stdin/stdout streams can be attached from the client to the container. When the container stops, the streams end with EOF. This is managed here: https://github.com/docker/compose/blob/e6ef8629a8e3d4dd7e0565c2237bf528149ee1e9/pkg/compose/attach.go#L146-L152 using code from docker/cli, which I would consider safe for this purpose :)
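
For illustration, a minimal sketch of that lifecycle with the same Go SDK (assuming a recent SDK with container.AttachOptions; this is not Compose's actual attach code, and the container name is hypothetical): the hijacked connection stays open while the stream is copied, and Close() on the response is what releases the underlying socket once EOF is reached.

```go
// Minimal sketch (Docker Go SDK v25+), not Compose's actual code: one
// long-running hijacked attach connection per container, copied until EOF
// and then released with Close().
package main

import (
	"context"
	"log"
	"os"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
	"github.com/docker/docker/pkg/stdcopy"
)

func attachAndWait(ctx context.Context, cli *client.Client, id string) error {
	resp, err := cli.ContainerAttach(ctx, id, container.AttachOptions{
		Stream: true,
		Stdout: true,
		Stderr: true,
	})
	if err != nil {
		return err
	}
	// Closing the hijacked response is what gives the connection back.
	defer resp.Close()

	// Demultiplex stdout/stderr and copy until the container stops (EOF).
	_, err = stdcopy.StdCopy(os.Stdout, os.Stderr, resp.Reader)
	return err
}

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// "surplus-demo" is a hypothetical container name.
	if err := attachAndWait(context.Background(), cli, "surplus-demo"); err != nil {
		log.Fatal(err)
	}
}
```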

Yinette commented 6 days ago

Thanks for clarifying that! I'll look closer at the behaviour of the containers to see what they're doing when they exit, and I'll provide this information to the Moby project to track the issue.