docker / cli

The Docker CLI
Apache License 2.0

Accessing remote server via DOCKER_HOST eats all memory #3528

Open turkerdev opened 2 years ago

turkerdev commented 2 years ago

Accessing a remote server via SSH and running a command there eats all of the server's memory. Running the same command on the server itself causes no problem.

For instance,

I have a Docker Compose file on my local machine. If I run the command below, it eats all the memory and the server shuts down.

DOCKER_HOST=ssh://blabla docker compose up

But if I copy the same compose file to the server and run docker compose up there, it uses only ~50MB of memory.

thaJeztah commented 2 years ago

Can you provide more details? Otherwise this may be difficult to look into.

turkerdev commented 2 years ago

My local machine uses Docker Desktop, but the issue also exists when I run the same command in GitLab CI. Also, yes, using com.docker.cli reproduces the issue.

Here is a video of the issue.

docker version from the server:

Client:
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.15.14
 Git commit:        f0df350
 Built:             Wed Nov 17 03:05:36 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.15.14
  Git commit:       b0f5bc3
  Built:            Wed Nov 17 03:06:14 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0
  GitCommit:        84113eef6fc27af1b01b3181f31bbaf708715301
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

DOCKER_HOST=... docker info

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.8.1)
  compose: Docker Compose (Docker Inc., v2.3.3)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 28
  Running: 1
  Paused: 0
  Stopped: 27
 Images: 40
 Server Version: 20.10.7
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc version: 84113eef6fc27af1b01b3181f31bbaf708715301
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.10.102-99.473.amzn2.x86_64
 Operating System: Amazon Linux 2
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 965.5MiB
 Name: ip-172-31-39-226.eu-central-1.compute.internal
 ID: ROM7:G3CD:UZ5W:OC3Q:347K:BD5Y:RDOY:NU4R:JHIW:L5Q6:BBNW:7XLN
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Compose file:

services:
  mongo:
    image: mongo
  postgres:
    image: postgres
  redis:
    image: redis
  nginx:
    image: nginx
  node:
    image: node

turkerdev commented 2 years ago

Also, I noticed that using docker stack deploy has no issues; it works as it is supposed to.

afshin-deriv commented 2 years ago

The memory usage by dockerd occurs because, when running docker/docker-compose without -d (or even with -d, for a few seconds), the server creates many sshd processes that consume a big chunk of memory:

Client 
---
$ DOCKER_HOST=ssh://<User-name>@<Server-IP> docker compose up

Server
---
$ pstree -p $(pgrep -f '/usr/sbin/sshd -D')

sshd(246356)─┬─sshd(337077)───sshd(337114)───docker(337115)─┬─{docker}(337116)
             │                                              ├─{docker}(337117)
             │                                              ├─{docker}(337118)
             │                                              ├─{docker}(337119)
             │                                              ├─{docker}(337120)
             │                                              ├─{docker}(337121)
             │                                              ├─{docker}(337122)
             │                                              ├─{docker}(337123)
             │                                              ├─{docker}(337124)
             │                                              └─{docker}(337125)
             ├─sshd(337133)───sshd(337170)───docker(337171)─┬─{docker}(337172)
             │                                              ├─{docker}(337173)
             │                                              ├─{docker}(337174)
             │                                              ├─{docker}(337175)
             │                                              ├─{docker}(337176)
             │                                              ├─{docker}(337177)
             │                                              ├─{docker}(337178)
             │                                              ├─{docker}(337179)
             │                                              ├─{docker}(337180)
             │                                              └─{docker}(337181)
             ├─sshd(337182)───sshd(337219)───docker(337220)─┬─{docker}(337221)
.
.
.
thaJeztah commented 2 years ago

Hm.. right, yes, so it would be attaching to each container in the compose stack to stream the output; I can imagine that causing more overhead, especially with ssh here. I'm wondering if we can make it reuse connections or something along those lines.

/cc @AkihiroSuda @ndeloof perhaps you have ideas?

AkihiroSuda commented 2 years ago

Maybe we should re-revert this (with some fix)?

afshin-deriv commented 2 years ago

I will work on this

afshin-deriv commented 2 years ago

I don't think this issue is related to the CLI, nor would it be solved by https://github.com/docker/cli/pull/2303.


  1. Killing the extra ssh processes on the Docker server doesn't reduce memory usage:

Client

export DOCKER_HOST=ssh://<User-name>@<Server-IP>

cat > docker-compose.yaml <<EOF
 services:
   mongo:
     image: mongo
   postgres:
     image: postgres
   redis:
     image: redis
   nginx:
     image: nginx
   node:
     image: node
EOF

docker-compose up

Server

sudo pstree -p $(pgrep -f '/usr/sbin/sshd -D')
 sshd(5156)─┬─sshd(825648)───sshd(825707)───bash(825708)───sudo(941496)───sudo(941497)───pstree(941498)
           ├─sshd(936369)───sshd(936406)───docker(936407)─┬─{docker}(936408)
           │                                              ├─{docker}(936409)
           │                                              ├─{docker}(936410)
           │                                              ├─{docker}(936411)
           │                                              ├─{docker}(936412)
           │                                              ├─{docker}(936413)
           │                                              ├─{docker}(936414)
           │                                              ├─{docker}(936415)
           │                                              ├─{docker}(936416)
           │                                              └─{docker}(936417)
           ├─sshd(938070)───sshd(938147)───docker(938260)─┬─{docker}(938262)
           │                                              ├─{docker}(938263)
           │                                              ├─{docker}(938264)
           │                                              ├─{docker}(938265)

sudo kill -9 938070 938309 ... <last ssh processID> ## from the second docker ssh connection onward

  2. Running the same commands over plain ssh has a smaller memory footprint than running them through Docker; the two loops below (10 Docker sessions versus 60 plain ssh sessions) consume roughly the same amount of RAM on the server:
$ for i in `seq 10`;
> do ssh -nttf  <user-name>@<docker-server-ip> "docker run -it busybox top" 2>&1 &
> done

$ for i in `seq 60`;
> do ssh -nttf  <user-name>@<docker-server-ip> "top" 2>&1 &
> done

nullableVoidPtr commented 2 years ago

I can speak to this; the way Docker works over an SSH remote appears to be:

In summary: client docker-cli <-stdio-> ssh <-tcp-> sshd <-stdio-> remote docker-cli <-unix/npipe-> dockerd

While I'm not too familiar with Compose's internals myself, I'd think that a docker compose up command with many images may create multiple SSH connections, which appear as forks of the remote sshd process.

I'm currently workshopping a somewhat better solution. I haven't opened a PR yet, pending further testing, potential cross-platform issues, and error handling, as well as the corresponding implementation here in the Docker CLI. The high-level overview of the changes I plan to make (so far) is:

Hopefully, with this architecture there would be less memory overhead, as there would hypothetically be just one process, dockerd, handling concurrent connections from Docker CLI clients.

[1] I'm not certain this is actually needed, but it is a nice feature. I've already pushed code to my fork that takes an accepted ssh.Conn and passes it to a goroutine, which continuously demultiplexes session channel requests into a net.Conn interface for the apiserver to Accept and handle.

TheSilkky commented 1 year ago

I'm using a remote SSH Docker context on macOS, running Docker Desktop, to deploy stacks to my server. Here's the output of docker info on my local system:

Client:
 Version:    24.0.2
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.5
    Path:     /Users/ellie/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.18.1
    Path:     /Users/ellie/.docker/cli-plugins/docker-compose
  deployx: Deploy a new stack or update an existing stack (aaraney)
    Version:  0.0.1
    Path:     /Users/ellie/.docker/cli-plugins/docker-deployx
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.0
    Path:     /Users/ellie/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.19
    Path:     /Users/ellie/.docker/cli-plugins/docker-extension
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v0.1.0-beta.4
    Path:     /Users/ellie/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/ellie/.docker/cli-plugins/docker-sbom
  scan: Docker Scan (Docker Inc.)
    Version:  v0.26.0
    Path:     /Users/ellie/.docker/cli-plugins/docker-scan
  scout: Command line tool for Docker Scout (Docker Inc.)
    Version:  v0.12.0
    Path:     /Users/ellie/.docker/cli-plugins/docker-scout

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 27
 Server Version: 24.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.49-linuxkit-pr
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 7.668GiB
 Name: docker-desktop
 ID: 7c813daa-98e6-446a-9a03-0b4ec69bf2e1
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

I left my computer on overnight, and when I checked my server's metrics I noticed sshd was using almost 6 GB of memory. There were hundreds of these ssh sessions and docker system dial-stdio processes running on my server:

root       11881  0.0  0.1  25484  9472 ?        Ss   04:27   0:00 sshd: ellie [priv]
ellie      11887  0.0  0.0  25624  6412 ?        S    04:27   0:00 sshd: ellie@notty
ellie      11889  0.0  0.2 1180192 22836 ?       Ssl  04:27   0:00 docker system dial-stdio

Does anyone have some insight on this? My system is constantly creating these sessions for no reason, even when I'm not using the Docker context. There's also a fairly recent forum post about this: Docker Continuously Making Unnecessary SSH Connections to Remote Servers

EDIT: Exiting Docker Desktop closes all of the ssh sessions and ends all the dial-stdio processes on the remote server. However, if you leave Docker running, it continuously creates these sessions, eventually using all of the server's memory.
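To watch for this kind of build-up, a minimal Go sketch (Linux only) can count docker system dial-stdio processes on the server and sum their resident memory from /proc. It assumes the processes appear with exactly the command line shown in the ps output above, and it does not count the sshd session processes themselves, which also hold memory. The helper names isDialStdio and vmRSSKiB are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// isDialStdio reports whether a raw /proc/<pid>/cmdline value (NUL-separated
// arguments) is exactly "docker system dial-stdio". Invocations with extra
// global flags are deliberately not matched by this sketch.
func isDialStdio(cmdline string) bool {
	args := strings.Split(strings.TrimRight(cmdline, "\x00"), "\x00")
	return len(args) == 3 && filepath.Base(args[0]) == "docker" &&
		args[1] == "system" && args[2] == "dial-stdio"
}

// vmRSSKiB extracts the resident set size (in KiB) from a /proc/<pid>/status body.
func vmRSSKiB(status string) (int, bool) {
	for _, line := range strings.Split(status, "\n") {
		if strings.HasPrefix(line, "VmRSS:") {
			fields := strings.Fields(line) // e.g. ["VmRSS:", "22836", "kB"]
			if len(fields) < 2 {
				return 0, false
			}
			n, err := strconv.Atoi(fields[1])
			return n, err == nil
		}
	}
	return 0, false
}

func main() {
	// Every numeric directory under /proc is a process (Linux only).
	paths, _ := filepath.Glob("/proc/[0-9]*/cmdline")
	count, totalKiB := 0, 0
	for _, p := range paths {
		cmdline, err := os.ReadFile(p)
		if err != nil || !isDialStdio(string(cmdline)) {
			continue // exited meanwhile, not readable, or not a match
		}
		count++
		status, err := os.ReadFile(filepath.Join(filepath.Dir(p), "status"))
		if err != nil {
			continue
		}
		if kib, ok := vmRSSKiB(string(status)); ok {
			totalKiB += kib
		}
	}
	fmt.Printf("%d docker system dial-stdio processes, %d KiB resident in total\n", count, totalKiB)
}
```

Run it periodically (for example from cron) while Docker Desktop is left open; a count that keeps growing matches the behaviour described above.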