docker / compose

Define and run multi-container applications with Docker
https://docs.docker.com/compose/
Apache License 2.0

[BUG] Overlay network not found on worker node #11894

Open thormme opened 3 months ago

thormme commented 3 months ago

Description

Issue: Swarm worker hosts fail to attach to manager node overlay networks unless a container has been manually started and attached to the network using docker run --network swarm-overlay

Expected Behavior: The worker's container should attach to the overlay network automatically, and the network should then be visible in the docker network ls output.

$> docker network ls
8e3c351af333   bridge             bridge    local
0cbc0420c111   docker_gwbridge    bridge    local
x8gb7mz6s222   swarm-overlay      overlay   swarm
c09ad17a7321   host               host      local
keth4xuub123   ingress            overlay   swarm
d8baa27f3654   none               null      local

Workaround: The only solution I have found is to downgrade to an earlier version (2.21.0-1) of docker-compose-plugin

sudo apt list -a docker-compose-plugin
sudo apt install docker-compose-plugin=2.21.0-1~debian.11~bullseye

I believe this is the same issue as https://github.com/docker/compose/issues/11387, but I couldn't find any open bugs describing the same problem.

Thanks for any help with this!

Steps To Reproduce

I created a custom overlay network on the swarm manager node.

...
  service:
    image: service-image
    container_name: service
    networks:
      - swarm-overlay
    restart: unless-stopped
...
networks:
  swarm-overlay:
    attachable: true
    driver: overlay

This correctly created the network and attached the relevant container to it.

I then joined a worker host to the swarm and attempted to connect a container to the overlay network.

...
worker-service:
    image: worker-image
    container_name: worker-service
    networks:
      swarm-overlay:
        aliases:
          - host1-worker-service
    restart: unless-stopped
...
networks:
  swarm-overlay:
    external: true
    driver: overlay

Running docker compose up -d worker-service errors with:

Error response from daemon: network swarm-overlay not found

Compose Version

docker-compose-plugin/bullseye 2.27.1-1~debian.11~bullseye
Docker Compose version v2.27.1

Docker Environment

Client: Docker Engine - Community
 Version:    26.1.4
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.27.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 12
  Running: 5
  Paused: 0
  Stopped: 7
 Images: 31
 Server Version: 26.1.4
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: active
  NodeID: 2brhg9vzj8m47oyo40ie5yj0u
  Is Manager: false
  Node Address: 1.2.3.4
  Manager Addresses:
   4.3.2.1:2377
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: d2d58213f83a351ca8f528a95fbd145f5654e957
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.10.0-28-cloud-amd64
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 13.42GiB
 Name: cloud-machine
 ID: 6c0ae974-1ba3-450a-ab03-d31b31c6097f
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Anything else?

No response

ndeloof commented 2 months ago

This isn't the same issue as #11387 as here this is the docker engine reporting error: Error response from daemon: network swarm-overlay not found

Can you please confirm that you can use docker run --network swarm-overlay ... to run an equivalent container on the worker node with this swarm setup?

jsunstrom commented 2 months ago

I'm running into this exact same issue using Docker Compose 2.27.0. I can confirm that I can use docker run -it --name alpine1 --network test-net alpine from the official documentation. I walked through the entirety of the "Use an overlay network for standalone containers" tutorial and it worked as expected.

However, using docker compose files, I also get the error Error response from daemon: network <my network name here> not found message using docker compose up -d.

ambretanmay commented 2 months ago

I am having the exact same issue with Docker Compose version v2.27.1. @ndeloof, docker run --network swarm-overlay works and Compose doesn't.

inql commented 2 months ago

btw is the downgrade workaround needed for both leader and worker node?

ambretanmay commented 2 months ago

@inql I have not tested this as our scripts set versions for all nodes.

michaelmcandrew commented 2 months ago

Hey there, also affected by this bug.

If you don't want to downgrade, another workaround is to create a container and attach it to the network. The network then appears in the list and docker compose no longer complains.

docker run -dit --name keep-alive --network <network_name> --restart=always alpine

Adding --restart=always will ensure that it survives restarts of the docker daemon, etc.
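
A minimal sketch of how this workaround could be scripted on a worker node, assuming the network only needs to show up in docker network ls before docker compose up is run. The function names network_visible and ensure_overlay_attached are made up for illustration; the docker commands are the ones already used in this thread.

```shell
# Sketch only: helper names are hypothetical; the docker commands come from this thread.

# Succeeds if a network with exactly this name shows up in `docker network ls`.
network_visible() {
  # --filter name= matches substrings, so compare whole lines with grep -Fx
  docker network ls --filter "name=$1" --format '{{.Name}}' | grep -Fqx -- "$1"
}

# Starts a long-lived container on the overlay network unless it is already visible.
ensure_overlay_attached() {
  net="$1"
  keeper="${2:-keep-alive}"   # container name, defaults to the one used above
  if network_visible "$net"; then
    echo "network $net already visible, nothing to do"
    return 0
  fi
  echo "network $net not visible, starting $keeper"
  docker run -dit --name "$keeper" --network "$net" --restart=always alpine >/dev/null
}
```

After sourcing these functions, something like ensure_overlay_attached swarm-overlay && docker compose up -d would apply the workaround only when it is needed. The sketch does not handle the case where a container named keep-alive already exists.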

My versions in case it is useful:

docker version

Client: Docker Engine - Community
 Version:           27.0.3
 API version:       1.46
 Go version:        go1.21.11
 Git commit:        7d4bcd8
 Built:             Sat Jun 29 00:02:50 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.0.3
  API version:      1.46 (minimum version 1.24)
  Go version:       go1.21.11
  Git commit:       662f78c
  Built:            Sat Jun 29 00:02:50 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.18
  GitCommit:        ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc:
  Version:          1.7.18
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker compose version
Docker Compose version v2.28.1

kulpsin commented 2 months ago

Sorry, I did not realise that @michaelmcandrew had already mentioned this above, but at least this comment confirms his findings: https://github.com/docker/compose/issues/11894#issuecomment-2206522846

I tested this issue and noticed that if a running container is already connected to the external overlay network (started with docker run ..., so that the network is visible in docker network ls), then Compose is able to connect to the external overlay network.

So, without knowing anything about internals, the problem might have something to do with not checking for available external overlay networks but instead checking just internal networks (visible with docker network ls).

So, as an additional workaround, it is possible to first start a "dummy" container on the workers, for example:

$ docker compose up -d
Error response from daemon: network <overlay-network> not found
$ docker run -dit --rm --name dummy-network-container --network <overlay-network> alpine
43924b1b25ac73373aac9120b55ac46fc1de3435ce26485682e11d6c06671936
$ docker compose up -d
[+] Running 1/0
 ✔ Container worker-service  Started
$ _

I also checked downgrading, and on Ubuntu 22.04 it worked, so I think I will be using the downgraded version myself for now:

sudo apt-get remove docker-compose-plugin && sudo apt-get install docker-compose-plugin=2.21.0-1~ubuntu.22.04~jammy

$ docker version
Client: Docker Engine - Community
 Version:           27.0.3
 API version:       1.46
 Go version:        go1.21.11
 Git commit:        7d4bcd8
 Built:             Sat Jun 29 00:02:33 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          27.0.3
  API version:      1.46 (minimum version 1.24)
  Go version:       go1.21.11
  Git commit:       662f78c
  Built:            Sat Jun 29 00:02:33 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.18
  GitCommit:        ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc:
  Version:          1.7.18
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

$ docker compose version
Docker Compose version v2.28.1

ndeloof commented 2 months ago

@kulpsin docker network ls indeed does not detect overlay networks created on another swarm node (not sure about the reason, but that's what we get with the engine API) until it is used by some container. So Docker Compose can't check network existence, but should detect swarm is enabled and ignore error (assuming container create will fail if there's an actual missing network). See https://github.com/docker/compose/blob/11d5ecdc75ab96214f35db4cdc0361ee080d1c07/pkg/compose/create.go#L1334-L1340

Not sure why this doesn't work as expected. I need to set up a test environment and try to reproduce this bug.
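
A rough shell sketch of the decision described above, not Compose's actual Go code: if an external network is not listed, continue anyway when swarm is active and leave the real check to container creation. The function names are hypothetical and the messages are illustrative; docker info --format '{{.Swarm.LocalNodeState}}' is assumed to report "active" on a swarm node.

```shell
# Sketch only: function names are hypothetical and this is not Compose's implementation.

# Succeeds if the local engine is part of a swarm.
swarm_active() {
  [ "$(docker info --format '{{.Swarm.LocalNodeState}}')" = "active" ]
}

# Succeeds if a network with exactly this name is listed locally.
network_listed() {
  docker network ls --filter "name=$1" --format '{{.Name}}' | grep -Fqx -- "$1"
}

# Returns 0 if it is reasonable to go ahead and create containers on the external network.
check_external_network() {
  net="$1"
  if network_listed "$net"; then
    return 0
  fi
  if swarm_active; then
    # The overlay may exist on another node, so defer the check to container create.
    echo "warning: network $net not listed, continuing because swarm is active" >&2
    return 0
  fi
  echo "network $net declared as external, but could not be found" >&2
  return 1
}
```

On a worker node in the state reported in this issue, check_external_network swarm-overlay would print the warning and return success instead of aborting.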

jhrotko commented 1 month ago

With the original compose.yml, Compose generates a network named swarm-netword-overlay_swarm-overlay (the network key prefixed with the project name).

[Screenshot 2024-07-18 at 15 57 57]

...and then the worker would not be able to find the external network as expected

Adding name: swarm-overlay to the network definition made docker compose up -d work for me with version v2.28.1:

...
  service:
    image: service-image
    container_name: service
    networks:
      - swarm-overlay
    restart: unless-stopped
...
networks:
  swarm-overlay:
    name: swarm-overlay  # <---- added
    attachable: true
    driver: overlay

after this it generates the following result for docker network ls

[Screenshot 2024-07-18 at 16 00 19]

and now the worker is referencing the right network

[Screenshot 2024-07-18 at 16 07 00]
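
To illustrate the naming rule behind this, here is a small sketch, assuming the project name is already known (Compose derives it from the directory name or the -p flag). The helper effective_network_name is hypothetical and not a Compose command; it only mirrors the rule that a network without a name: attribute is created as the project name, an underscore, and the network key.

```shell
# Sketch only: effective_network_name is a hypothetical helper, not a Compose command.
# Usage: effective_network_name <project> <network-key> [explicit-name]
effective_network_name() {
  project="$1"
  key="$2"
  explicit="${3:-}"
  if [ -n "$explicit" ]; then
    # a name: attribute is used verbatim
    printf '%s\n' "$explicit"
  else
    # default: project name, underscore, network key
    printf '%s_%s\n' "$project" "$key"
  fi
}
```

For the project above, effective_network_name swarm-netword-overlay swarm-overlay yields swarm-netword-overlay_swarm-overlay, which is why a worker looking for an external network called swarm-overlay does not find it until name: swarm-overlay is set on the manager side.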

michaelmcandrew commented 1 month ago

To flesh out my steps to reproduce a bit more, since they are slightly different from the ones mentioned above, I created a swarm network on the lead node with docker network create --driver overlay test --attachable.

This network was not visible on the worker node (expected I think because nothing was connected).

However, I was not able to connect to it with the below networks section in a compose.yaml on the worker node.

networks:
  test:
    external: true

I created the following container on the worker node: docker run -dit --name keep-alive --network test --restart=always alpine

I was then able to connect using the above networks section in a compose.yaml on the worker node.

Hope that helps with the reproduction!