Open thormme opened 5 months ago
This isn't the same issue as #11387: here it is the Docker engine reporting the error Error response from daemon: network swarm-overlay not found
Can you please confirm that you can use docker run --network swarm-overlay ... to run an equivalent container on the worker node with this swarm setup?
I'm running into this exact same issue using Docker Compose 2.27.0. I can confirm that I can use docker run -it --name alpine1 --network test-net alpine from the official documentation. I walked through the entirety of "Use an overlay network for standalone containers" and it worked as expected.
However, using docker compose files, I also get the error message Error response from daemon: network <my network name here> not found when running docker compose up -d.
I am having the exact same issue with Docker Compose version v2.27.1. @ndeloof docker run --network swarm-overlay works and compose doesn't.
By the way, is the downgrade workaround needed for both the leader and the worker nodes?
@inql I have not tested this as our scripts set versions for all nodes.
Hey there, also affected by this bug.
If you don't want to downgrade, another workaround is to create a container and attach it to the network. The network then appears in the list and docker compose no longer complains:
docker run -dit --name keep-alive --network <network_name> --restart=always alpine
Adding --restart=always will ensure that it survives restarts of the docker daemon, etc.
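The workaround above can be wrapped in a small helper so that the keep-alive container is only started when the network is not yet listed on the node. This is a minimal sketch, not an official fix: the function name ensure_overlay_visible is hypothetical, and the container name and alpine image simply follow the command above.

```shell
# Sketch only: ensure_overlay_visible is a hypothetical helper name.
# Usage: ensure_overlay_visible NETWORK [CONTAINER_NAME]
ensure_overlay_visible() {
    net="$1"
    name="${2:-keep-alive}"
    # Check whether the network already shows up in the local network list.
    if docker network ls --format '{{.Name}}' | grep -Fxq -- "$net"; then
        echo "network $net already visible"
        return 0
    fi
    # Attach a long-lived container so the network becomes listed on this node.
    docker run -dit --name "$name" --restart=always --network "$net" alpine >/dev/null || return 1
    echo "started $name on $net"
}
```

It could then be chained in front of compose, for example: ensure_overlay_visible <network_name> && docker compose up -d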
My versions in case it is useful:
docker version
Client: Docker Engine - Community
Version: 27.0.3
API version: 1.46
Go version: go1.21.11
Git commit: 7d4bcd8
Built: Sat Jun 29 00:02:50 2024
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 27.0.3
API version: 1.46 (minimum version 1.24)
Go version: go1.21.11
Git commit: 662f78c
Built: Sat Jun 29 00:02:50 2024
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.7.18
GitCommit: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
runc:
Version: 1.7.18
GitCommit: v1.1.13-0-g58aa920
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker compose version
Docker Compose version v2.28.1
As noted above: sorry, I did not realise that @michaelmcandrew had also mentioned this, but at least this comment confirms his findings: https://github.com/docker/compose/issues/11894#issuecomment-2206522846
I tested this issue and noticed that if there is a running container connected to the external overlay network (started with docker run ... and visible in docker network ls), then compose is able to connect to the external overlay network.
So, without knowing anything about internals, the problem might have something to do with not checking for available external overlay networks but instead checking just internal networks (visible with docker network ls).
So as an additional workaround, it is possible to first start a "dummy" container on the workers, for example:
$ docker compose up -d
Error response from daemon: network <overlay-network> not found
$ docker run -dit --rm --name dummy-network-container --network <overlay-network> alpine
43924b1b25ac73373aac9120b55ac46fc1de3435ce26485682e11d6c06671936
$ docker compose up -d
[+] Running 1/0
✔ Container worker-service Started
$ _
I also checked downgrading and for Ubuntu 22.04 it worked, so I think I will be using downgraded version for now myself.
sudo apt-get remove docker-compose-plugin && sudo apt-get install docker-compose-plugin=2.21.0-1~ubuntu.22.04~jammy
$ docker version
Client: Docker Engine - Community
Version: 27.0.3
API version: 1.46
Go version: go1.21.11
Git commit: 7d4bcd8
Built: Sat Jun 29 00:02:33 2024
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 27.0.3
API version: 1.46 (minimum version 1.24)
Go version: go1.21.11
Git commit: 662f78c
Built: Sat Jun 29 00:02:33 2024
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.7.18
GitCommit: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
runc:
Version: 1.7.18
GitCommit: v1.1.13-0-g58aa920
docker-init:
Version: 0.19.0
GitCommit: de40ad0
$ docker compose version
Docker Compose version v2.28.1
@kulpsin docker network ls indeed does not detect overlay networks created on another swarm node (not sure about the reason, but that's what we get with the engine API) until they are used by some container. So Docker Compose can't check that the network exists, but it should detect that swarm is enabled and ignore the error (assuming container create will fail if the network is actually missing). See https://github.com/docker/compose/blob/11d5ecdc75ab96214f35db4cdc0361ee080d1c07/pkg/compose/create.go#L1334-L1340
Not sure why this doesn't work as expected; I need to set up a test environment and try to reproduce this bug.
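To make the intended behaviour concrete, here is a rough shell rendering of the decision described above: look the network up locally, and if it is missing but the node is part of a swarm, let the engine have the final word at container creation. This is only an illustration and not the Go code linked above; check_external_network is a hypothetical name, and the messages are made up for the example.

```shell
# Sketch of the decision described above, not Compose's actual code.
# check_external_network is a hypothetical helper name.
check_external_network() {
    net="$1"
    # Case 1: the network is listed locally, nothing more to do.
    if docker network ls --format '{{.Name}}' | grep -Fxq -- "$net"; then
        echo "found"
        return 0
    fi
    # Case 2: not listed, but the node is in a swarm. The network may exist
    # on another node, so defer the final check to container creation.
    state=$(docker info --format '{{.Swarm.LocalNodeState}}' 2>/dev/null)
    if [ "$state" = "active" ]; then
        echo "not listed, swarm active: deferring to the engine"
        return 0
    fi
    # Case 3: not listed and no swarm, so the network really is missing.
    echo "network $net not found" >&2
    return 1
}
```

On a worker where the overlay network has not been used yet, this helper would return success instead of failing early, which is the behaviour expected from compose here.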
With the original compose.yml it would generate a swarm-netword-overlay_swarm-overlay network, and then the worker would not be able to find the external network, as expected.
Adding name: swarm-overlay to the network made it work for me with version v2.28.1:
docker compose up -d
...
  service:
    image: service-image
    container_name: service
    networks:
      - swarm-overlay
    restart: unless-stopped
...
networks:
  swarm-overlay:
    name: swarm-overlay   # <----
    attachable: true
    driver: overlay
After this, the network shows up under that name in docker network ls, and the worker now references the right network.
To flesh out my steps to reproduce a bit more, since they are slightly different from the ones mentioned above, I created a swarm network on the lead node with docker network create --driver overlay test --attachable.
This network was not visible on the worker node (expected, I think, because nothing was connected).
However, I was not able to connect to it with the below networks section in a compose.yaml on the worker node.
networks:
  test:
    external: true
I created the following container on the worker node: docker run -dit --name keep-alive --network test --restart=always alpine
I was then able to connect using the above networks section in a compose.yaml on the worker node.
Hope that helps with the reproduction!
I created the following container on the worker node
docker run -dit --name keep-alive --network test --restart=always alpine
Thanks this worked for me.
Is this a bug in compose? I would expect somewhat feature parity between docker and docker compose.
@tuxthepenguin84 docker compose does some client-side validation before running containers, and as such checks that the target network exists. docker run will just fail if the network is not found, without preliminary validation.
Can you please confirm that the issue persists with the latest version? AFAIK we had a fix for it.
It appears to me the issue still persists, at least for me and my use case.
Docker Compose version v2.29.7
Client: Docker Engine - Community
Version: 27.3.1
API version: 1.47
Go version: go1.22.7
Git commit: ce12230
Built: Fri Sep 20 11:41:00 2024
OS/Arch: linux/amd64
Context: default
[+] Running 3/3
✔ Container proxy2-nginx-exporter Removed 0.5s
✔ Container proxy2 Removed 1.8s
✔ Network proxy_default Removed 0.4s
[+] Running 2/3
✔ Network proxy_default Created 0.8s
⠸ Container proxy2 Starting 2.3s
✔ Container proxy2-nginx-exporter Started 2.0s
Error response from daemon: could not find a network matching network mode jf5y7525s7qqt0333lfolwruk: network jf5y7525s7qqt0333lfolwruk not found
[
{
"Name": "ai",
"Id": "jf5y7525s7qqt0333lfolwruk",
"Created": "2024-10-06T20:26:15.848600039Z",
"Scope": "swarm",
"Driver": "overlay",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "10.0.3.0/24",
"Gateway": "10.0.3.1"
}
]
},
"Internal": false,
"Attachable": true,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": null,
"Options": {
"com.docker.network.driver.overlay.vxlanid_list": "4099"
},
"Labels": null
}
]
The network is there.
services:
  proxy2:
    image: nginx:latest
    container_name: proxy2
    restart: unless-stopped
    networks: ['ai', 'collaboration', 'core', 'garage', 'health', 'iot', 'olivetin', 'media', 'metrics', 'proxy', 'security', 'sprinklers']
    ports:
      - 443:443
    volumes:
      - /containers/proxy/nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - /containers/proxy/nginx/conf.d:/etc/nginx/conf.d:ro
      - /containers/proxy/dhparams.pem:/etc/ssl/dhparams.pem:ro
      - /certs/delchampsio/fullchain.pem:/etc/ssl/delchampsio/fullchain.pem:ro
      - /certs/delchampsio/privkey.pem:/etc/ssl/delchampsio/privkey.pem:ro
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
  proxy2-nginx-exporter:
    image: nginx/nginx-prometheus-exporter:latest
    container_name: proxy2-nginx-exporter
    restart: unless-stopped
    ports:
      - 9113:9113
    command:
      - --nginx.scrape-uri=http://proxy2:8080/nginx_status
networks:
  ai:
    name: ai
    driver: overlay
    external: true
  collaboration:
    name: collaboration
    driver: overlay
    external: true
  core:
    name: core
    driver: overlay
    external: true
  garage:
    name: garage
    driver: overlay
    external: true
  health:
    name: health
    driver: overlay
    external: true
  iot:
    name: iot
    driver: overlay
    external: true
  olivetin:
    name: olivetin
    driver: overlay
    external: true
  media:
    name: media
    driver: overlay
    external: true
  metrics:
    name: metrics
    driver: overlay
    external: true
  proxy:
    name: proxy
    driver: overlay
    external: true
  security:
    name: security
    driver: overlay
    external: true
  sprinklers:
    name: sprinklers
    driver: overlay
    external: true
If I run the following and get a container up and running on that "missing" network, I can get the container started with compose
docker run -dit --rm --name dummy-network-container --network ai alpine
Let me know if you need more info or want me to try something, I'm happy to help out and work on getting this fixed.
@tuxthepenguin84 could you please give the binary from https://github.com/docker/compose/pull/12233 a try (binaries are available at the bottom of https://github.com/docker/compose/actions/runs/11513518822)?
This adds some debug output to the network resolution logic that will help diagnose this issue.
Run it as docker compose --verbose --progress=plain up
Thanks I'll try that out and report back.
@ndeloof I have the issue with the compose plugin version v2.27.0 running on Ubuntu Server 24.04 with ARM Arch
Here is the output of testing the binary from #12233
/etc/salt/docker/test # /etc/salt/docker/docker-compose-linux-aarch64 --verbose --progress=plain up -d
DEBU[0000] search network "axel5" by name returned: 0
DEBU[0000] search network "axel5" by ID succeeded
DEBU[0000] networks matching name "axel5" after strict filtering: 0
DEBU[0000] no match, swarm is enabled: true
Container test-dummy-1 Recreate
DEBU[0005] otel error error="<nil>"
Container test-dummy-1 Recreated
Container test-dummy-1 Starting
Container test-dummy-1 Started
DEBU[0010] otel error error="<nil>"
DEBU[0010] otel error error="<nil>"
This version properly creates the network
Here is my docker info output
/etc/salt/docker/test # docker info
Client:
Version: 26.1.5
Context: default
Debug Mode: false
Plugins:
compose: Docker Compose (Docker Inc.)
Version: v2.27.0
Path: /usr/libexec/docker/cli-plugins/docker-compose
Server:
Containers: 11
Running: 6
Paused: 0
Stopped: 5
Images: 13
Server Version: 27.3.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: active
NodeID: mi4aclsip2vfc0fmdk0lizvoi
Is Manager: false
Node Address: 172.31.41.5
Manager Addresses:
172.31.45.225:2377
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 57f17b0a6295a39009d861b89e3b3b87b005ca27
runc version: v1.1.14-0-g2c9f560
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.8.0-1016-aws
Operating System: Ubuntu 24.04.1 LTS
OSType: linux
Architecture: aarch64
CPUs: 4
Total Memory: 7.582GiB
Name: ip-172-31-41-5
ID: aebad7d3-d242-435a-a215-9e10a8a1a6b1
Docker Root Dir: /var/lib/docker
Debug Mode: false
Labels:
salt-minion=dd6de55b-6f41-4cfd-924f-1231ed03995b
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
will try with the latest and report
My issue was that I had 2 versions of docker compose installed.
I fixed it by installing the latest from edge like this:
apk add docker-cli docker-cli-compose --repository=https://dl-cdn.alpinelinux.org/alpine/edge/community
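For anyone who suspects the same situation, a quick way to spot duplicate installs is to look for docker-compose binaries in the CLI plugin directories. This is a minimal sketch: find_compose_plugins is a hypothetical helper, and the default directory list is an assumption based on the usual Linux plugin locations, so it may not match every distribution.

```shell
# Sketch: list every docker-compose plugin binary found in the given
# directories. find_compose_plugins is a hypothetical helper; with no
# arguments it searches common Linux plugin locations (assumed defaults).
find_compose_plugins() {
    if [ "$#" -eq 0 ]; then
        set -- "$HOME/.docker/cli-plugins" \
               /usr/local/lib/docker/cli-plugins \
               /usr/local/libexec/docker/cli-plugins \
               /usr/lib/docker/cli-plugins \
               /usr/libexec/docker/cli-plugins
    fi
    for dir in "$@"; do
        # Print the full path of each executable plugin binary found.
        if [ -x "$dir/docker-compose" ]; then
            echo "$dir/docker-compose"
        fi
    done
}
```

More than one line of output suggests several copies are installed. The Plugins section of docker info shows the path and version of the copy the CLI actually uses.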
Description
Issue: Swarm worker hosts fail to attach to manager node overlay networks unless a container has been manually started and attached to the network using docker run --network swarm-overlay
Expected Behavior: This should automatically attach to the overlay network and it should be visible in the docker network info.
Workaround: The only solution I have found is to downgrade to an earlier version (2.21.0-1) of docker-compose-plugin
I believe this is the same issue as https://github.com/docker/compose/issues/11387 but I couldn't find any open bugs with the same issue.
Thanks for any help with this!
Steps To Reproduce
I created a custom overlay network on the swarm manager node.
This correctly created the network and attached the relevant container to it.
I then joined a worker host to the swarm and attempted to connect a container to the overlay network.
docker compose up -d worker-service
This errors with:
Compose Version
Docker Environment
Anything else?
No response