docker / compose

Define and run multi-container applications with Docker
https://docs.docker.com/compose/
Apache License 2.0

[BUG] Container not always added to network of dependencies #11665

Open Valentin-Metz opened 5 months ago

Valentin-Metz commented 5 months ago

Description

I am running Matrix in a Docker Compose setup with Traefik. The relevant parts of the compose file:

version: "3.9"
services:
  synapse:
    image: matrixdotorg/synapse
    restart: always
    hostname: "synapse"
    volumes:
      - /storage/Matrix/data:/data
    healthcheck:
      test: ["CMD", "curl", "-fSs", "http://localhost:8008/health"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 300s
    depends_on:
      - synapse_db
    networks:
      - default
      - traefik
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: "4g"
    labels:
      - "traefik.enable=true"
      - "other_traefik_stuff"

  synapse_db:
    image: postgres:15
    restart: always
    hostname: "synapse-postgres"
    env_file: ./postgres_synapse_env.txt
    volumes:
      - /storage/Matrix/postgres:/var/lib/postgresql/data

networks:
  traefik:
    external: true

Synapse needs outbound network access, which it gets through the default network, and it is published through the external traefik reverse-proxy network. It also depends on the Postgres database, so it must be attached to the project's default network, matrix_compose_default (named after the project folder).

This is what dops -a shows when it starts successfully:

11de03ef565e    matrix_compose-synapse-1                   matrixdotorg/synapse                      /start.py                    2024-03-27 05:03:10        [RUNNING]    Up 7 minutes (healthy)                                    matrix_compose_default         172.30.0.5
                                                                                                                                                                                                                                    traefik                        172.18.0.12
eafe8a85d68a    matrix_compose-synapse_db-1                postgres                       15         docker-entrypoint.sh         2024-03-27 05:03:10        [RUNNING]    Up 7 minutes                                              matrix_compose_default         172.30.0.2

matrix_compose-synapse-1 is in both networks.

I have the synapse service in a daily restart/upgrade cronjob:

cd /root/matrix_compose/ && docker compose pull && docker compose down && sleep 5 && docker compose up --force-recreate --build --remove-orphans -d

This sometimes fails to start correctly because Docker does not connect synapse to the db container's network before starting it. The error given by synapse:

psycopg2.OperationalError: could not translate host name "synapse-postgres" to address: Name or service not known

Output of dops -a at the time:

6e89e118b432    matrix_compose-synapse-1                   matrixdotorg/synapse                      /start.py                    2024-03-27 04:30:19        [RESTARTING]    Restarting (1) 42 seconds ago                                traefik                                                             
970bd78034a9    matrix_compose-synapse_db-1                postgres                       15         docker-entrypoint.sh         2024-03-27 04:30:18        [RUNNING]       Up 29 minutes                                                matrix_compose_default                                              172.27.0.3

This time synapse is not attached to the matrix_compose_default network, so it restarts continuously because it cannot reach the database. The failure is intermittent, and rerunning the cronjob command fixes it.
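
A cron job could detect this state right after `docker compose up` by listing the networks the container is attached to. The following is only a minimal sketch, not part of the reported setup: the helper names `network_in_list`, `container_networks`, and `check_attached` are hypothetical, and it assumes a POSIX shell with the `docker` CLI available. The container and network names are the ones shown in the output above.

```shell
# Return 0 if network name $1 appears in the whitespace-separated list $2.
network_in_list() {
    for n in $2; do
        if [ "$n" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Print the names of all networks a container is attached to,
# separated by spaces, using docker inspect's Go template output.
container_networks() {
    docker inspect \
        --format '{{range $k, $v := .NetworkSettings.Networks}}{{$k}} {{end}}' \
        "$1"
}

# Report whether container $1 is attached to network $2.
# Returns non-zero when the attachment is missing, so the caller
# can decide to retry.
check_attached() {
    if network_in_list "$2" "$(container_networks "$1")"; then
        echo "ok: $1 is attached to $2"
    else
        echo "missing: $1 is not attached to $2" >&2
        return 1
    fi
}
```

For example, `check_attached matrix_compose-synapse-1 matrix_compose_default || echo "rerun needed"` could run at the end of the cron command. Since rerunning the cron job is reported to fix the problem, a failed check is a reasonable trigger for one retry, though this only works around the symptom.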

Steps To Reproduce

No response

Compose Version

Docker Compose version v2.24.6
fish: Unknown command: docker-compose

Docker Environment

Client: Docker Engine - Community
 Version:    25.0.3
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.24.6
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 31
  Running: 31
  Paused: 0
  Stopped: 0
 Images: 79
 Server Version: 25.0.3
 Storage Driver: overlay2
  Backing Filesystem: zfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.1.0-18-amd64
 Operating System: Debian GNU/Linux 12 (bookworm)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.613GiB
 Name: homelab
 ID: f2bdec7c-f4b2-49d8-b0f6-a124735dc0e4
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Anything else?

No response

ndeloof commented 5 months ago

Sounds like https://github.com/docker/compose/issues/11601, which may be related to a Docker Engine issue. Can you give the latest release a try? 25.0.4 at least includes some relevant networking fixes, see Moby v25.0.4.

Valentin-Metz commented 5 months ago

I'll try that and see if the issue persists.
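
Before re-testing, it may help to confirm that the running Engine is at or above the suggested 25.0.4 release. The sketch below is an illustration under assumptions: `version_ge` and `engine_is_recent` are hypothetical helper names, the comparison relies on `sort -V` (available in GNU coreutils), and pre-release suffixes such as `-rc1` are not handled.

```shell
# Return 0 if version $1 is greater than or equal to version $2.
# Sorts both in version order and checks that $2 comes first.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Return 0 if the Docker Engine server version is at least $1
# (default 25.0.4, the release suggested in this thread).
# Returns 2 if the server version cannot be queried.
engine_is_recent() {
    server=$(docker version --format '{{.Server.Version}}') || return 2
    version_ge "$server" "${1:-25.0.4}"
}
```

With the Engine reported above (25.0.3), `engine_is_recent` would return non-zero, indicating the upgrade has not been applied yet.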

Valentin-Metz commented 5 months ago

Networking within Docker seems a lot more stable since the update. I've had no issues since. If nothing comes up within the next month, I think it's safe to assume this is solved and close the issue.

github-actions[bot] commented 1 day ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.