taimurshaikh commented 1 year ago

Description

Current Behavior

I run docker compose up to set up a dev environment in which my custom containers depend on a postgres container. On most attempts the custom containers fail to connect to postgres, although it occasionally works. The issue occurs for the two members of our team with Apple Silicon machines (M1 and M2), but not for the member with an Intel Mac. We have already tried switching to the arm64v8/postgres image, but that has not fixed the issue.

version: '3'
services:
  postgres:
    image: postgres
    environment:
      - POSTGRES_USER=REDACTED
      - POSTGRES_PASSWORD=REDACTED
      - POSTGRES_DB=REDACTED
    # volumes:
    #   - ./learn_prisma_default:/var/lib/docker/volumes/learn_prisma_postgres/_data
  redis:
    image: redis
    env_file:
      - .env

# My two custom containers
  venture-copilot:
    image: venture-copilot
    ports:
      - "8080:8080"
    env_file:
      - .env
    build:
      context: .
      dockerfile: Dockerfile
    depends_on:
      - postgres
      - redis

  dashboard:
    image: dashboard
    ports:
      - "80:80"
    env_file:
      - .env
    build:
      context: dashboard
      dockerfile: Dockerfile
    depends_on:
      - postgres
      - redis

Here are the logs from a successful and failed example: https://gist.github.com/taimurshaikh/c497167b00c4bc06c05fded18ca9cd0e

Steps To Reproduce

No response

Compose Version

Docker Compose version v2.12.2

Docker Environment

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.9.1)
  compose: Docker Compose (Docker Inc., v2.12.2)
  dev: Docker Dev Environments (Docker Inc., v0.0.3)
  extension: Manages Docker extensions (Docker Inc., v0.2.13)
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.0)
  scan: Docker Scan (Docker Inc., v0.21.0)
WARNING: Plugin "/Users/taimurshaikh/.docker/cli-plugins/docker-init" is not valid: failed to fetch metadata: fork/exec /Users/taimurshaikh/.docker/cli-plugins/docker-init: no such file or directory
WARNING: Plugin "/Users/taimurshaikh/.docker/cli-plugins/docker-scout" is not valid: failed to fetch metadata: fork/exec /Users/taimurshaikh/.docker/cli-plugins/docker-scout: no such file or directory

Server:
 Containers: 4
  Running: 0
  Paused: 0
  Stopped: 4
 Images: 4
 Server Version: 20.10.21
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1c90a442489720eec95342e1789ee8a5e1b9536f
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.15.49-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 4
 Total Memory: 7.667GiB
 Name: docker-desktop
 ID: 45IS:TCZX:YB3F:E5KM:ZGAU:SVYO:CEAW:YNSQ:IIPF:QYK4:5P2V:QIIT
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5000
  127.0.0.0/8
 Live Restore Enabled: false

Anything else?

List of things we have tried so far:

Adding a healthcheck to the postgres container, experimenting with different timeout lengths and number of retries
Upgrading Docker Desktop to latest version

glours commented 1 year ago

Hello @taimurshaikh

Can you try adding a healthcheck attribute to your postgres service, and a condition attribute to your depends_on, like this:

services:
  postgres:
    image: postgres
    environment:
      - POSTGRES_USER=REDACTED
      - POSTGRES_PASSWORD=REDACTED
      - POSTGRES_DB=REDACTED
    healthcheck:
      test: [ "CMD", "pg_isready" ]
      interval: 10s
      timeout: 5s
      retries: 5
    # volumes:
    #   - ./learn_prisma_default:/var/lib/docker/volumes/learn_prisma_postgres/_data
  redis:
    image: redis
    env_file:
      - .env

# My two custom containers
  venture-copilot:
    image: venture-copilot
    ports:
      - "8080:8080"
    env_file:
      - .env
    build:
      context: .
      dockerfile: Dockerfile
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started

  dashboard:
    image: dashboard
    ports:
      - "80:80"
    env_file:
      - .env
    build:
      context: dashboard
      dockerfile: Dockerfile
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started

taimurshaikh commented 1 year ago

Hi @glours thanks for your quick response. Upon implementing your suggestions, we still face the same issue, and also this new error message appears in the logs continually:

venture-copilot-app-postgres-1         | 2023-06-08 15:21:02.790 UTC [95] FATAL:  role "root" does not exist

glours commented 1 year ago

Oh, my bad, I didn't notice your custom user. Change the check line to test: [ "CMD", "pg_isready -d $${POSTGRES_DB} -U $${POSTGRES_USER}" ]

taimurshaikh commented 1 year ago

It appears the healthcheck fails:

container for service "postgres" is unhealthy

I tried experimenting with the number of retries and the timeout, but it seems something is wrong with how the app communicates with postgres.
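
One possible cause of the unhealthy status, offered as a guess rather than a confirmed diagnosis for this setup: with the exec form `CMD` suggested above, the array elements are executed directly without a shell, so the single string `pg_isready -d ... -U ...` is treated as the name of one executable and the variables are never expanded. A minimal sketch of the same check using `CMD-SHELL`, which runs the string through the container's shell (interval, timeout, and retries are the values suggested earlier in this thread):

```yaml
services:
  postgres:
    image: postgres
    environment:
      - POSTGRES_USER=REDACTED
      - POSTGRES_PASSWORD=REDACTED
      - POSTGRES_DB=REDACTED
    healthcheck:
      # CMD-SHELL passes the string to the container's shell, which expands the variables.
      # The doubled $$ stops Compose from interpolating them on the host first.
      test: [ "CMD-SHELL", "pg_isready -d $${POSTGRES_DB} -U $${POSTGRES_USER}" ]
      interval: 10s
      timeout: 5s
      retries: 5
```

If the container is still reported as unhealthy, `docker inspect --format '{{json .State.Health}}' <container>` shows the output of the most recent probes, which usually says why the check failed.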

milas commented 1 year ago

It looks like you're running Docker Desktop 4.14.0, which is from November 2022.

Can you upgrade to the latest version (Docker Desktop > Settings > Software Updates > Check for updates) and see if that helps?

taimurshaikh commented 1 year ago

The issue persisted when I had the most recent version of Docker Desktop installed - our team tried a universal downgrade to the version that was used when our codebase was originally set up but unfortunately that was no good either. I will upgrade again though on my machine and let you know if there are any changes.

UPDATE: After updating to the latest version of Docker, we unfortunately still get the message that postgres fails the healthcheck

dependency failed to start: container venture-copilot-app-postgres-1 is unhealthy

glours commented 1 year ago

@taimurshaikh can you share the log of your postgres service? It seems you either have a postgres configuration issue or, as you didn't tag your version of postgres, you're now using a version of postgres that is incompatible with your previous config. I ran docker image inspect postgres to check which version the latest tag currently points to, and it is PG_VERSION=15.3-1.pgdg110+1. Is that the version you expect to use?
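
For reference, pinning the image removes the dependence on whatever latest currently resolves to. A sketch, assuming 15.3 is the release the team intends to use (the tag is taken from the inspect output above, not from the original compose file):

```yaml
services:
  postgres:
    # Pin the release so every machine pulls the same version
    # instead of whatever the latest tag points to on that day.
    image: postgres:15.3
```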

taimurshaikh commented 1 year ago

Yes, this is the one that works for our other team member with an Intel Mac, which leads me to believe it's a processor-related issue. I have seen examples of Docker-related issues on Apple Silicon, e.g. on this page for Argilla: https://docs.argilla.io/en/latest/getting_started/quickstart_installation.html

glours commented 1 year ago

Can you try enabling the Rosetta option in Docker Desktop (Settings > Features in development) to see if it fixes your issue?

glours commented 1 year ago

And regarding the platform issue mentioned in the argilla doc, Compose will pull by default the image for your specific platform, so if a linux/arm64 or darwin/arm64 version is available it will be used
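
If you want to rule out an architecture mismatch explicitly, the Compose `platform` attribute requests a specific platform for a service. A sketch, assuming the native linux/arm64 variant is the one wanted on an Apple Silicon host:

```yaml
services:
  postgres:
    image: postgres
    # Request the native arm64 image explicitly.
    # Compose normally selects the host platform on its own, so this only makes the choice visible.
    platform: linux/arm64
```

Running docker image inspect postgres afterwards reports the Architecture of the image that was actually pulled.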

taimurshaikh commented 1 year ago

The option to use Rosetta is not showing up in my settings page even after enabling the virtualization framework and restarting the app. I will restart my machine and see if that does anything

tushar5526 commented 1 year ago

Yes, I can confirm this is happening to me as well. My connections to postgres containers fail at random. I tried running the same setup on GitHub Codespaces and it worked fine on multiple tries. I am on a MacBook Pro with an Apple Silicon M2 chip.

kelvin-lima commented 1 year ago

I'm facing a similar issue with random containers. Right now, I'm using Redis, MongoDB, RabbitMQ, MariaDB, and Postgres.

It fails at random. From what I can see when inspecting it, the main container is not being attached to the database's network, while it attaches correctly to the others.

At my company this happens on Linux systems too, so it should not be an issue that happens only on Macs.

services:
  service-one:
    build:
      context: service-one
      dockerfile: Dockerfile
    container_name: service-one
    image: "kelvin/service_one"
    ports:
      - "8080:8080"
      - "5002:5005"
    environment:
      - GITHUB_ACTOR=$GITHUB_ACTOR
      - GITHUB_PASS=$GITHUB_PACKAGES_IMPORT_TOKEN
      - JPDA_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005
      - >-
        JAVA_TOOL_OPTIONS=-javaagent:/usr/lib/elastic-apm-agent.jar
    env_file: env-files/service_one.env.docker
    depends_on:
      redis:
        condition: service_started
      mariadb:
        condition: service_started
      mongodb:
        condition: service_started
      rabbitmq:
        condition: service_healthy
    volumes:
      - "./docker_data/pics:/pics"
    networks:
      - service_one_api
      - mariadb_api
      - mongodb_api
      - redis_api
      - rabbit_api
      - dynamodb_api
      - localstack_api
    entrypoint: [ "catalina.sh", "jpda", "run" ]
  service-two:
    build:
      context: service-two
      dockerfile: Dockerfile
      args:
        - GITHUB_ACTOR=$GITHUB_ACTOR
        - GITHUB_PACKAGES_IMPORT_TOKEN=$GITHUB_PACKAGES_IMPORT_TOKEN
    env_file:
      - env-files/service_two.env.docker
    container_name: service_two
    image: "kelvin/service_two"
    ports:
      - "7777:8080"
      - "5007:5005"
    depends_on:
      redis:
        condition: service_started
      postgres:
        condition: service_healthy
      rabbitmq: 
        condition: service_healthy
    environment:
      - >-
        JAVA_TOOL_OPTIONS=-javaagent:/usr/lib/reactor-tools.jar -javaagent:/usr/lib/elastic-apm-agent.jar
        -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=0.0.0.0:5005
    networks:
      - postgres_api
      - rabbit_api
      - redis_api
      - service_two_api
  localstack:
    container_name: localstack
    image: localstack/localstack
    ports:
      - '4566:4566'            # LocalStack Gateway
    networks:
      - localstack_api
    environment:
      - SERVICES=sqs
      - DEBUG=0
      - DOCKER_HOST=unix:///var/run/docker.sock
      - AWS_DEFAULT_REGION=sa-east-1
      - AWS_ID=test
      - AWS_KEY=test
    volumes:
      - ./localstack_setup/ready.sh:/etc/localstack/init/ready.d/init-aws.sh
  redis:
    container_name: redis
    image: redis:7.0.3-alpine
    restart: always
    ports:
      - '6379:6379'
    command: redis-server --save 60 1 --loglevel warning
    networks:
      - redis_api
    environment:
      - REDIS_PASSWORD=defaultpass
  mariadb:
    container_name: mariadb
    image: "mariadb:10.11.4"
    ports:
      - "3306:3306"
    environment:
      MYSQL_ROOT_PASSWORD: "kelvin"
      MYSQL_DATABASE: "kelvin"
      MYSQL_USER: "kelvin"
      MYSQL_PASSWORD: "kelvin"
    networks:
      - mariadb_api
    volumes:
      - "./database-scripts/monolith_schema.sql:/docker-entrypoint-initdb.d/1.sql"
      - "./database-scripts/monolith_seed.sql:/docker-entrypoint-initdb.d/2.sql"
  rabbitmq:
    container_name: rabbitmq
    image: rabbitmq:management-alpine
    ports:
      - "5672:5672"
      - "61613:61613"
      - "15672:15672"
    environment:
      RABBITMQ_DEFAULT_USER: user
      RABBITMQ_DEFAULT_PASS: password
    networks:
      - rabbit_api
    command: "/bin/bash -c \"rabbitmq-plugins enable --offline rabbitmq_stomp; rabbitmq-server\""
    healthcheck:
      test: rabbitmq-diagnostics check_port_connectivity
      interval: 1s
      timeout: 3s
      retries: 30
  postgres:
    image: "postgres:15.3"
    container_name: postgres
    ports:
      - "5432:5432"
    environment:
      POSTGRES_USER: "kelvin"
      POSTGRES_PASSWORD: "kelvin"
      POSTGRES_DB: "kelvin"
    volumes:
      - "./database-scripts/database_creation.sql:/docker-entrypoint-initdb.d/0.sql"
      - "./database-scripts/ttlock_schema.sql:/docker-entrypoint-initdb.d/1.sql"
      - "./database-scripts/ttlock_seed.sql:/docker-entrypoint-initdb.d/2.sql"
    networks:
      - postgres_api
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U kelvin"]
      interval: 5s
      timeout: 5s
      retries: 5
  mongodb:
    container_name: mongodb
    image: "mongo:6.0.6"
    ports:
      - "27017:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: kelvin
      MONGO_INITDB_ROOT_PASSWORD: kelvin
    networks:
      - mongodb_api
networks:
  service_one_api:
    name:  service_one_api
  mariadb_api:
    name: mariadb_api
  mongodb_api:
    name: mongodb_api
  redis_api:
    name: redis_api
  rabbit_api:
    name: rabbit_api
  postgres_api:
    name: postgres_api
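  service_two_api:
    name: service_two_api
  # Assumed declarations: both networks are referenced by services above but were missing from this list.
  dynamodb_api:
    name: dynamodb_api
  localstack_api:
    name: localstack_api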

As I've mentioned the service-one or service-two randomly fails to connect to Redis, MariaDB, Postgres, or RabbitMQ. By inspection, the problem is occurring because sometimes the services are not connected to the right network.

In my latest failure, service-one could not connect to redis_api while service_two could. Both of them were connected to rabbit_api.

kelvin-lima commented 1 year ago

In addition to my last message: after deleting the problematic container (Redis in this case) and running compose again, I got: Error response from daemon: network d4cfffd6bcba7c9a7eb1060e31c90b754207385a526c62cba28cb4b071574595 not found

After that, I ran docker network prune, which deleted redis_api, and ran compose again. The container spun up and both services are connected now.

But if I remove all containers and spin them up again, the problem comes back.

This started happening after updating Docker to:

Screenshot 2023-07-20 at 13 36 18

RockingThor commented 1 year ago

I'm getting the same error as previously mentioned. Docker is not creating the postgres container; it only builds the app. As my app depends on postgres, the build fails every time.

Here is my docker-compose file: Screenshot 2023-11-09 at 9 29 18 PM

Here is the error I'm getting: Screenshot 2023-11-09 at 9 30 21 PM

glours commented 12 months ago

Hey everyone 👋 @kelvin-lima @taimurshaikh @tushar5526 Can you test again with a recent version of Compose? We have fixed a couple of network problems since this issue was opened, especially this [one](https://github.com/docker/compose/pull/10778) in Compose v2.20.0

@RockingThor you should add a healthcheck configuration to your postgres service, see here for an example, and the following depends_on config to your app service

depends_on:
      postgres:
        condition: service_healthy

This way you can be sure your app service waits until your pg service is ready to accept connections.

RockingThor commented 12 months ago

Thanks for your attention to the issue. I tried exactly what you instructed, but I'm still getting the same error: my app build is unable to connect to the postgres container. Here is my updated docker-compose file:

Screenshot 2023-11-23 at 2 10 53 PM

This is my database URL: POSTGRES_PRISMA_URL=postgresql://rohitnandi:postgres@postgres:5432

Here is the error message:

 => [app 7/8] RUN npx prisma generate 2.0s
 => ERROR [app 8/8] RUN npx prisma db push 0.8s

[app 8/8] RUN npx prisma db push:
0.701 Environment variables loaded from .env
0.702 Prisma schema loaded from prisma/schema.prisma
0.705 Datasource "db": PostgreSQL database "postgres", schema "public" at "postgres:5432"
0.776
0.776 Error: P1001: Can't reach database server at postgres:5432
0.776
0.776 Please make sure your database server is running at postgres:5432.

failed to solve: process "/bin/sh -c npx prisma db push" did not complete successfully: exit code: 1

tushar5526 commented 12 months ago

Hey @glours, thanks for the awesome product. I believe this got fixed a few weeks after I ran into it, maybe in one of the Docker updates. I have not encountered the issue anywhere else so far.

milas commented 11 months ago

@RockingThor Compose services/dependencies determine the order services are launched with compose up. Your image build for app is trying to use Postgres, which won't be available - even if the service is running, the build sandbox won't have access to it. That's not something that's supported by Compose.
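
A sketch of one way to restructure this, assuming the RUN npx prisma db push step is removed from the Dockerfile so the schema push happens when the container starts instead. The service name app comes from the build log above, and the postgres user and password are read off the database URL posted earlier. `npm start`, the build context, and the healthcheck timings are placeholders, not details from this thread:

```yaml
services:
  postgres:
    image: postgres
    environment:
      - POSTGRES_USER=rohitnandi
      - POSTGRES_PASSWORD=postgres
    healthcheck:
      test: [ "CMD-SHELL", "pg_isready -U $${POSTGRES_USER}" ]
      interval: 5s
      timeout: 5s
      retries: 5

  app:
    build:
      context: .
    env_file:
      - .env
    depends_on:
      postgres:
        condition: service_healthy
    # Runs at container start, when postgres is reachable on the Compose network.
    # During the image build it is not reachable, which is why the RUN step failed.
    # npm start is a placeholder for whatever actually starts the app.
    command: sh -c "npx prisma db push && npm start"
```

The depends_on condition only affects the start order of containers under compose up. It has no effect on docker compose build, so any step that needs the database has to live in the container's start command or entrypoint.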

--

Closing this issue - as mentioned, the original cause of these issues and other sporadic networking problems was resolved in #10778, so the solution is to make sure you're on the latest version of Docker Desktop (or Compose itself if you manually installed it).

docker / compose

[BUG] docker compose up failing to connect to postgres container on Apple Silicon Macs #10673
