firecow / gitlab-ci-local

Tired of pushing to test your .gitlab-ci.yml?
MIT License

error during connect: Get "https://docker:2376/v1.24/containers/json?all=.." tls: failed to verify certificate: x509: certificate signed by unknown authority #1113

Closed david-nano closed 5 months ago

david-nano commented 6 months ago

TL;DR: two jobs share the same job template, but each uses a different docker-compose.yaml. One works fine; the other gets errors during DinD injection.

Minimal .gitlab-ci.yml illustrating the issue

---
Component Test out_of_disk:
  stage: component_test
  image: docker:24.0.2-git
  services:
  - name: registry.hub.docker.com/library/docker:24.0.2-dind
    alias: docker
  before_script:
  - docker-compose ${ENV_FILE:+--env-file $ENV_FILE} -f "$TEST_COMPOSE_PATH" $(test
    -f "$NETWORK_COMPOSE_PATH" && echo "-f $NETWORK_COMPOSE_PATH") up -d --force-recreate
    && sleep 40
  script:
  - |
    docker run  -v ${CI_PROJECT_DIR}:${CI_PROJECT_DIR} \
      test-server:latest /bin/bash -c \
        "pytest -v ${CI_PROJECT_DIR}/$PATH_TO_TEST_FILS"

Expected behavior: getting the test working

Host information: Ubuntu 22.04, gitlab-ci-local 4.46.1

Containerd binary: docker

Additional context: I have two jobs that use the same template job; one uses docker compose A and the other uses docker compose B. The job that uses compose A finishes successfully; the other fails with this error:

Component Test out_of_disk $ docker-compose ${ENV_FILE:+--env-file $ENV_FILE} -f "$TEST_COMPOSE_PATH" $(test -f "$NETWORK_COMPOSE_PATH" && echo "-f $NETWORK_COMPOSE_PATH") up -d --force-recreate && sleep 40
Component Test out_of_disk > error during connect: Get "https://docker:2376/v1.24/containers/json?all=1&filters=%7B%22label%22%3A%7B%22com.docker.compose.config-hash%22%3Atrue%2C%22com.docker.compose.project%3Ddocker%22%3Atrue%7D%7D": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "docker:dind CA")

Compose A:

version: "3.5"
services:
  redis-db:
    image: redislabs/rejson:2.0.11
    container_name: redis-db
    restart: unless-stopped
    ports:
      - 6379:6379
    networks: 
      - runner

  garbage-collector:
    image: ${DOCKER_DOWNLOAD_NEXUS_REGISTRY}/app/garbage_collector/garbage_collector:v0.1.98
    container_name: garbage-collector
    restart: unless-stopped
    environment:
      DC_HOME: /dc_home
    volumes:
      - ${DC_HOME}:/dc_home
    ports:
      - 60000:60000
    networks:
      - runner

  runner:
    image: ${DOCKER_DOWNLOAD_NEXUS_REGISTRY}/app/runner/pyrunner-generic-cpu:${VERSION}
    container_name: runner
    restart: unless-stopped
    environment:
      DC_HOME: /dc_home
    volumes:
      - ${CI_PROJECT_DIR}/test/configurations/component_test_configuration.ini:/configuration.ini:ro
      - /netapp:/netapp
      - /tmp:/tmp:rw
      - ${CI_PROJECT_DIR}/docker/algorithmic_solutions_list_mock.txt:/home/scripts/algorithmic_solutions_list.txt
      - ${CI_PROJECT_DIR}/docker/deployment_settings_list.txt:/home/deployments/deployment_settings_list.txt
      - ${DC_HOME}:/dc_home
    ports:
      - 8080:8080
    networks: 
      - runner

networks:
  runner:

Compose B:

version: "3.5"
services:
  redis-db:
    image: redislabs/rejson:2.0.11
    container_name: redis-db
    restart: unless-stopped
    ports:
      - 6379:6379
    networks: 
      - runner
  garbage-collector:
    image: ${DOCKER_DOWNLOAD_NEXUS_REGISTRY}/app/garbage_collector/garbage_collector:v0.1.98
    container_name: garbage-collector
    restart: unless-stopped
    environment:
      DC_HOME: /dc_home
    volumes:
      - ${DC_HOME}:/dc_home
    ports:
      - 60000:60000
    networks:
      - runner

  runner:
    image: ${DOCKER_DOWNLOAD_NEXUS_REGISTRY}/app/runner/pyrunner-generic-cpu:${VERSION}
    container_name: runner
    restart: unless-stopped
    environment:
      DC_HOME: /dc_home
    volumes:
      - ${CI_PROJECT_DIR}/test/configurations/component_test_configuration.ini:/configuration.ini:ro
      - /netapp:/netapp
      - /tmp:/tmp:rw
      - data-storage-vol:/tmp/data/jobs/component_test/job_123456:rw
      - ${CI_PROJECT_DIR}/docker/algorithmic_solutions_list_mock.txt:/home/scripts/algorithmic_solutions_list.txt
      - ${CI_PROJECT_DIR}/docker/deployment_settings_list.txt:/home/deployments/deployment_settings_list.txt
      - ${DC_HOME}:/dc_home
    ports:
      - 8080:8080
    networks: 
      - runner

volumes:
  data-storage-vol:
    driver_opts:
      type: "tmpfs"
      device: "tmpfs"
      o: "size=${RAM_DRIVE_SIZE:?err},uid=1000"

networks:
  runner:

My .gitlab-ci-local-env file:

PRIVILEGED=true
ULIMIT=8000:16000
VOLUME="/etc/docker/daemon.json:/etc/docker/daemon.json certs:/certs/client /netapp:/netapp"
VARIABLE="DOCKER_TLS_CERTDIR=/certs DOCKER_BUILDKIT=0 COMPOSE_DOCKER_CLI_BUILD=0"

firecow commented 6 months ago

Please, get rid of as many factors and yaml as possible.

This is not an as simple as possible example at all 😁

It makes it overly difficult for others to debug.

david-nano commented 6 months ago

Is it better now?

firecow commented 6 months ago
---
test-job:
  image: docker:24.0.2-git
  services:
    - name: registry.hub.docker.com/library/docker:24.0.2-dind
      alias: docker
  script:
    - docker version

Using this .gitlab-ci-local-env

PRIVILEGED=true
ULIMIT=8000:16000
VOLUME="/etc/docker/daemon.json:/etc/docker/daemon.json certs:/certs/client /netapp:/netapp"
VARIABLE="DOCKER_TLS_CERTDIR=/certs DOCKER_BUILDKIT=0 COMPOSE_DOCKER_CLI_BUILD=0"

I'm not seeing the symptom, when using a simple example.

https://gitlab.com/firecow/gitlab-ci-debugging/-/jobs/6254313223

Strip down your example, line by line, instruction by instruction, and eventually I'm sure we will get to the bottom of this.
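
As a starting point for that bisection, here is a rough sketch of a debugging job that inspects the TLS material before running any compose command. It is not taken from the original report: the job name `debug-dind-tls` is hypothetical, the `/certs/client` path is assumed from the `VOLUME` mapping in the `.gitlab-ci-local-env` above, and `openssl` is assumed to be installable with `apk` because the `docker` image is Alpine-based.

```yaml
---
# Hypothetical debugging job, not part of the original report.
debug-dind-tls:
  image: docker:24.0.2-git
  services:
    - name: registry.hub.docker.com/library/docker:24.0.2-dind
      alias: docker
  script:
    # Show which client certificates the job container actually sees.
    - ls -l /certs/client
    # Assumption: openssl may be missing from the image, so install it first.
    - apk add --no-cache openssl
    # Print the issuer and validity dates of the CA the client trusts.
    - openssl x509 -in /certs/client/ca.pem -noout -issuer -dates
    # Handshake with the daemon using that CA and the client key pair;
    # the "Verify return code" line shows whether the server cert chains to it.
    - openssl s_client -connect docker:2376 -CAfile /certs/client/ca.pem -cert /certs/client/cert.pem -key /certs/client/key.pem </dev/null || true
    - docker version
```

If the verify step fails here as well, the client CA and the daemon's certificate were probably generated at different times, which would point at the shared `certs` volume rather than at either compose file. That is only a guess to narrow things down, not a confirmed cause.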

firecow commented 5 months ago

It would be nice to add some sort of hint for others who might end up in a similar situation. What was your mistake?

david-nano commented 5 months ago

Not reproducible :( The next business day, everything worked, not sure why.