firecow / gitlab-ci-local

Tired of pushing to test your .gitlab-ci.yml?
MIT License

Concurrent jobs with docker login (dind) #918

Open dfyz011 opened 1 year ago

dfyz011 commented 1 year ago

Minimal .gitlab-ci.yml illustrating the issue

stages:
  - build

job:
  stage: build
  services:
    - docker:dind
  image: docker:latest
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_JOB_TOKEN" "$CI_REGISTRY"
job2:
  stage: build
  services:
    - docker:dind
  image: docker:latest
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_JOB_TOKEN" "$CI_REGISTRY"

Expected behavior: concurrent jobs run fine (same as on GitLab CI).

Host information: macOS 13.4 (22F66), gitlab-ci-local 4.41.2

Additional context: Hello. The problem occurs when I try to run more than one job at the same time (in the same stage):

gitlab-ci-local --cleanup --stage build

Each job needs to run "docker login". Only one job logs in successfully; the others fail with this error: error during connect: Post "https://docker:2376/v1.24/auth": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "docker:dind CA"

[Screenshot 2023-07-03 at 07 50 17]

Running them separately works fine. Here is my .gitlab-ci-local-env, taken from your dind example:

PRIVILEGED=true
ULIMIT=8000:16000
VOLUME=certs:/certs/client
VARIABLE="DOCKER_TLS_CERTDIR=/certs"

Here is my .gitlab-ci-local-variables.yml (I use the GitLab registry; you need to provide your own user and deploy token):

CI_REGISTRY: "registry.gitlab.com"
CI_REGISTRY_USER:
CI_JOB_TOKEN:

Thank you!

firecow commented 1 year ago

I can reproduce on Ubuntu 22.04 with gitlab-ci-local --version 4.41.2

Does this happen on real gitlab runners, if both jobs execute on the same runner at the same time? @dfyz011

discordier commented 1 year ago

I'm experiencing the same in gitlab-ci-local. So far I have not seen it on regular runners.

I have a pipeline that runs 10 jobs in one stage, each of which builds container images using docker:20-dind.

I have two runners, each of which takes 4 jobs in parallel, and I have not encountered the problem there.

Yet, when running it in gitlab-ci-local, the build fails with:

Server:
ERROR: error during connect: Get https://docker:2376/v1.40/info: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "docker:dind CA")
errors pretty printing info

for each job but the first.

Output from docker info if you happen to need it:

Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 20.10.24
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 2806fc1057397dbaeefbea0e4e17bddfbd388f38
 runc version: v1.1.5-0-gf19387a6
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 6.1.0-10-amd64
 Operating System: Alpine Linux v3.18 (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 31.13GiB
 Name: 7ff8585a28eb
 ID: RKUC:GWY6:7HE3:J7XQ:3CI4:YXJI:TKIE:KSGG:SCYV:KS6A:GAAO:GZ4P
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

firecow commented 1 year ago

It's because the CI jobs share the same certs folder that dind needs.

discordier commented 1 year ago

How can we mitigate this? I'd love to invest some time here, but I lack knowledge of how the general setup works, for example where the command line with the volume mounts is generated. A detailed debug log of the docker commands being issued would work wonders here.

I suspect we need to mount a separate cert dir per concurrent job? If so, why? The root cert should not change, should it?

discordier commented 1 year ago

Addendum: maybe we can implicitly set --concurrency to 1 if we have dind jobs? Of course, we would also log to stderr that we do so.

firecow commented 1 year ago

It seems like each job/service pair needs its own certs mount. But I can't figure out why that would be needed, or how remote gitlab-runners mitigate this "race condition".
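
One possible way to sidestep the shared certs mount, not confirmed anywhere in this thread, is to run dind without TLS so that no certificates are generated or shared at all. The sketch below follows the "TLS disabled" variant described in the GitLab Docker-in-Docker documentation. It is untested with gitlab-ci-local and assumes that the VOLUME=certs:/certs/client and VARIABLE="DOCKER_TLS_CERTDIR=/certs" lines are removed from .gitlab-ci-local-env, and that the global variables reach the service containers. Traffic between the job and its dind service is then unencrypted, which may or may not be acceptable for local runs.

```yaml
# Sketch only: dind with TLS disabled, so no shared certs volume is involved.
stages:
  - build

variables:
  DOCKER_HOST: tcp://docker:2375   # plain TCP port of the dind service (2376 is the TLS port)
  DOCKER_TLS_CERTDIR: ""           # empty value tells docker:dind not to generate certificates

job:
  stage: build
  services:
    - docker:dind
  image: docker:latest
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_JOB_TOKEN" "$CI_REGISTRY"

job2:
  stage: build
  services:
    - docker:dind
  image: docker:latest
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_JOB_TOKEN" "$CI_REGISTRY"
```

If the failures really come from both dind services writing their own CA into the same certs volume, this should make the jobs independent of each other. It does not explain why regular runners are unaffected.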

mkhattat commented 8 months ago

Any update on this? Is there anything I can do to mitigate this issue? It is too slow to run the jobs one by one.