dfyz011 opened 1 year ago
I can reproduce on Ubuntu 22.04 with gitlab-ci-local --version 4.41.2
Does this happen on real GitLab runners if both jobs execute on the same runner at the same time? @dfyz011
Experiencing the same in gitlab-ci-local; so far I have not seen it on regular runners.
I have a pipeline that runs 10 jobs in one stage, each building container images using docker:20-dind.
I have two runners, each of which takes 4 jobs in parallel, and have not encountered the problem there.
Yet, when running it in gitlab-ci-local, the build fails with:

```
Server:
ERROR: error during connect: Get https://docker:2376/v1.40/info: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "docker:dind CA")
errors pretty printing info
```

for each job but the first.
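For reference, a minimal sketch of the setup described (several parallel build jobs in one stage using docker:20-dind; job names are illustrative, not taken from the actual pipeline):

```yaml
stages: [build]

# Shared job definition: each job gets its own docker:20-dind service.
.build: &build
  stage: build
  image: docker:20
  services:
    - docker:20-dind
  script:
    - docker info

build-1: *build
build-2: *build   # with --concurrency > 1, every job after the first hits the x509 error
```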
Output from docker info, if you happen to need it:
```
Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 20.10.24
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 2806fc1057397dbaeefbea0e4e17bddfbd388f38
 runc version: v1.1.5-0-gf19387a6
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 6.1.0-10-amd64
 Operating System: Alpine Linux v3.18 (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 31.13GiB
 Name: 7ff8585a28eb
 ID: RKUC:GWY6:7HE3:J7XQ:3CI4:YXJI:TKIE:KSGG:SCYV:KS6A:GAAO:GZ4P
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine
```
It's because the CI jobs share the same certs folder that dind needs.
How can we mitigate this? I'd love to invest some time here, but I lack knowledge of how the general setup works, e.g. where the command line with the volume mounts is generated. A detailed debug log of what docker commands are being issued would work wonders here.
I suspect we need to mount a separate cert dir per concurrent job? If so, why? The root cert should not change, should it?
Addendum: maybe we can implicitly set --concurrency to 1 if we have dind jobs? Of course, also logging to stderr that we do so.
It seems like each job/service pair needs its own certs mount. But I can't figure out why that would be needed, or how remote gitlab-runners mitigate this "race condition".
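If the per-pair theory holds, one hypothetical workaround would be to point each job's dind service at its own cert directory, so concurrent services stop overwriting each other's CA. This is only a sketch: `DOCKER_TLS_CERTDIR` is the real dind environment variable, but whether gitlab-ci-local expands `$CI_JOB_ID` in service variables and mounts the subdirectory accordingly is an assumption.

```yaml
build:
  image: docker:20
  services:
    - docker:20-dind
  variables:
    # Assumption: a per-job subdirectory isolates the generated CA and
    # client certs, avoiding the shared-folder race described above.
    DOCKER_TLS_CERTDIR: "/certs/$CI_JOB_ID"
  script:
    - docker info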
Any update on this? Is there anything I can do to mitigate this issue? It is too slow to run the jobs one by one.
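One workaround documented in GitLab's docker-in-docker guide, which sidesteps the cert race entirely by never generating certs, is to disable TLS and talk to the daemon on plain port 2375. Whether gitlab-ci-local's service handling honors this the same way as real runners is an assumption, and it trades away transport security between job and service:

```yaml
build:
  image: docker:20
  services:
    - docker:20-dind
  variables:
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""   # empty value tells dind not to generate TLS certs
  script:
    - docker build -t myimage .
```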
**Minimal .gitlab-ci.yml illustrating the issue**

**Expected behavior**
Concurrent jobs run ok (same as in gitlab-ci).

**Host information**
macOS 13.4 (22F66), gitlab-ci-local 4.41.2

**Additional context**
Hello. When I try to run more than 1 job at the same time (same stage), where each job needs to do "docker login", only one job will log in successfully and the others will fail with this error:

```
error during connect: Post "https://docker:2376/v1.24/auth": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "docker:dind CA"
```

But running them separately is ok. Here is my .gitlab-ci-local-env, as in your dind example. Here is my .gitlab-ci-local-variables.yml (I use the registry from GitLab; you need to provide your user and deploy-token). Thank you!
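For readers trying to reproduce this, a hypothetical sketch of two concurrent jobs that both run `docker login` (job names, registry URL, and variable names are illustrative, not the reporter's actual files; credentials would come from .gitlab-ci-local-variables.yml as described):

```yaml
stages: [push]

# Both jobs run in the same stage, so gitlab-ci-local starts them
# concurrently; each brings up its own docker:20-dind service.
push-a:
  stage: push
  image: docker:20
  services:
    - docker:20-dind
  script:
    - docker login -u "$DEPLOY_USER" -p "$DEPLOY_TOKEN" "$CI_REGISTRY"

push-b:
  stage: push
  image: docker:20
  services:
    - docker:20-dind
  script:
    - docker login -u "$DEPLOY_USER" -p "$DEPLOY_TOKEN" "$CI_REGISTRY"
```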