firecow / gitlab-ci-local

Tired of pushing to test your .gitlab-ci.yml?
MIT License
2.28k stars, 128 forks

Inability to work with CI processes that use nested Docker containers #1282

Open ttsiodras opened 3 months ago

ttsiodras commented 3 months ago

Minimal .gitlab-ci.yml illustrating the issue

---
job:
  stage: build
  image: DockerImageA
  script:
    - do thing 1 inside DockerImageA
    - do thing 2 inside DockerImageA
    - docker run -v $CI_PROJECT_DIR:$CI_PROJECT_DIR -e CI_PROJECT_DIR --rm -it DockerImageB /bin/bash -c 'cd $CI_PROJECT_DIR ; do stuff inside DockerImageB'

This works fine if the GitLab Runner's config.toml lists both the "/builds" folder and "/var/run/docker.sock" under the "volumes" key of "runners.docker". In that setup "/builds" lives on the host filesystem, so the nested Docker container can access CI_PROJECT_DIR just as well as the initial one (DockerImageA) can.
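For reference, the runner setup described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual configuration from that installation: the helper name `print_runner_docker_volumes` is hypothetical, and the `/builds:/builds` form of the mapping is inferred from the description.

```shell
#!/bin/sh
# Sketch only: prints a [runners.docker] fragment of a GitLab Runner
# config.toml that bind-mounts the host's /builds and the Docker socket.
# The function name is hypothetical; the exact mappings are an assumption.
print_runner_docker_volumes() {
  cat <<'EOF'
[[runners]]
  [runners.docker]
    volumes = ["/builds:/builds", "/var/run/docker.sock:/var/run/docker.sock"]
EOF
}

print_runner_docker_volumes
```

With both entries in place, a path under /builds means the same thing on the host and inside the job container, which is what lets a nested `docker run -v` reuse it.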

Expected behavior

Commands executed inside DockerImageB should be able to see CI_PROJECT_DIR.

Host information

Not really relevant. The issue is about how host-based folders need to be bind-mounted for this to work.

Containerd binary

Docker.

Additional context

I did try passing --volume /gcl-builds, but I got an error saying that this folder is already mapped. Indeed it is, but with a transient mapping: one that does not let the nested Docker invocation map the folder further into the container created for DockerImageB.

firecow commented 3 months ago

Can you refactor your example so that it actually runs on our machines? That would give us a better understanding of what the issue is.

ttsiodras commented 3 months ago

Sure - here's a set of steps:

$ cat Dockerfile
#
# Process this Dockerfile with:
#
#     docker build -t docker_image_a .
#
FROM debian:bookworm
RUN apt-get update && \
    apt-get -qy full-upgrade && \
    apt-get install -qy curl && \
    curl -sSL https://get.docker.com/ | sh

$ docker build -t docker_image_a .
$ docker pull debian:bookworm
$ cat  .gitlab-ci.yml
stages:
  - build

build-something:
  stage: build
  image: docker_image_a
  script:
    - pwd
    - touch a
    - touch b
    - ls -la # This shows all files; including .gitlab-ci.yml and 'a' and 'b'
    - docker run -v $PWD:/work --rm debian:bookworm /bin/bash -c "ls -la /work" # This, doesn't
$ gitlab-ci-local --privileged --volume /var/run/docker.sock:/var/run/docker.sock
Using fallback git commit data
Unable to retrieve default remote branch, falling back to `main`.
Using fallback git remote data
parsing and downloads finished in 59 ms.
json schema validated in 182 ms
build-something starting docker_image_a:latest (build)
build-something copied to docker volumes in 685 ms
build-something $ pwd
build-something > /gcl-builds
build-something $ touch a
build-something $ touch b
build-something $ ls -la
build-something > total 16
build-something > drwxrwxrwx 2 root root 4096 Jul  5 09:45 .
build-something > drwxr-xr-x 1 root root 4096 Jul  5 09:45 ..
build-something > -rw-rw-rw- 1 root root  294 Jul  5 09:42 .gitlab-ci.yml
build-something > -rw-rw-rw- 1 root root  267 Jul  5 09:43 Dockerfile
build-something > -rw-r--r-- 1 root root    0 Jul  5 09:45 a
build-something > -rw-r--r-- 1 root root    0 Jul  5 09:45 b
build-something $ docker run -v $PWD:/work --rm debian:bookworm /bin/bash -c "ls -la /work"
build-something > total 8
build-something > drwxr-xr-x 2 root root 4096 Jul  5 09:42 .
build-something > drwxr-xr-x 1 root root 4096 Jul  5 09:45 ..
build-something finished in 2 s

So the nested Docker container (debian:bookworm) is asked to map the current folder to /work, but the mapping does not work: /work shows up empty.

In the GitLab installation it does work, because the config.toml there contains a volume directive that maps the entire /builds folder on the host to /builds inside the build container (docker_image_a in the example above). This means the folder can be mapped forward to the next-level nested Docker container.
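One way to check what backs the working directory is to ask the Docker daemon, through the mounted socket, which mounts the job container has. The sketch below wraps `docker inspect`. The function name `show_job_mounts` is hypothetical, and passing `$(hostname)` as the container ID assumes the hostname was left at Docker's default (the short container ID), which may not hold for gitlab-ci-local.

```shell
#!/bin/sh
# Sketch: list the mounts of a container as JSON via the Docker socket.
# show_job_mounts is a hypothetical helper name, not part of gitlab-ci-local.
show_job_mounts() {
  container="${1:?usage: show_job_mounts <container-id-or-name>}"
  docker inspect --format '{{ json .Mounts }}' "$container"
}

# Example (run inside the job container; assumes hostname == container ID):
#   show_job_mounts "$(hostname)"
```

If the entry whose "Destination" is /gcl-builds has "Type" set to "volume" rather than "bind", the directory is not a host folder at that same path, so a nested `docker run -v $PWD:/work` would presumably hand the daemon a host path that does not contain the job's files.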

What can I do to get gitlab-ci-local to end up listing the same contents as the first level container does?

Passing --volume /gcl-builds doesn't work, since /gcl-builds is already mapped transiently (see --volume gcl-build-something-837138-build:/gcl-builds in the failing command below).

$ gitlab-ci-local --privileged --volume /var/run/docker.sock:/var/run/docker.sock --volume /gcl-builds:/gcl-builds
Using fallback git commit data
Unable to retrieve default remote branch, falling back to `main`.
Using fallback git remote data
parsing and downloads finished in 57 ms.
json schema validated in 177 ms
build-something starting docker_image_a:latest (build)
build-something copied to docker volumes in 668 ms
Error: Command failed with exit code 1: docker create --interactive  --privileged --user 0:0 --volume gcl-build-something-837138-build:/gcl-builds --volume gcl-build-something-837138-tmp:/tmp/gitlab-ci-local-file-variables-fallback.group-fallback.project-837138 --workdir /gcl-builds --volume /var/run/docker.sock:/var/run/docker.sock --volume /gcl-builds:/gcl-builds   -e 'FF_DISABLE_UMASK_FOR_DOCKER_EXECUTOR=false' \
  -e 'CI=true' \
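The truncated error above is consistent with two `--volume` flags targeting the same container path, although the full message is not shown. As a rough sketch, the check below extracts the target of each `SOURCE:TARGET[:OPTIONS]` spec and reports targets that occur more than once. The helper name `find_duplicate_targets` is hypothetical and the parsing is deliberately simplified.

```shell
#!/bin/sh
# Sketch: report container paths targeted by more than one --volume spec
# of the form SOURCE:TARGET[:OPTIONS]. A spec without a colon is treated
# as a bare container path. find_duplicate_targets is a hypothetical helper.
find_duplicate_targets() {
  for spec in "$@"; do
    # Drop SOURCE (up to the first colon), then drop any trailing :OPTIONS.
    target="${spec#*:}"
    target="${target%%:*}"
    printf '%s\n' "$target"
  done | sort | uniq -d
}

# Three of the volume specs from the failing docker create command above.
find_duplicate_targets \
  "gcl-build-something-837138-build:/gcl-builds" \
  "/var/run/docker.sock:/var/run/docker.sock" \
  "/gcl-builds:/gcl-builds"
# prints /gcl-builds
```

Here the per-job volume and the extra user-supplied mapping both claim /gcl-builds, which matches the "already mapped" complaint reported earlier.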

ttsiodras commented 1 month ago

This is still labeled as "elaborate". Is the example I gave above sufficient, or do you need me to provide some additional information about the issue?