GoogleContainerTools / kaniko

Build Container Images In Kubernetes

Mount cache directory inside build container #969

Open urbaniak opened 4 years ago

urbaniak commented 4 years ago

Trying to figure out some way of sharing a cache between builds. I thought about mounting some directory like /cache inside the build container, so we can then have a shared cache there for things like pip, npm, or cargo.

Would it be possible to implement something like that?

tejal29 commented 4 years ago

We do have a cache warmer feature for caching base layers. https://github.com/GoogleContainerTools/kaniko#caching

The intermediate layers are cached remotely. Are you looking for caching layers locally?

Would that help?
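
For context, the warmer from that doc is used roughly like this (the image being cached and the directories are illustrative); note that it only helps with base layers, not package-manager caches:

docker run -v "$PWD/cache":/cache gcr.io/kaniko-project/warmer:latest \
  --cache-dir=/cache --image=python:3.8-slim

# Later builds point the executor at the same directory for base images.
docker run -v "$PWD/cache":/cache -v "$PWD":/workspace \
  gcr.io/kaniko-project/executor:latest \
  --context dir:///workspace --cache=true --cache-dir=/cache --no-push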

urbaniak commented 4 years ago

The layer cache is invalidated on any change, so I wanted to mount a shared directory and keep, for example, the Python pip cache there, so dependencies don't have to be fetched from the network when only one of them has changed.

Something like docker's -v flag to mount a directory/image (which is also not supported for docker build).

cvgw commented 4 years ago

@urbaniak going on the example of pip caching (which I don't know a ton about), I'm assuming pip looks for certain directories to check whether a cache already exists. I would imagine you could just mount a volume at that directory into the kaniko docker container.

IIRC kaniko has some special handling for mounted volumes so I don't think it would cause issues for the image build if those cache files aren't directly referenced by the docker build, but I'm not positive.

In any case, I see no reason why it shouldn't work so if it doesn't we can certainly look into it
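
For illustration, mounting a host directory at pip's default cache location when running the executor image directly with Docker might look roughly like this (the host paths are illustrative; whether the build's RUN steps can then reuse that cache is exactly the open question here):

docker run \
  -v "$PWD":/workspace \
  -v "$HOME/.cache/pip-kaniko":/root/.cache/pip \
  gcr.io/kaniko-project/executor:latest \
  --context dir:///workspace \
  --dockerfile /workspace/Dockerfile \
  --no-push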

victornoel commented 4 years ago

I second this need; what you explained, @cvgw, is correct:

When using kaniko, the build itself has no access to the host filesystem, so it would be neat to be able to instruct kaniko to mount a certain directory into the build container so that pip and co can take advantage of it.

glen-84 commented 4 years ago

What about supporting RUN --mount=type=cache?

victornoel commented 4 years ago

@glen-84 I'm not exactly sure how this feature interacts with the host filesystem. The use case I had in mind takes advantage of the fact that the host filesystem (i.e., the CI machine) will have a directory with the cache available. I'm not sure if --mount=type=cache allows for this, does it?

glen-84 commented 4 years ago

@victornoel,

If I understand correctly, the Docker engine on the host would manage the directory (sort of like a named volume). If you follow the link, they show an example of caching Go packages.
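
Roughly, that documented Go example looks like this (a sketch; the image tag and paths are illustrative):

# syntax=docker/dockerfile:1
FROM golang:1.22 AS build
WORKDIR /src
COPY . .
# BuildKit persists these mount targets between builds; their contents
# never end up in the image layers.
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    go build -o /out/app .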

See also https://stackoverflow.com/a/57934504/221528.

(PS. You might want to consider up-voting this issue to indicate your interest.)

victornoel commented 4 years ago

@glen-84 OK, so in the case of kaniko, since there is no Docker engine on the host, I suspect it would mean adding an option to the kaniko CLI to specify which directory to use when --mount=type=cache is used inside the image. This would elegantly allow choosing the desired directory in a CI context.

Still, there would be some thinking to do as to how this interacts with the from and source options, I suppose...

glen-84 commented 4 years ago

My thoughts were to "reuse" the syntax, as opposed to designing something completely specific to Kaniko. Kaniko could be updated to understand these types of mounts, and to manage the directories outside of the image (i.e. inside the Kaniko container).

victornoel commented 4 years ago

@glen-84 yes, that was my point too :)

foosinn commented 3 years ago

podman has support for this using the well-known -v flag:

podman build -v /tmp/cache:/cache .

Using this with ENV COMPOSER_CACHE_DIR=/cache/composer_cache in PHP and ENV YARN_CACHE_FOLDER=/cache/yarn_cache saves us a ton of time and bandwidth.
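
Spelled out, the Dockerfile side of that setup is roughly the following (the install commands are illustrative):

ENV COMPOSER_CACHE_DIR=/cache/composer_cache
ENV YARN_CACHE_FOLDER=/cache/yarn_cache

# Both tools write into the mounted /cache volume instead of re-downloading.
RUN composer install --no-dev
RUN yarn install --frozen-lockfile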

I would love to see this supported in kaniko.

hermanbanken commented 3 years ago

I've indicated my interest.

Locally we build using Docker (it is just easier, sorry), for use in an e2e test using Docker Compose. Using --mount=type=cache speeds up incremental Golang builds enormously.

By contrast, on CI/CD we build using Kaniko before we ship. However, if our Dockerfile uses this experimental syntax, it breaks the Kaniko build:

error building image: parsing dockerfile: Dockerfile parse error line 12: Unknown flag: mount

Therefore, to use BuildKit, I need to maintain 2 separate Dockerfiles, one for Kaniko and one for Docker BuildKit. This is cumbersome. A great first step would be if Kaniko didn't choke on the syntax, and adding support would be even greater!

lcbm commented 3 years ago

At our project, we're also interested in this.

We have a similar scenario as @hermanbanken: we build using Docker locally and build with Kaniko before we ship, in our CI/CD pipeline.

In our case, however, we use BuildKit for the --secret feature instead of cache (Dockerfile at the bottom of the comment). We need that in order to pass secret information (sensitive credentials) to the Dockerfile when building images (more specifically, to download our private packages) in a safe way that does not end up stored in the final image.

As @hermanbanken said, it'd be great if Kaniko didn't choke on the syntax -- also suggested in #1568 (comment) -- and even better if it supported it. This is a very relevant issue for us, and I can imagine that there are a lot more scenarios that would benefit from this :smile:

Dockerfile

# syntax=docker/dockerfile:1.2
FROM python:3.8-slim

# ...

COPY requirements_private.txt .
RUN --mount=type=secret,id=secrets \
  export $(cat /run/secrets/secrets) && \
  pip install --no-cache-dir --quiet --no-deps -r requirements_private.txt

# ...
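
For reference, with BuildKit the secret referenced above would be supplied at build time roughly like this (the secrets file name is an assumption):

DOCKER_BUILDKIT=1 docker build \
  --secret id=secrets,src=./secrets.env \
  -t my-image .
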
giulioprovasi commented 3 years ago

Also interested in this feature; that's what's missing for me to move from BuildKit to kaniko atm.

alexpts commented 3 years ago

Also interested in this feature; this will speed up our builds.

dimovnike commented 3 years ago

This is a must-have feature. At minimum, kaniko should not error when BuildKit flags are present. Right now I use a step that runs sed on the Dockerfile to remove those options, but I cannot do this from Skaffold (Skaffold generates a single step).
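
For anyone in the same situation, such a sed step might look roughly like this (the exact pattern is an assumption and only handles simple cases):

# Strip BuildKit --mount flags from RUN lines before handing the Dockerfile to kaniko.
sed -E -i 's/--mount=[^ ]+ ?//g' Dockerfile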

tejal29 commented 3 years ago

Hey folks, kaniko is in maintenance mode and I am not actively working on it. That said, I am open to helping folks get this feature in.

nahum-litvin-hs commented 3 years ago

I'm also very interested in this feature.

gianluca-venturini commented 3 years ago

Mounting a local directory as a volume inside Kaniko is a valid use case and can be achieved with a little bit of overhead.

In Google Cloud Build, the working directory is already automatically mounted into the Kaniko container (and any other build step) under /workspace (see here), so you can reference the /workspace directory directly and access your cache.

If you want to mount additional directories (e.g. because you want the cache in a specific location) you can always mount a volume in Kaniko, for example with:

- name: 'gcr.io/kaniko-project/executor:latest'
  args: [...]
  volumes:
  - name: 'build_cache'
    path: '/my_app/build_cache'

At this point we just need to pre-populate the volume with the cache content, for example by downloading it from a bucket:

- name: gcr.io/cloud-builders/gsutil
  args: ['-m', 'cp', '-r', 'gs://my-bucket/build_cache/*', '/build_cache']
  volumes:
  - name: 'build_cache'
    path: '/build_cache'

or from a local directory:

- name: bash
  args: ['cp', '-a', '/workspace/build_cache/.', '/build_cache']
  volumes:
  - name: 'build_cache'
    path: '/build_cache'
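
Assuming the same named volume and the bucket from the earlier step, a final step can push the (possibly updated) cache back so the next build can reuse it:

- name: gcr.io/cloud-builders/gsutil
  args: ['-m', 'cp', '-r', '/build_cache/*', 'gs://my-bucket/build_cache/']
  volumes:
  - name: 'build_cache'
    path: '/build_cache'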

Hope this helps until Cloud Build allows mounting a local directory as a volume (maybe it already does, but I wasn't able to find a way).

byrnedo commented 2 years ago

> Hey folks, kaniko is in maintenance mode and I am not actively working on it. That said, I am open to helping folks get this feature in.

@tejal29 I'm a bit motivated to do something regarding this, but I'm a bit lost in the code base. I had a quick look around and my naive thought is that one could just remove the mount syntax from the parsed command for now in pkg/commands/run.go. How close or wrong am I?

guillaume-d commented 2 years ago

FYI Buildah >= 1.24 (which is shipped with Podman >= 4) supports RUN --mount=type=cache.

anthonyalayo commented 1 year ago

Is there a solution for this yet? It looks like it's been a year since the last comment, but I can't imagine why there wouldn't be support for cache mounts.

https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/reference.md#run---mounttypecache

trevorlauder commented 1 year ago

I was able to get around this limitation using ONBUILD in the Dockerfile. This example creates and uses a cache for Poetry inside the container and then copies it into the GitLab project root so it can be uploaded to the remote cache. Subsequent builds pull it down to the project root so it can be uploaded as part of the Docker context and copied into the container. By default it won't use the cache, so it doesn't break when running locally; in GitLab I pass in BUILD_ENV=ci as a build arg, which causes it to copy the cached directory into the container.

ARG BUILD_ENV=local

# Install poetry and setup environment
# This is done in an upstream stage because the ONBUILD COPY will be inserted directly after
# the FROM of the downstream build and we don't want to have to re-install poetry every time
# the cache is downloaded (every build).
# see https://docs.docker.com/engine/reference/builder/#onbuild
FROM python:3.7-slim-bullseye as poetry

ENV POETRY_VERSION=1.5.0

RUN pip install poetry==${POETRY_VERSION}

ENV VIRTUAL_ENV /venv
RUN python -m venv ${VIRTUAL_ENV}
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"

ENV POETRY_NO_INTERACTION=true \
    POETRY_VIRTUALENVS_IN_PROJECT=false \
    POETRY_VIRTUALENVS_PATH=${VIRTUAL_ENV} \
    POETRY_VIRTUALENVS_CREATE=false

# If running on CI, copy the poetry cache from the GitLab project root to the container
FROM poetry as poetry_ci

ONBUILD COPY .cache/pypoetry /root/.cache/pypoetry/

# If running on local, don't do anything
FROM poetry as poetry_local

# Install the project
FROM poetry_${BUILD_ENV} as venv

COPY pyproject.toml poetry.lock ./

RUN touch README.md && \
    poetry install --only main

# Build final image
FROM python:3.7-slim-bullseye as final

ENV PATH="/venv/bin:${PATH}"

COPY --from=venv /venv /venv

# Copy in the app, set user, entrypoint, etc

In GitLab:

build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  variables:
    POETRY_CACHE_DIR: /root/.cache/pypoetry
    PROJECT_POETRY_CACHE_DIR: ${CI_PROJECT_DIR}/.cache/pypoetry
  cache:
    - key: ${CI_JOB_NAME}
      paths:
        - ${PROJECT_POETRY_CACHE_DIR}
  script:
    - mkdir -p ${PROJECT_POETRY_CACHE_DIR}
    - /kaniko/executor --skip-unused-stages=true --cache=true --context ${CI_PROJECT_DIR} --dockerfile ${CI_PROJECT_DIR}/Dockerfile --destination <destination> --build-arg BUILD_ENV=ci --ignore-path ${POETRY_CACHE_DIR}
    - rm -rf ${PROJECT_POETRY_CACHE_DIR}
    - cp -a ${POETRY_CACHE_DIR} ${PROJECT_POETRY_CACHE_DIR}

The --ignore-path is needed if you use multi-stage builds and the path you want to cache isn't in the final target that's built. If you're not using multi-stage builds, I don't think you need to pass it, since the path will be available in the filesystem after kaniko has finished running; but then you'd have the cache in your image, which you probably don't want. The files are then copied to the project root, which is where they need to be for GitLab to be able to cache them.

TinoSM commented 1 year ago

Here is another alternative that we use for per-project caching, which doesn't require the cache to be filled in advance (it does not work for the first install of each project, but we are OK with that; it can also be mixed with @trevorlauder's approach).

We store the last image built for each project somewhere (SSM parameters, in our case). Then, using a multi-stage COPY, we copy the cache folder (Poetry's, in our case) from the latest known valid image (we actually copy the whole venv to speed things up even more).

PREVIOUS_IMAGE can be scratch or the "name" of the previously built image for that project.

# Build args used in FROM must be declared before the first FROM.
ARG PREVIOUS_IMAGE
ARG PYTHON_BASE_IMAGE

FROM --platform=linux/amd64 $PREVIOUS_IMAGE as previous_image

RUN mkdir -p /project/.venv
COPY .aws /root/.aws
COPY . /project_new

FROM --platform=linux/amd64 $PYTHON_BASE_IMAGE as base_image

COPY --from=previous_image /root/.aws /root/.aws
COPY --from=previous_image /project_new /project
# Restore the project folder from the previous image to get its venv
COPY --from=previous_image /project/.venv /project/.venv
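
A sketch of how the image reference stored in SSM might be resolved and passed in (the parameter name, ${IMAGE_TAG}, and the scratch fallback are hypothetical):

# Resolve the previously built image, falling back to scratch for a first build.
PREVIOUS_IMAGE=$(aws ssm get-parameter --name "/builds/my-project/last-image" \
  --query 'Parameter.Value' --output text 2>/dev/null || echo scratch)

/kaniko/executor \
  --context . \
  --build-arg PREVIOUS_IMAGE="${PREVIOUS_IMAGE}" \
  --build-arg PYTHON_BASE_IMAGE=python:3.11-slim \
  --destination "${IMAGE_TAG}"

# Record the new image for the next run.
aws ssm put-parameter --name "/builds/my-project/last-image" \
  --value "${IMAGE_TAG}" --type String --overwrite
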
giepa commented 9 months ago

Need support for --mount=type=cache as well.

SilentCatD commented 7 months ago

Any update on this?

IndranilNandy commented 4 months ago

This is a much-needed feature that is still missing, which makes it hard to choose Kaniko.

pe224 commented 4 months ago

I don't think we actually need to copy the cache folder explicitly as in @trevorlauder's approach above. Two realizations were key for me:

  1. Since the executor binary is simply running as a process inside the host container without any redirection, it has, by default, access to every folder in the host container. AFAIU the /kaniko folder is only special in the sense that it is not added to the final image. But you are free to create any additional folders in the host container for the build and then discard them from the final image with --ignore-path
  2. kaniko automatically discards mounted folders on the host from the final image (via DetectFilesystemIgnoreList). In the case of Gitlab CI, the cache: directive mounts the cache into the provided subfolder within ${CI_PROJECT_DIR}, so we don't even need to explicitly --ignore-path it.

All in all, to make it work, just tell Gitlab CI to mount the cache as usual into the host container, then use it inside the Dockerfile during build (I am using build args to pass the cache path). It will be accessible because of 1., and it will not go into the final image because of 2. If it is read-write (default), the Gitlab-managed cache will also automatically be updated afterwards.

Here is my stripped-down config for employing a pip cache during a kaniko build within GitLab CI.

gitlab-ci.yml

build_and_publish_container:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.23.1-debug
    entrypoint: [""]
  cache:
    paths:
      - .cache/pip
  script:
    # build and push container, pass cache folder as build arg
    - /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --build-arg "PIP_CACHE_DIR=${CI_PROJECT_DIR}/.cache/pip"
      --destination "${CI_REGISTRY_IMAGE}:latest"

Dockerfile

FROM python:3.12-slim

# Path to pip cache on host container
ARG PIP_CACHE_DIR

COPY requirements.txt /app/
WORKDIR /app
RUN pip install -r requirements.txt

For compatibility with local docker builds, I believe one could add --mount=type=cache to the RUN pip install command - it will be ignored by kaniko. (untested)
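
For reference, that combined RUN line might look roughly like this (untested, as noted above; whether a given kaniko version tolerates the flag is an assumption):

# BuildKit uses the cache mount locally; the idea is that kaniko would ignore
# the flag and fall back to the PIP_CACHE_DIR build arg instead.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt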

premonts commented 4 months ago

Thanks to @pe224, I followed his approach and it worked.

Let me add one more thing for the case where your source code is located in the root of ${CI_PROJECT_DIR} in GitLab, you also have (for instance) a test job before the build job, and you want the m2 cache to be shared between jobs.

If you do something like this for a Java Maven project:

ARG CI_PROJECT_DIR
COPY . /app
WORKDIR /app
RUN mvn clean package -Dmaven.repo.local=${CI_PROJECT_DIR}/.m2/repository

Then you will also copy the m2 cache into a Docker layer rather than using the host one, which means it will not be updated on the host and will not be cached by the GitLab runner.

What I did on my side is move my code to a code/ folder and do this instead:

ARG CI_PROJECT_DIR
COPY code/ /app
WORKDIR /app
RUN mvn clean package -Dmaven.repo.local=${CI_PROJECT_DIR}/.m2/repository

If you don't want to have a code/ folder, I assume you can still do something like this:

ARG CI_PROJECT_DIR
COPY pom.xml settings.xml /app/
COPY src/ /app/src
WORKDIR /app
RUN mvn clean package -Dmaven.repo.local=${CI_PROJECT_DIR}/.m2/repository

This way the cache stays on the host and will be updated during the build.