GoogleContainerTools / kaniko

Build Container Images In Kubernetes
Apache License 2.0

Kaniko build performance much slower compared with DinD solution #875

Open caiwei-ebay opened 4 years ago

caiwei-ebay commented 4 years ago

We have a very simple Dockerfile which inherits an Ubuntu JDK 8 image, runs a few shell commands, and copies a few files. Please note the RUN commands come first.

Our CI is built on top of Kubernetes, and the Jenkins build runs in a slave pod. We've enabled DinD & Kaniko in separate slave images and trigger the builds with Kaniko and Docker. Here are the build & push timings we've observed:

Dockerfile with all RUN commands removed:

Dockerfile with 10 RUN commands:

May I know why Kaniko is so much slower than the DinD solution when there are RUN commands in the Dockerfile? Can this part be sped up?

We've tried the --cache & --cache-repo parameters, but the performance of the Kaniko build did not improve at all. Here are the details:

However, the performance is much worse with the cache enabled, taking 254s. I think the cache uploading or downloading is also a time killer.

Please help explain the cache issue and advise how we can further improve the performance of the Kaniko build.
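
For context, a minimal sketch of how these flags are typically passed to the executor when it runs as a Kubernetes pod (the image names and Quay registry paths are hypothetical placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build
spec:
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args:
        - --context=dir:///workspace
        - --dockerfile=/workspace/Dockerfile
        - --destination=quay.example.com/team/app:latest    # hypothetical destination
        - --cache=true                                       # cache RUN layers in a remote repo
        - --cache-repo=quay.example.com/team/app/cache       # hypothetical cache repository
      volumeMounts:
        - name: workspace
          mountPath: /workspace
  volumes:
    - name: workspace
      emptyDir: {}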

The Dockerfile we used looks like the one below:


FROM abc
COPY *.jar /app/app.jar

RUN jar -xvf app.jar && \
    rm -rf app.jar && \
    mkdir -p /layer_build/lib/snapshots && \
    mkdir -p /layer_build/lib/releases && \
    mkdir -p /layer_build/app && \
    find BOOT-INF/lib -name '*SNAPSHOT*' -type f -exec mv {} /layer_build/lib/snapshots \; && \
    mv BOOT-INF/lib/* /layer_build/lib/releases && \
    rm -rf BOOT-INF/lib && \
    mv * /layer_build/app

FROM def
COPY --from=0 layer_build/lib/snapshots/ /app/BOOT-INF/lib/
COPY --from=0 layer_build/lib/releases/ /app/BOOT-INF/lib/
COPY --from=0 layer_build/app/ /app/

WORKDIR /app
CMD ["/bin/bash", "-c", "/app/bin/run.sh"]


mcfedr commented 4 years ago

I've noticed similar issues. I use the GitLab runner on Kubernetes and, in the same way as you described, ran dind and kaniko side by side; kaniko is much slower. At the moment I've switched to using kaniko on Cloud Build, and there it's pretty fast and caches better than docker.

caiwei-ebay commented 4 years ago

kaniko on Cloud Build

Thanks for the information, I believe you are talking about https://cloud.google.com/blog/products/application-development/build-containers-faster-with-cloud-build-with-kaniko.

Unfortunately we are using an internal Docker registry based on quay.io, so that cannot benefit us. As we observed, cache uploading & downloading with Quay takes much more time than building without the cache.

consideRatio commented 4 years ago

It seems a lot of time is spent snapshotting the filesystem, which I believe is used to ensure we get an end result with multiple layers.

By using --single-snapshot, only a single layer will be added on top of the base image, and I assume we avoid the slowdown of taking intermediate snapshots for intermediate layers.

It can of course be nice to have layers, so improving performance like this is a compromise. I ended up with 15 minutes instead of 25 minutes for one of my builds.
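
For anyone wanting to try it, a minimal GitLab-CI-style sketch of passing the flag (job name, registry variables, and paths are placeholders):

build-image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - >-
      /kaniko/executor
      --context $CI_PROJECT_DIR
      --dockerfile $CI_PROJECT_DIR/Dockerfile
      --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
      --single-snapshot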

u2bo commented 4 years ago

I have the same question. In Jenkins, dind is faster than Kaniko. Most of the time is spent in [Taking snapshot of full filesystem] and [Unpacking rootfs as cmd COPY]. How can this be improved?

klkl0808 commented 4 years ago

most of the time spent [Taking snapshot of full filesystem] [Unpacking rootfs as cmd COPY] how to improve this?

I have the same question. I tried kaniko build on gitlab and it's also slower than with docker.

bakayolo commented 4 years ago

Same here. Trying to improve the build of https://beta.kintohub.com/ by transitioning from DinD to Kaniko, but DinD is faster, even with caching. Most of the time is indeed spent in [Taking snapshot of full filesystem] and [Unpacking rootfs as cmd COPY].

haampie commented 4 years ago

Experiencing the same issue. In fact I don't see any difference in runtimes when using --cache=true... it definitely pulls cached layers, but it does not speed up the builds at all.

bergkvist commented 4 years ago

I'm using kaniko in GitLab CI/CD with runners in a DigitalOcean Kubernetes cluster (3x 2GB 1vCPU).

Benchmark: create-react-app (multi-stage build)

FROM node:12-alpine as build
WORKDIR /home/app/
COPY package.json ./
COPY yarn.lock ./
RUN yarn 
COPY . .
RUN yarn build

FROM nginx:1.13.12-alpine
COPY --from=build /home/app/build /var/www
COPY nginx.conf /etc/nginx/nginx.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

Building locally with docker build on my laptop: ~ 2 minutes

Building with kaniko in a GitLab runner: ~38 minutes. It spends most of the time (~32 minutes) on the "Taking snapshot of full filesystem..." step.

Same as previous with --single-snapshot: ~ 33 minutes

Using Docker in Docker: ~5 minutes

swist commented 4 years ago

We've been experiencing similar problems with kaniko in the case of builds that produce a large number of small files on the filesystem in the intermediate stages. Multi-stage builds also seem to contribute to the slowdown.

bsmedberg-xometry commented 4 years ago

I expect the reason for this difference in speed is that "native" Docker manages the layered filesystem using overlayfs (overlay2), so taking a snapshot is as simple as telling the FS driver to finish a layer. Kaniko, by contrast, doesn't track that natively on the filesystem, so it has to stop and stat everything in the filesystem in order to take a snapshot.

I'd be interested in whether this is a fundamental limitation of the kaniko design, or whether, if you could run a user-mode filesystem driver or overlayfs inside the container running kaniko, you could obtain matching speeds.

mayrbenjamin92 commented 4 years ago

@bsmedberg-xometry I love your explanation and fully agree. I recently watched a very good talk about the "backend" of the Docker daemon in which someone responsible for the filesystem at Docker explains the differences. Whilst it sounds possible to do what you have suggested, I don't think it can be achieved without changing the source code of kaniko.

cmamigonian commented 3 years ago

I understand the filesystem snapshotting issue is driven by not using overlayfs, but what would explain the inordinate time it takes kaniko to push a layer to the cache?

tjtravelnet commented 3 years ago

We are also having this issue. Switching to Kaniko solved some other DIND issues we were having, but added 12+ minutes to our build times.

tejal29 commented 3 years ago

@tjtravelnet Did you try the new --use-new-run flag? You can also help us with some profiling data to understand where kaniko is spending time: https://github.com/GoogleContainerTools/kaniko#kaniko-builds---profiling

Kyouuma commented 3 years ago

Build times are insanely long compared to DIND even with caching activated.

Environment:

acherifi commented 3 years ago

Same experience on my side with Kubernetes gitlab runners.

The build is WAY longer than on my computer, and I build on a Pentium... Any improvements?

jerry153fish commented 3 years ago

Had a similar issue. I ended up adding --snapshotMode=redo, turning all verbose logging off, and filtering out all the unnecessary files in .dockerignore. The result is acceptable now: from 46m down to ~10m.
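
In case it helps, a rough GitLab-CI-style sketch of that combination (names, registry variables, and paths are placeholders; reducing log output via --verbosity is one way to turn the verbose logging down, and the .dockerignore next to the Dockerfile simply lists whatever is not needed in the build context, e.g. .git or local build output):

build-image:
  stage: build
  script:
    - >-
      /kaniko/executor
      --context $CI_PROJECT_DIR
      --dockerfile $CI_PROJECT_DIR/Dockerfile
      --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
      --snapshotMode=redo
      --verbosity=error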

ghost commented 2 years ago

We can observe this behavior too, but from my point of view it's not a real problem here. Of course it would be nice if the snapshot taking could be tuned, but it will never reach the performance of an overlayfs-based snapshot/layer creation. So for us the best solution is to perform all the build work outside kaniko (no multi-stage builds): build the application in its own GitLab-job k8s container and then just copy the assembled application, with only the needed files, into the image that has to be built with Kaniko. Then the performance impact is no problem compared with the big security benefit we get by not relying on DinD (which should be forbidden in CI/CD in times of supply-chain attacks).

bhordupur commented 2 years ago

We are running the GitLab runner in AKS. Kaniko surpasses DinD for the same build job (building docker images) with the flags below added:

--snapshotMode=redo
--use-new-run

With DinD it takes around 5.5 mins and with Kaniko it comes down to 3.25 mins.
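
For reference, a sketch of what such a job looks like with both flags added (job name, registry variables, and paths are placeholders):

build-image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - >-
      /kaniko/executor
      --context $CI_PROJECT_DIR
      --dockerfile $CI_PROJECT_DIR/Dockerfile
      --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
      --snapshotMode=redo
      --use-new-run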

halja7 commented 2 years ago

We have builds running in Kaniko that, due to the file system snapshots, are taking unacceptably long. This does not seem to have been remedied by using --use-new-run or --snapshotMode=redo individually, although using them together did substantially improve the build duration (still unacceptably long for this use-case, unfortunately). Just a +1 that this appears to remain an issue.

pdfrod commented 2 years ago

Same here. I tried using Kaniko in Google Cloud Build to get better caching behavior, but it's so slow that it's not worth it. Using --use-new-run or --snapshotMode=redo does improve things a little, but using Docker is still much faster.

I've turned my attention to Docker Buildx instead as it seems to combine the best of both worlds: fast builds and reliable caching.
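
For anyone evaluating that route, a rough GitLab-CI-style sketch of Buildx with a registry-backed cache (the cache ref and variables are placeholders; the same --cache-from/--cache-to flags apply outside GitLab as well):

build-image:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    # log in so the cache and the built image can be pushed
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker buildx create --use
    - >-
      docker buildx build
      --cache-from type=registry,ref=$CI_REGISTRY_IMAGE:buildcache
      --cache-to type=registry,ref=$CI_REGISTRY_IMAGE:buildcache,mode=max
      --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
      --push .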

rushilsrivastava commented 1 year ago

I've turned my attention to Docker Buildx instead as it seems to combine the best of both worlds: fast builds and reliable caching.

Curious, are you using Buildx with Cloud Build?

pdfrod commented 1 year ago

Curious, are you using Buildx with Cloud Build?

I tried to, but unfortunately my team is using GCP Container Registry and it doesn't seem to support Buildx cache artifacts.

Artifact Registry, on the other hand, seems to work fine with Buildx, but since it's a lot more expensive than Container Registry, I'm not sure it's worth it for us.

salamer commented 11 months ago

Same problem, any progress? I realize this issue has been open for 4 years; is there any kaniko-related benchmark?

0x217 commented 10 months ago

I have the same problem.

mdagost commented 9 months ago

Me too.

KamilKopaczyk commented 7 months ago

We are running the GitLab runner in AKS. Kaniko surpasses DiND for the same build job (to build docker images) with the below added flags:

--snapshotMode=redo
--use-new-run

With DiND it takes around 5,5mins and with Kaniko it comes down to 3,25mins.

If you consider using those flags, please check the docs first and proceed with caution, as using those flags may cause errors for you.

At the time of writing, for --use-new-run:

[...] This new run mode trades off accuracy/correctness in some cases (potential for missed files in a "snapshot") for improved performance by avoiding the full filesystem snapshots.

And for --snapshotMode: if it runs in a mode other than full, it doesn't compare e.g. file contents.

ole1986 commented 5 months ago

Running a Kaniko pod in a microk8s Kubernetes cluster with hostNetwork: true set increases the performance significantly. With that setup I reduced the time of an image build from ~12 min to ~3 min.

So there might be some firewall/network issue when the host network is not exposed. Of course, it's not a recommended setting, but at least I know a possible reason.
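
For completeness, this is the pod-level setting being referred to (a minimal sketch; image and destination are placeholders, and as noted above hostNetwork is generally not a recommended setting):

apiVersion: v1
kind: Pod
metadata:
  name: kaniko-build
spec:
  hostNetwork: true        # the setting discussed above; the pod uses the node's network directly
  restartPolicy: Never
  containers:
    - name: kaniko
      image: gcr.io/kaniko-project/executor:latest
      args:
        - --context=dir:///workspace
        - --destination=registry.example.com/team/app:latest   # placeholder destination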

amine-mokaddem commented 2 months ago

Same thing here

akimrx commented 2 months ago

The same here.

Update: with the flags below, it works at the same level as docker for me; the build decreased from 45 minutes to 8 minutes for a fairly dense image:

  stage: build
  rules:
    - !reference [.master_or_web__rules, rules]
  script:
    - >-
      /kaniko/executor
      --context $CI_PROJECT_DIR/image
      --dockerfile $CI_PROJECT_DIR/image/Dockerfile
      --destination ${CI_DOCKER_IMAGE}:${CI_COMMIT_SHORT_SHA}
      --destination ${CI_DOCKER_IMAGE}:latest
      --cache=false
      --cache-repo=${CI_DOCKER_IMAGE}:latest
      --cache-ttl=1h
      --force
      --cleanup
      --single-snapshot

gimse commented 1 month ago

I also have the problem.