GoogleContainerTools / kaniko

Build Container Images In Kubernetes

File hard link is removed during COPY #2594

Open linusyong opened 1 year ago

linusyong commented 1 year ago

Actual behavior: During COPY, the data of hard-linked files is duplicated (each link becomes an independent file), making the resulting container image much larger.

Expected behavior: The COPY operation should preserve hard links, so that all linked paths share a single copy of the underlying data.

To Reproduce Steps to reproduce the behavior:

  1. Create a Dockerfile

    FROM bitnami/git:2.41.0-debian-11-r6 as builder
    
    FROM gcr.io/distroless/python3-debian11:debug as release
    COPY --from=builder /opt/bitnami/git /opt/bitnami/git
    COPY --from=builder /usr/lib/x86_64-linux-gnu/libcurl.so.4 /usr/lib/x86_64-linux-gnu/libcurl.so.4
    COPY --from=builder /usr/lib/x86_64-linux-gnu/libnghttp2.so.14 /usr/lib/x86_64-linux-gnu/libnghttp2.so.14
    
    ENTRYPOINT ["/opt/bitnami/git/bin/git"]

  2. Perform a build with docker build

    docker build . \
      -f Dockerfile \
      --no-cache \
      -t test-docker-build
  3. Perform a build with kaniko

    docker run --rm \
      -v $(pwd):/workdir \
      gcr.io/kaniko-project/executor \
      --dockerfile /workdir/Dockerfile \
      --context /workdir \
      --snapshot-mode=redo \
      --destination=test-kaniko-build \
      --tar-path=/workdir/test-kaniko-build.tar \
      --no-push
    
    docker load -i test-kaniko-build.tar

  4. The sizes of the two container images differ greatly:

    $ docker image ls
    REPOSITORY                            TAG                        IMAGE ID       CREATED             SIZE
    test-kaniko-build                     latest                     b99acbb75371   2 minutes ago       720MB
    test-docker-build                     latest                     ec561bff1769   5 minutes ago       83MB

Additional Information

This multistage build is based on bitnami/git:2.41.0-debian-11-r6 (sha256:120c692378b9ddb77a895271bdda03d0f1b6d1fbf16f991c8d73c73a71b4d2a4). In that image, /opt/bitnami/git/libexec/git-core/git is a hard-linked file (link count 141):

$ docker run --rm --entrypoint=sh bitnami/git:2.41.0-debian-11-r6 -c "ls -l /opt/bitnami/git/libexec/git-core/git"
-rwxr-xr-x 141 root root 4440104 Jun  1 09:59 /opt/bitnami/git/libexec/git-core/git

$ docker run --rm --entrypoint=sh bitnami/git:2.41.0-debian-11-r6 -c "ls -i /opt/bitnami/git/libexec/git-core/git"
17566587 /opt/bitnami/git/libexec/git-core/git

$ ## Use the inode number returned above (`17566587`) to count the hard links under libexec/.  Remove `| wc -l` to list all of them
$ docker run --rm --entrypoint=sh bitnami/git:2.41.0-debian-11-r6 -c "find /opt/bitnami/git/libexec/ -inum 17566587 | wc -l"
137
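
Note that `ls -l` reports a link count of 141 while the search under libexec/ finds 137 paths, which suggests the remaining links live outside that directory (an assumption; the output above does not show where). A minimal sketch of the same check without looking up the inode number first, assuming a `find` that supports `-samefile` (GNU findutils does). The helper name `same_inode_paths` and the scratch directory layout are hypothetical and only mimic the situation:

```shell
#!/bin/sh
# Hypothetical helper: print every path under root ($1) that shares
# an inode with the given file ($2). Relies on find's -samefile test.
same_inode_paths() {
    find "$1" -samefile "$2"
}

# Scratch tree that mimics the layout: two links under libexec/, one under bin/.
demo=$(mktemp -d)
mkdir -p "$demo/bin" "$demo/libexec"
echo data > "$demo/libexec/git"
ln "$demo/libexec/git" "$demo/libexec/git-add"
ln "$demo/libexec/git" "$demo/bin/git"

# Searching only libexec/ misses the link under bin/.
libexec_links=$(same_inode_paths "$demo/libexec" "$demo/libexec/git" | wc -l | tr -d ' ')
all_links=$(same_inode_paths "$demo" "$demo/libexec/git" | wc -l | tr -d ' ')
echo "under libexec/: $libexec_links, under whole tree: $all_links"
```

Against the real image, the equivalent would be to widen the search root from /opt/bitnami/git/libexec/ to /opt/bitnami/git.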

The container image built with docker build shows the same behavior, so the hard links are preserved:

$ docker run --rm --entrypoint=sh test-docker-build -c "ls -l /opt/bitnami/git/libexec/git-core/git"
-rwxr-xr-x  141 root     root       4440104 Jun  1 09:59 /opt/bitnami/git/libexec/git-core/git

$ docker run --rm --entrypoint=sh test-docker-build -c "ls -i /opt/bitnami/git/libexec/git-core/git"
18350115 /opt/bitnami/git/libexec/git-core/git

$ docker run --rm --entrypoint=sh test-docker-build -c "find /opt/bitnami/git/libexec/ -inum 18350115 | wc -l"
137
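
A quicker way to tell whether a tree still contains any hard-linked files at all is to count regular files whose link count is greater than 1. The sketch below is only an illustration: `count_hardlinked_files` is a hypothetical helper, and the per-file `cat` loop merely simulates a copy that writes every path as an independent file. It is not kaniko's actual code.

```shell
#!/bin/sh
# Hypothetical helper: count regular files under $1 with more than one link.
count_hardlinked_files() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}

src=$(mktemp -d)
dst=$(mktemp -d)
echo data > "$src/git"
ln "$src/git" "$src/git-add"
ln "$src/git" "$src/git-commit"

# Simulated link-unaware copy: every path becomes its own file.
for f in "$src"/*; do
    cat "$f" > "$dst/$(basename "$f")"
done

src_linked=$(count_hardlinked_files "$src")
dst_linked=$(count_hardlinked_files "$dst")
echo "source: $src_linked hard-linked files, link-unaware copy: $dst_linked"
```

Running `find /opt/bitnami/git -type f -links +1 | wc -l` inside each built image would give the same kind of comparison.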

In the kaniko-built container image, the hard links are gone: the link count is 1, and the inode is referenced by only a single path, so every former link now holds its own copy of the data:

$ docker run --rm --entrypoint=sh test-kaniko-build -c "ls -l /opt/bitnami/git/libexec/git-core/git"
-rwxr-xr-x    1 root     root       4440104 Jun 23 02:26 /opt/bitnami/git/libexec/git-core/git

$ docker run --rm --entrypoint=sh test-kaniko-build -c "ls -i /opt/bitnami/git/libexec/git-core/git"
18350263 /opt/bitnami/git/libexec/git-core/git

$ docker run --rm --entrypoint=sh test-kaniko-build -c "find /opt/bitnami/git/libexec/ -inum 18350263 | wc -l"
1
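
As a rough plausibility check, the extra space can be estimated from the numbers above, assuming every one of the 141 links becomes a full, independent copy of the 4440104-byte file (an assumption; other hard-linked files in the tree are not counted):

```shell
#!/bin/sh
size_bytes=4440104   # size of git-core/git, from the ls -l output
link_count=141       # link count reported by ls -l in the source image

# One copy is needed anyway; the other (link_count - 1) are duplicates.
extra_bytes=$(( size_bytes * (link_count - 1) ))
extra_mb=$(( extra_bytes / 1000000 ))
echo "$extra_bytes bytes (~${extra_mb} MB)"   # prints: 621614560 bytes (~621 MB)
```

That is close to the observed difference of 720MB - 83MB = 637MB between the two images, so the duplicated git binary alone appears to account for most of the size increase.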

Triage Notes for the Maintainers

Description | Yes/No
Please check if this is a new feature you are proposing | [ ]
Please check if the build works in docker but not in kaniko | [ ]
Please check if this error is seen when you use --cache flag | [ ]
Please check if your dockerfile is a multistage dockerfile | [X]

TobiX commented 1 year ago

"Hard link detection is hard."[tm]

For reference:

As a workaround, you could tar the directory in the first stage, copy that archive and untar in the next stage (yes, I know, not pretty)
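
The reason this workaround helps is that tar records the second and later paths of a hard-linked file as link entries rather than as full copies, and recreates the links on extraction. A minimal local sketch of that round trip, with scratch directories standing in for the two stages (all names here are hypothetical):

```shell
#!/bin/sh
src=$(mktemp -d)    # stands in for the directory in the first stage
work=$(mktemp -d)   # holds the archive that would be COPYed
dst=$(mktemp -d)    # stands in for the target directory in the next stage

echo data > "$src/git"
ln "$src/git" "$src/git-add"
ln "$src/git" "$src/git-commit"

# "First stage": pack the directory into one archive file.
tar -C "$src" -cf "$work/git.tar" .

# "Next stage": unpack; tar recreates the hard links.
tar -C "$dst" -xf "$work/git.tar"

linked_after=$(find "$dst" -type f -links +1 | wc -l | tr -d ' ')
echo "hard-linked files after round trip: $linked_after"
```

In the Dockerfile this would presumably be a `RUN tar ...` in the builder stage, a `COPY --from=builder` of the single archive, and a `RUN tar -x ...` in the final stage, which assumes the final image provides a shell and tar.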