docker / buildx

Docker CLI plugin for extended build capabilities with BuildKit
Apache License 2.0

Incorrect cache for "cat ..tar | docker build" with duplicate files inside #2734

Open kkujawinski opened 7 years ago

kkujawinski commented 7 years ago

Docker build calculates the same cache checksum for different tar contexts.

Test case

File structure:

- ctx/
   \- test.txt with default value
- ctx1/
   \- test.txt with value for ctx1
- ctx2/
   \- test.txt with value for ctx2

Preparing ctx1.tar, building and running:

tar -cf ctx1.tar ./ctx -P --xform "s%/ctx%%"
tar -rf ctx1.tar ./ctx1 -P --xform "s%/ctx1%%"
cat ctx1.tar | docker build --tag=tar-duplicate-ctx -
docker run --rm tar-duplicate-ctx

Then the same for ctx2.tar:

tar -cf ctx2.tar ./ctx -P --xform "s%/ctx%%"
tar -rf ctx2.tar ./ctx2 -P --xform "s%/ctx2%%"
cat ctx2.tar | docker build --tag=tar-duplicate-ctx -
docker run --rm tar-duplicate-ctx

The second build computes the same cache checksum as the first, even though the two tar archives have different contents.
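To make the archive layout concrete, here is a minimal Python sketch using only the standard tarfile module. It builds in-memory equivalents of ctx1.tar and ctx2.tar. It assumes the --xform expressions above strip the directory prefix so that both entries land at the same path. For simplicity it names them test.txt rather than ./test.txt and omits directory entries. The file contents (default, ctx1, ctx2) and the helper names add_file and build_context are hypothetical.

```python
import hashlib
import io
import tarfile


def add_file(tar: tarfile.TarFile, name: str, data: bytes) -> None:
    """Append one regular-file entry; a fixed mtime keeps the output reproducible."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    info.mtime = 0
    tar.addfile(info, io.BytesIO(data))


def build_context(variant: bytes) -> bytes:
    """Mimic `tar -cf` of ./ctx followed by `tar -rf` of ./ctxN with the
    directory prefix stripped, so test.txt ends up in the archive twice."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        add_file(tar, "test.txt", b"default\n")  # entry from ./ctx
        add_file(tar, "test.txt", variant)       # entry appended from ./ctx1 or ./ctx2
    return buf.getvalue()


ctx1 = build_context(b"ctx1\n")
ctx2 = build_context(b"ctx2\n")

for label, blob in (("ctx1.tar", ctx1), ("ctx2.tar", ctx2)):
    with tarfile.open(fileobj=io.BytesIO(blob)) as tar:
        # Same member names in both archives; the raw bytes still differ.
        print(label, tar.getnames(), hashlib.sha256(blob).hexdigest()[:12])
```

Both archives list test.txt twice, and their raw digests differ. A cache key that reflects what the build actually sees should therefore tell them apart.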

I am attaching a test case which:

  1. generates ctx1.tar, lists its contents, builds the image and runs a container
  2. generates ctx2.tar, lists its contents, builds the image and runs a container
  3. removes the image
  4. generates ctx2.tar, lists its contents, builds the image and runs a container
  5. generates ctx1.tar, lists its contents, builds the image and runs a container

You can see that:

Expected:

docker info:

Containers: 3
 Running: 1
 Paused: 0
 Stopped: 2
Images: 1296
Server Version: 17.06.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 1086
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 apparmor
Kernel Version: 4.4.0-53-generic
Operating System: Ubuntu 14.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 19.61GiB
Name: AAE-VirtualBox
ID: OJMM:FC7Q:2FMV:QGUE:JVUS:GIB2:IVDQ:QWLD:F6PR:ZG5I:VXTX:RKSJ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://defra1c-proxy.emea.nsn-net.net:8080/
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

docker version:

Client:
 Version:      17.06.0-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:19:16 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.0-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   02c1d87
 Built:        Fri Jun 23 21:17:13 2017
 OS/Arch:      linux/amd64
 Experimental: false

tar_duplicates.zip

kkujawinski commented 7 years ago

The cache checksum is calculated from the first occurrence of a file, while the build uses the last occurrence.
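The mismatch described above can be reproduced with a small Python sketch. The key function below is a hypothetical stand-in, not Docker's actual tarsum implementation: it hashes each path only at its first occurrence. Python's tarfile.getmember, by contrast, is documented to return the last occurrence of a duplicated name, which is also the version sequential extraction leaves on disk. The names make_tar, first_occurrence_key and extracted_content are illustrative only.

```python
import hashlib
import io
import tarfile


def make_tar(entries: list[tuple[str, bytes]]) -> bytes:
    """Build an in-memory tar; repeated names produce duplicate entries."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def first_occurrence_key(blob: bytes) -> str:
    """Hypothetical cache key: hash each path only the first time it is seen."""
    seen: set[str] = set()
    h = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(blob)) as tar:
        for member in tar.getmembers():
            if member.name in seen:
                continue  # later duplicates are ignored by this key
            seen.add(member.name)
            h.update(member.name.encode())
            h.update(tar.extractfile(member).read())
    return h.hexdigest()


def extracted_content(blob: bytes, name: str) -> bytes:
    """What an extractor ends up with: getmember resolves to the LAST occurrence."""
    with tarfile.open(fileobj=io.BytesIO(blob)) as tar:
        return tar.extractfile(tar.getmember(name)).read()


ctx1 = make_tar([("test.txt", b"default\n"), ("test.txt", b"ctx1\n")])
ctx2 = make_tar([("test.txt", b"default\n"), ("test.txt", b"ctx2\n")])

print(first_occurrence_key(ctx1) == first_occurrence_key(ctx2))  # True: same cache key
print(extracted_content(ctx1, "test.txt"), extracted_content(ctx2, "test.txt"))  # b'ctx1\n' b'ctx2\n'
```

Both archives get the same key, yet they yield different files. That is the stale-cache behaviour this issue reports.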

thaJeztah commented 7 years ago

ping @tonistiigi @simonferquel PTAL

tonistiigi commented 7 years ago

I'm not sure we should support duplicate files; we do not support files like this for push/pull operations either, for example. This is fixed in https://github.com/moby/buildkit/pull/90, along with a range of other similar issues. The old tarsum code does not preserve any uniqueness for the paths: even if you sorted the items for this specific case, you would have the same problem when copying a directory containing duplicates.
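If duplicate paths are not supported, one option is to reject such archives before they reach the build. Below is a minimal Python sketch of that check, assuming the archive is held in memory. The helper find_duplicate_paths is hypothetical and not part of Docker or BuildKit. It also skips directory entries, which is a simplifying assumption.

```python
import collections
import io
import posixpath
import tarfile


def find_duplicate_paths(blob: bytes) -> list[str]:
    """Return the non-directory paths that occur more than once in a tar archive.

    Names are normalised with posixpath.normpath, so "./test.txt" and
    "test.txt" are counted as the same path, as they would be on extraction.
    """
    counts = collections.Counter()
    with tarfile.open(fileobj=io.BytesIO(blob)) as tar:
        for member in tar.getmembers():
            if member.isdir():
                continue  # simplifying assumption: only files and links are checked
            counts[posixpath.normpath(member.name)] += 1
    return sorted(name for name, n in counts.items() if n > 1)


def make_tar(entries: list[tuple[str, bytes]]) -> bytes:
    """Build a small in-memory tar for the demo below."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


sample = make_tar([("./test.txt", b"default\n"), ("test.txt", b"ctx1\n"), ("other.txt", b"x\n")])
print(find_duplicate_paths(sample))  # ['test.txt']
```

A client could refuse to send the context whenever this list is non-empty. That would fail fast instead of silently producing a stale cache hit.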

ionelmc commented 7 years ago

Unfortunately, the tar CLI tools don't support replacing files in an archive: when you update files, the new copies are appended to the end, so the archive ends up with duplicates. I don't like this either, but there's no nice way around it, so the Docker CLI should support this kind of use (it's natural with tar, like it or not).
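Until the CLI handles this case, one possible client-side workaround is to rewrite the archive so that each path appears only once. The sketch below keeps the last occurrence, which is the version that extraction leaves on disk. It is a minimal Python sketch. dedupe_keep_last is a hypothetical helper, not part of Docker, and it does not give special treatment to hard links or symlinks that point at a replaced path.

```python
import io
import posixpath
import tarfile


def dedupe_keep_last(blob: bytes) -> bytes:
    """Rewrite a tar so each path appears once, keeping its LAST occurrence,
    i.e. the version that sequential extraction leaves on disk."""
    with tarfile.open(fileobj=io.BytesIO(blob)) as src:
        last = {}
        for member in src.getmembers():
            # Re-assigning an existing key keeps the slot of the first occurrence
            # (so parent directories stay ahead of their children) but stores
            # the TarInfo of the latest one.
            last[posixpath.normpath(member.name)] = member

        out = io.BytesIO()
        with tarfile.open(fileobj=out, mode="w") as dst:
            for member in last.values():
                # Only regular files carry data; directories and links are header-only.
                data = src.extractfile(member) if member.isfile() else None
                dst.addfile(member, data)
    return out.getvalue()


def make_tar(entries: list[tuple[str, bytes]]) -> bytes:
    """Build a small in-memory tar with possibly repeated names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in entries:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


sample = make_tar([("test.txt", b"default\n"), ("test.txt", b"ctx1\n")])
clean = dedupe_keep_last(sample)
with tarfile.open(fileobj=io.BytesIO(clean)) as tar:
    print(tar.getnames())                      # ['test.txt']
    print(tar.extractfile("test.txt").read())  # b'ctx1\n'
```

This function could be wrapped in a small script that reads sys.stdin.buffer and writes the result to sys.stdout.buffer. Such a filter could then sit between cat ctx1.tar and docker build -. That wiring is an assumption and is not shown here.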