docker / buildx

Docker CLI plugin for extended build capabilities with BuildKit
Apache License 2.0

`ADD {url}` instruction incorrectly fetches different url from cache #2803

Open Kale-Ko opened 4 days ago

Kale-Ko commented 4 days ago

Description

After one Dockerfile fetches the URL https://api.adoptium.net/v3/assets/latest/8/hotspot?os=linux&architecture=x64&image_type=jdk with ADD, an ADD of https://api.adoptium.net/v3/assets/latest/21/hotspot?os=linux&architecture=x64&image_type=jdk in another Dockerfile returns the data cached for the first URL instead of fetching the second URL.

Also see notes

Expected behaviour

The second ADD instruction properly fetches the second url.

Actual behaviour

The second ADD instruction incorrectly fetches the first url from cache.

Buildx version

github.com/docker/buildx v0.18.0 fa4461b9a1ec45c23d1b9e32dee0d0a8ed29900b, github.com/docker/buildx v0.17.1 257815a

Docker info

GitHub Actions:

Client: Docker Engine - Community
 Version:    26.1.3
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.18.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.27.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 26.1.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7f7fdf5fed64eb6a7caf99b3e12efcf9d60e311c
 runc version: v1.1.14-0-g2c9f560
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.0-1017-azure
 Operating System: Ubuntu 24.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 7.753GiB
 Name: fv-az1975-703
 ID: 3faad159-9729-417d-aa25-0d36f85ea29c
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: githubactions
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

My Comp:

Client: Docker Engine - Community
 Version:    27.3.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.17.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.7
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 8
  Running: 0
  Paused: 0
  Stopped: 8
 Images: 286
 Server Version: 27.3.1
 Storage Driver: overlay2
  Backing Filesystem: btrfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 57f17b0a6295a39009d861b89e3b3b87b005ca27
 runc version: v1.1.14-0-g2c9f560
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.11.7-amd64
 Operating System: Debian GNU/Linux trixie/sid
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 14.96GiB
 Name: KalesCom
 ID: 6a28000b-310a-41bb-b0d6-62bfd58b5124
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: kaleko
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

Builders list

GitHub Actions:

NAME/NODE     DRIVER/ENDPOINT   STATUS    BUILDKIT   PLATFORMS
default*      docker                                 
 \_ default    \_ default       running   v0.13.2    linux/amd64 (+3), linux/386

My Comp:

NAME/NODE     DRIVER/ENDPOINT   STATUS    BUILDKIT   PLATFORMS
default*      docker                                 
 \_ default    \_ default       running   v0.16.0    linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/arm64, linux/riscv64, linux/ppc64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/loong64, linux/arm/v7, linux/arm/v6

Configuration

https://github.com/Kale-Ko/docker-bug-demo

Run script.sh

Build logs

https://github.com/Kale-Ko/docker-bug-demo/commit/de76b1db5c538dd81a050e1bad8ad5ae44b72657/checks/33289867223/logs

tonistiigi commented 2 days ago

It looks like there is a bug both in this specific server and in BuildKit's caching logic.

curl -X GET "https://api.adoptium.net/v3/assets/latest/8/hotspot?os=linux&architecture=x64&image_type=jdk"  -H "Accept-Encoding: gzip" -v

< HTTP/2 200
< date: Sat, 23 Nov 2024 07:26:11 GMT
< content-type: application/json;charset=UTF-8
< content-length: 533
< strict-transport-security: max-age=63072000; includeSubDomains; preload
< x-content-type-options: nosniff
< x-frame-options: DENY
< cache-control: public, max-age=14400, no-transform, s-maxage=120
< etag: ce64aed9b43d2797c1129bfda324731c
< last-modified: Fri, 22 Nov 2024 15:30:28 GMT

curl -X GET "https://api.adoptium.net/v3/assets/latest/21/hotspot?os=linux&architecture=x64&image_type=jdk"  -H "Accept-Encoding: gzip" -v

< HTTP/2 200
< date: Sat, 23 Nov 2024 07:26:23 GMT
< content-type: application/json;charset=UTF-8
< content-length: 552
< strict-transport-security: max-age=63072000; includeSubDomains; preload
< x-content-type-options: nosniff
< x-frame-options: DENY
< cache-control: public, max-age=14400, no-transform, s-maxage=120
< etag: ce64aed9b43d2797c1129bfda324731c
< last-modified: Fri, 22 Nov 2024 15:30:28 GMT

Note that the ETag returned by the server is exactly the same for both URLs, even though the content-length differs. An ETag should uniquely identify the resource. A HEAD request also returns the same ETag.
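To make the mismatch explicit, here is a minimal sketch that compares the headers from the two curl transcripts. The helper name `etag_collision` is hypothetical and is not part of BuildKit or any library; the header values are copied from the output above.

```python
# Response headers copied from the two curl transcripts in this comment.
headers_jdk8 = {
    "content-length": "533",
    "etag": "ce64aed9b43d2797c1129bfda324731c",
    "last-modified": "Fri, 22 Nov 2024 15:30:28 GMT",
}
headers_jdk21 = {
    "content-length": "552",
    "etag": "ce64aed9b43d2797c1129bfda324731c",
    "last-modified": "Fri, 22 Nov 2024 15:30:28 GMT",
}


def etag_collision(a: dict, b: dict) -> bool:
    """Return True when two responses share an ETag but differ in length.

    Hypothetical helper: both requests were sent with the same
    Accept-Encoding, so equal ETags with different content-length values
    suggest the ETag does not identify the returned bytes.
    """
    same_etag = a.get("etag") is not None and a.get("etag") == b.get("etag")
    different_length = a.get("content-length") != b.get("content-length")
    return same_etag and different_length


print(etag_collision(headers_jdk8, headers_jdk21))
```

A response compared against itself is not flagged, since the lengths match.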

On the BuildKit side, the question is why BuildKit even compares the ETag when the URL itself differs. At the moment only the filename component of the URL is used, not the full URL. I think this is not ideal: even if the server works correctly, it just produces more potential cache lookups that are unlikely to match. Looking at this debug output, I think we could also compare content-length to detect more cases where servers are behaving incorrectly.
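As a rough illustration of why a filename-only lookup collides here, the sketch below builds two hypothetical cache keys for the URLs from this issue. This is not BuildKit's actual code; `filename_key`, `full_url_key`, and the dictionary cache are assumptions made only to show the effect.

```python
import posixpath
from urllib.parse import urlsplit

URL_JDK8 = "https://api.adoptium.net/v3/assets/latest/8/hotspot?os=linux&architecture=x64&image_type=jdk"
URL_JDK21 = "https://api.adoptium.net/v3/assets/latest/21/hotspot?os=linux&architecture=x64&image_type=jdk"
ETAG = "ce64aed9b43d2797c1129bfda324731c"  # same value returned for both URLs


def filename_key(url: str, etag: str) -> tuple:
    """Hypothetical key: last path segment plus ETag (query string ignored)."""
    return (posixpath.basename(urlsplit(url).path), etag)


def full_url_key(url: str, etag: str) -> tuple:
    """Hypothetical key: the complete URL plus ETag."""
    return (url, etag)


# Filename-only keys: both URLs end in "hotspot", so the second lookup
# hits the entry stored for the first URL.
by_filename = {filename_key(URL_JDK8, ETAG): "JDK 8 metadata"}
stale_hit = by_filename.get(filename_key(URL_JDK21, ETAG))

# Full-URL keys: the second lookup misses and would trigger a real fetch.
by_url = {full_url_key(URL_JDK8, ETAG): "JDK 8 metadata"}
miss = by_url.get(full_url_key(URL_JDK21, ETAG))

print(stale_hit, miss)
```

With the full URL in the key, the shared ETag alone can no longer produce a false match between these two requests.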

Kale-Ko commented 1 day ago

I checked out the source for api.adoptium.net and it is set to return the checksum of the entire "update" for every page that is cached, whatever that may be. Not sure if I should open an issue there or not.

tonistiigi commented 1 day ago

> I checked out the source for api.adoptium.net and it is set to return the checksum of the entire "update" for every page that is cached, whatever that may be.

I'm not sure what "entire update" means in here, but different etag should be returned for different content.
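For comparison, a minimal sketch of an ETag derived from the response body, so that different content yields a different value. This is not how api.adoptium.net is implemented; `content_etag`, the choice of SHA-256, and the placeholder bodies are all assumptions for illustration.

```python
import hashlib


def content_etag(body: bytes) -> str:
    """Hypothetical per-response ETag: a quoted hash of the body bytes."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


# Placeholder bodies standing in for the two different JSON responses.
body_jdk8 = b'{"version": 8}'
body_jdk21 = b'{"version": 21}'

print(content_etag(body_jdk8) != content_etag(body_jdk21))
```

The same body always maps to the same value, so conditional requests still work for unchanged content.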

Kale-Ko commented 1 day ago

> > I checked out the source for api.adoptium.net and it is set to return the checksum of the entire "update" for every page that is cached, whatever that may be.
>
> I'm not sure what "entire update" means in here, but different etag should be returned for different content.

Me neither, it's just pulled from some database.