docker / buildx

Docker CLI plugin for extended build capabilities with BuildKit
Apache License 2.0

Caching Invalid Response - unable to decode token response #1906

Open Evesy opened 1 year ago

Evesy commented 1 year ago

Contributing guidelines

I've found a bug and checked that ...

Description

We occasionally see docker buildx builds fail with the error below:

ERROR: failed to solve: failed to compute cache key: failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://storage.googleapis.com/eu.artifacts.at-artefacts.appspot.com/containers/images/sha256:23ca387bd2625f78477b9702d1f31a0228271b522057415a9cc9cfd7274b8575?access_token=<REDACTED_EXPIRED_TOKEN>": failed to authorize redirect: failed to fetch anonymous token: unable to decode token response: invalid character '<' looking for beginning of value

After encountering this, all subsequent build attempts fail in the same way. Builds only start working again after Docker is restarted.
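The trailing "invalid character '<' looking for beginning of value" has the shape of a JSON decoder error, which suggests the token endpoint returned something that was not JSON, for example an HTML or XML error page. The response body is not captured in the logs, so this is an assumption. A minimal Python sketch of the same failure mode, using made-up response bodies rather than anything from the real request:

```python
import json


def decode_token_response(body: str) -> dict:
    """Parse a token endpoint response body, which is expected to be JSON."""
    return json.loads(body)


# Hypothetical bodies: neither is taken from the real failing request.
good_body = '{"token": "example", "expires_in": 300}'
html_body = "<html><body>Error</body></html>"

print(decode_token_response(good_body)["expires_in"])

try:
    decode_token_response(html_body)
except json.JSONDecodeError as err:
    # The decoder gives up on the very first character, '<'.
    print(f"decode failed at position {err.pos}: {html_body[err.pos]!r}")
```

If this is what is happening, the failed token fetch is a symptom and the real question is why the request is being made with a stale token in the first place.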

Expected behaviour

Actual behaviour

This happens intermittently, and I have yet to find an exactly reproducible example. Below is the chain of events that led to the issue the last time I reproduced it:

$ docker buildx bake --no-cache --sbom=false --provenance=false --builder gocd-multiarch  --set '*.tags=eu.gcr.io/at-artefacts/at-monkeynetes:latest' --set '*.tags=eu.gcr.io/at-artefacts/at-monkeynetes:2' --set '*.args.BUILDKIT_INLINE_CACHE=1' --set '*.platform=linux/arm64,linux/amd64' app

The above command ran and built successfully. Subsequent builds were also cached and completed very quickly.

I then pruned docker, and the buildx builders:

$ docker system prune -a && docker buildx prune --all && docker buildx prune --builder gocd-multiarch --all

I reran the same bake command as before (this time tagging the image as meves-test), but it failed with the below:

$ docker buildx bake --no-cache --sbom=false --provenance=false --builder gocd-multiarch  --set '*.tags=eu.gcr.io/at-artefacts/meves-test:latest' --set '*.args.BUILDKIT_INLINE_CACHE=1' --set '*.platform=linux/arm64,linux/amd64' app
[+] Building 10.2s (4/4) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                                                          0.0s
 => => transferring dockerfile: 915B                                                                                                                                                                          0.0s
 => [internal] load .dockerignore                                                                                                                                                                             0.0s
 => => transferring context: 346B                                                                                                                                                                             0.0s
 => ERROR [linux/amd64 internal] load metadata for eu.gcr.io/at-artefacts/platform-base-java-17:latest                                                                                                       10.1s
 => CANCELED [linux/arm64 internal] load metadata for eu.gcr.io/at-artefacts/platform-base-java-17:latest                                                                                                    10.1s
------
 > [linux/amd64 internal] load metadata for eu.gcr.io/at-artefacts/platform-base-java-17:latest:
------
WARNING: No output specified for app target(s) with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
Dockerfile:2
--------------------
   1 |     ARG GO_DEPENDENCY_LABEL_BASE_JAVA_17
   2 | >>> FROM eu.gcr.io/at-artefacts/platform-base-java-17:$GO_DEPENDENCY_LABEL_BASE_JAVA_17 as build
   3 |
   4 |     # Set an env-var that lets our settings.gradle file know that it's running on our CI server & so should publish to our shared build cache
--------------------
ERROR: failed to solve: eu.gcr.io/at-artefacts/platform-base-java-17:latest: failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://storage.googleapis.com/eu.artifacts.at-artefacts.appspot.com/containers/images/sha256:652282eaaf2292dea2b2de83e487e8ee2e5ab476376f4b378446eb4d516ec062?access_token=ya29.c.b0Aaekm1Lw9nE5bMajNm9W58lA0578yPFIM9k7B-dako79LKshSW1KL9E7HzlenWmpGh7TKfURHYnqToZZrvCebV3B782OV30Gto2EhK4e1X3J-UPbkJ_yyCapyOg4OiKEby_tth3IWpWzGMF-YzLX6OYP44WjQfmKr74J5rNNyKmz4H8DsuNNZuww2VGXd3G5j3trdoIscZyuJh55tpMHwf1jEhIgKjNshBFz6T5_mewfa7VRUd_2dApYYqJnMzhPQaL8XskGnvpNlBftwJSetmPZ47KqemXdbCxlQEvWmxe2jvBjE2aZUYvFcKCZmrLnuR_YS4nUSZibX7A2HhftCLFZbuAzhk95SDeM6SGBSJ2GWXlWKNke4X71c9010EQpTNUpEa3ptGPBxmgpr2kGuPYXNPkzwRqGEESr_4bReRv9x9bFjSoiI-MfzNeO-Rstk7Zkg1SxEbDLw56gOekFZOJBMwyNRTAKYyCZLBz1lOhDnBQ3_834WC-Po0EGRgaztCzOntdwcu09kTZ93u670-cIVoe_3h5eiE360b3pHBpxe80xZlftm2he4Wi2wMAJJaiVagqqqz8UfxPoJQT603KvmRQwcVa8ojM4rZyav6QWWkvQX_xnq4omx5Xtjhnplsi-cdFBXVZp4U1bjw9BljwO5X05_8W4w2j3YjQXjrlWpprWb2buvfuV4_SIJV7o8Qq6Sx4fs4vXyzg1enqS38QaMiljgeIzhgVVld1wOe_vs0XMrpQwt0B31jfw2tglzU4vI-va7B5ndRvOrp1bgz2uBa2nvSpbUJIIdW4lvyXrcuby8Zn0V_5qmBnYm_R2FnzoJhwBSM46W5UYcUmrszFhrbVwgI6Xf_4lrm1B-Xhn1t_p-Ibd2tl39xVXcI4ujg-B2zyoSvROt6ZM01kWWVwuVu6qIffhdwnOZ0et919BBcxhlbkaB8nwF8MkuqkB3IvwBBo22j1BUqWugbx-1M57kU_dVb0Ww2Ri14ZrpbWyBe1I_hf35IZ": failed to authorize redirect: failed to fetch anonymous token: unable to decode token response: invalid character '<' looking for beginning of value

Interestingly, the build above was started at 10:41:44, but the token used in the failed request was one returned by the credential helper for eu.gcr.io at 10:25:03, even though I can see the credential helper being called and returning a different token at 10:41:46, after the build had started.
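For anyone wanting to repeat this comparison, the check can be sketched roughly as below: pull the access_token query parameter out of the failing URL and look it up among the tokens the credential helper returned. The function names, the tokens, and the URL are placeholders invented for the sketch; only the two timestamps come from the observations above.

```python
import re
from typing import Dict, Optional
from urllib.parse import parse_qs, urlsplit


def access_token_from_error(line: str) -> Optional[str]:
    """Return the access_token query parameter of the first quoted URL in a log line."""
    match = re.search(r'"(https?://[^"]+)"', line)
    if match is None:
        return None
    values = parse_qs(urlsplit(match.group(1)).query).get("access_token")
    return values[0] if values else None


def helper_call_for(token: Optional[str], helper_log: Dict[str, str]) -> Optional[str]:
    """Return the timestamp of the credential helper call that issued this token."""
    for timestamp, issued in helper_log.items():
        if issued == token:
            return timestamp
    return None


# Placeholder tokens and URL; only the two timestamps come from the report.
helper_log = {"10:25:03": "ya29.OLD-PLACEHOLDER", "10:41:46": "ya29.NEW-PLACEHOLDER"}
error_line = (
    'failed to do request: Get "https://storage.googleapis.com/bucket/'
    'sha256:0000?access_token=ya29.OLD-PLACEHOLDER": failed to authorize redirect'
)

token = access_token_from_error(error_line)
print(helper_call_for(token, helper_log))
```

A match against the earlier call, as in this sketch, is what points at a token being reused rather than refreshed.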

So it seems that somewhere Docker is caching layers together with access tokens that may since have expired (our access tokens have a TTL of only 5 to 60 minutes).
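To put numbers on that: the token was about 16 minutes 41 seconds old when the build started. The exact TTL of that particular token is not known, so the sketch below simply tries a few values from the 5 to 60 minute range; the helper name and the chosen TTLs are illustrative assumptions.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"

# Timestamps taken from the observations above.
token_issued = datetime.strptime("10:25:03", FMT)
build_started = datetime.strptime("10:41:44", FMT)
token_age = build_started - token_issued


def is_expired(age: timedelta, ttl: timedelta) -> bool:
    """A token is treated as expired once its age reaches its TTL."""
    return age >= ttl


print(f"token age at build start: {token_age}")

# The real TTL is unknown; these values only span the stated 5-60 minute range.
for ttl_minutes in (5, 15, 60):
    ttl = timedelta(minutes=ttl_minutes)
    state = "expired" if is_expired(token_age, ttl) else "still valid"
    print(f"TTL {ttl_minutes:>2} min: {state}")
```

So a stale token would explain the failure for any TTL shorter than roughly 17 minutes, but not for tokens at the long end of the range.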

This then affects any Docker builds using that same FROM image:

$ docker buildx bake --no-cache --sbom=false --provenance=false --builder gocd-multiarch  --set '*.tags=eu.gcr.io/at-artefacts/meves-test:latest' --set '*.args.BUILDKIT_INLINE_CACHE=1' --set '*.platform=linux/arm64,linux/amd64' app --push
[+] Building 6.2s (4/4) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                                                          0.0s
 => => transferring dockerfile: 365B                                                                                                                                                                          0.0s
 => [internal] load .dockerignore                                                                                                                                                                             0.0s
 => => transferring context: 2B                                                                                                                                                                               0.0s
 => CANCELED [linux/amd64 internal] load metadata for eu.gcr.io/at-artefacts/platform-base-java-17:latest                                                                                                     6.2s
 => ERROR [linux/arm64 internal] load metadata for eu.gcr.io/at-artefacts/platform-base-java-17:latest                                                                                                        6.2s
------
 > [linux/arm64 internal] load metadata for eu.gcr.io/at-artefacts/platform-base-java-17:latest:
------
Dockerfile:2
--------------------
   1 |     ARG GO_DEPENDENCY_LABEL_BASE_JAVA_17
   2 | >>> FROM eu.gcr.io/at-artefacts/platform-base-java-17:$GO_DEPENDENCY_LABEL_BASE_JAVA_17 as build
   3 |
   4 |     RUN sleep 120
--------------------
ERROR: failed to solve: eu.gcr.io/at-artefacts/platform-base-java-17:latest: failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://storage.googleapis.com/eu.artifacts.at-artefacts.appspot.com/containers/images/sha256:1a1c20c1e81f78d16a2266be8e91b7596dfd416c1a4019808353a035da5bd396?access_token=ya29.c.b0Aaekm1Lw9nE5bMajNm9W58lA0578yPFIM9k7B-dako79LKshSW1KL9E7HzlenWmpGh7TKfURHYnqToZZrvCebV3B782OV30Gto2EhK4e1X3J-UPbkJ_yyCapyOg4OiKEby_tth3IWpWzGMF-YzLX6OYP44WjQfmKr74J5rNNyKmz4H8DsuNNZuww2VGXd3G5j3trdoIscZyuJh55tpMHwf1jEhIgKjNshBFz6T5_mewfa7VRUd_2dApYYqJnMzhPQaL8XskGnvpNlBftwJSetmPZ47KqemXdbCxlQEvWmxe2jvBjE2aZUYvFcKCZmrLnuR_YS4nUSZibX7A2HhftCLFZbuAzhk95SDeM6SGBSJ2GWXlWKNke4X71c9010EQpTNUpEa3ptGPBxmgpr2kGuPYXNPkzwRqGEESr_4bReRv9x9bFjSoiI-MfzNeO-Rstk7Zkg1SxEbDLw56gOekFZOJBMwyNRTAKYyCZLBz1lOhDnBQ3_834WC-Po0EGRgaztCzOntdwcu09kTZ93u670-cIVoe_3h5eiE360b3pHBpxe80xZlftm2he4Wi2wMAJJaiVagqqqz8UfxPoJQT603KvmRQwcVa8ojM4rZyav6QWWkvQX_xnq4omx5Xtjhnplsi-cdFBXVZp4U1bjw9BljwO5X05_8W4w2j3YjQXjrlWpprWb2buvfuV4_SIJV7o8Qq6Sx4fs4vXyzg1enqS38QaMiljgeIzhgVVld1wOe_vs0XMrpQwt0B31jfw2tglzU4vI-va7B5ndRvOrp1bgz2uBa2nvSpbUJIIdW4lvyXrcuby8Zn0V_5qmBnYm_R2FnzoJhwBSM46W5UYcUmrszFhrbVwgI6Xf_4lrm1B-Xhn1t_p-Ibd2tl39xVXcI4ujg-B2zyoSvROt6ZM01kWWVwuVu6qIffhdwnOZ0et919BBcxhlbkaB8nwF8MkuqkB3IvwBBo22j1BUqWugbx-1M57kU_dVb0Ww2Ri14ZrpbWyBe1I_hf35IZ": failed to authorize redirect: failed to fetch anonymous token: unable to decode token response: invalid character '<' looking for beginning of value

It's worth noting that it doesn't always fail at the FROM stage; another example failed during a COPY directive:

....
....
 => [linux/amd64 build 6/7] COPY --chown=atcloud:atcloud . .                                                                                                                                                  0.0s
 => CANCELED [linux/amd64 build 7/7] RUN ./gradlew build --info                                                                                                                                              76.2s
 => ERROR [linux/arm64 build 2/7] COPY --chown=atcloud:atcloud ./gradle/wrapper ./gradle/wrapper                                                                                                              0.0s
 => ERROR [linux/arm64 stage-1 2/4] RUN dnf -y -q install postgresql                                                                                                                                          0.0s
------
 > [linux/arm64 build 2/7] COPY --chown=atcloud:atcloud ./gradle/wrapper ./gradle/wrapper:
------
------
 > [linux/arm64 stage-1 2/4] RUN dnf -y -q install postgresql:
------
WARNING: No output specified for app target(s) with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
Dockerfile:8
--------------------
   6 |     ENV GO_PIPELINE_NAME $GO_PIPELINE_NAME
   7 |
   8 | >>> COPY --chown=atcloud:atcloud ./gradle/wrapper ./gradle/wrapper
   9 |     COPY --chown=atcloud:atcloud ./gradlew .
  10 |     COPY --chown=atcloud:atcloud ./settings.gradle.kts .
--------------------
ERROR: failed to solve: failed to compute cache key: failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://storage.googleapis.com/eu.artifacts.at-artefacts.appspot.com/containers/images/sha256:7fd3376a4f9bf296614cc71d95fe4bf979033d98d38f235748226a9aaa493b6b?access_token=ya29.c.b0Aaekm1JyCImyb_okr4Pg8nKqRmKbZEmQTT4O8KHt4J66gLBgNPw7DFJEId-yV-2Wf06MQVchO8ASVXERD78JUStc5lKSe26hWp0Y08bgAT4wNPjB6Br2qjYyHnWCWVSuwJbLk99RACRSScFutvBJUPL_bZuRfIoU-1Jxyxq785GfkDNzjo3lMMSFsMHEKvELbD7X3Qa92Izr-dAG9Z1H4SLG4dZQ83JnnCdhZ_PwH5LjsHD2yzpVEIlpB5a2xWsV0YMaGGyZXncNUDk80JfVszSS1TvJcLZIbJtweIA-CvYC3AXSje5Dx3Rtzk6jY1q3WDhCCrzrwqUG1isA9JSfBcYWOHtu1Fy2pOHiSUgM3os1Ps-cqaWNSDKBgsZS218_rPgGsJz_rrKAz7PR_cgNkjumg8x0wYldN1uvhS6ke-NeCpMEV6hRaa00wcEYE8eQ0sBUXiL-hkNJDmJN1-sDU0iGz78OvNCFgoANjhx-stCoM3HMYuhcN8CM8EDAVTslf8D8wo8nSVtIMtoBNc43qMjyAxyzYXS5Qfq64asP0U2w9669-BFjEkuf9_v_JmXbb8filH5_NKRlbthePcwZH605KYljJl2MRBXJumcznmhXnmad36g3-xk8t1p2v8R_4eoonvWrmfyoo2xWrM6hfkccQQFQ_agi2ZlgJckyB_hZ6ZewXcRkROZbx4q2nFmoSr2jcqtkBzmSJOuhpRFqjVvdMWQ-_6Qmx8deMr-g3WOVqkR-9qYXovshjyS3_cIik-a8_-1-S_0Wgyr-S8wpJadSqF5-VF_lh2Ihv8Wd_e-ihvXaep7wqemOY7yfro8e768J-pJw9xfVZtR1dWqjZ4vp0wSd1cJWkIp_ib-3gqSzMJYRQdmqoq58Su49_XUOJaXm3b88gVyaR82y8WvuYeMUnt-VttQ8e65ReIcYj80ic9SmtqluwrQXXx_06mqkcf5xFxr0Yuzin8tqXcIs0lytd6o71t8zRQ2jXMYVt-Iz6Zjue5t-2VY": failed to authorize redirect: failed to fetch anonymous token: unable to decode token response: invalid character '<' looking for beginning of value

Buildx version

github.com/docker/buildx v0.10.5 86bdced7766639d56baa4c7c449a4f6468490f87

Docker info

Local machine:

Client:
 Version:    24.0.2
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.5
    Path:     /Users/Michael.Eves/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.18.1
    Path:     /Users/Michael.Eves/.docker/cli-plugins/docker-compose
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.0
    Path:     /Users/Michael.Eves/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.19
    Path:     /Users/Michael.Eves/.docker/cli-plugins/docker-extension
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v0.1.0-beta.4
    Path:     /Users/Michael.Eves/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/Michael.Eves/.docker/cli-plugins/docker-sbom
  scan: Docker Scan (Docker Inc.)
    Version:  v0.26.0
    Path:     /Users/Michael.Eves/.docker/cli-plugins/docker-scan
  scout: Command line tool for Docker Scout (Docker Inc.)
    Version:  v0.12.0
    Path:     /Users/Michael.Eves/.docker/cli-plugins/docker-scout

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 24.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.49-linuxkit-pr
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 5
 Total Memory: 7.765GiB
 Name: docker-desktop
 ID: 10976d1e-fdf6-44bb-aec5-8e7376f9a561
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

CI:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc., v0.10.5)
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc., 0.6.1)

Server:
 Containers: 2
  Running: 1
  Paused: 0
  Stopped: 1
 Images: 25
 Server Version: 24.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 1677a17964311325ed1c31e2c0a3589ce6d5c30d
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 5.15.89+
 Operating System: Alpine Linux v3.18 (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 13.17GiB
 Name: gocd-app-agents-57b69695f5-dzlhf
 ID: c9b3d19e-d418-48b9-888f-059d9501e619
 Docker Root Dir: /storage/docker/gocd-app-agents-57b69695f5-dzlhf
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine
 Default Address Pools:
   Base: 10.207.212.0/22, Size: 24

Builders list

NAME/NODE          DRIVER/ENDPOINT      STATUS   BUILDKIT                              PLATFORMS
gocd-multiarch     docker-container
  gocd-multiarch0  tcp://127.0.0.1:2375 running  v0.11.6                               linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/amd64/v4, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/386, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6
single-threaded    docker-container
  single-threaded0 tcp://127.0.0.1:2375 inactive
default *          docker
  default          default              running  v0.11.7-0.20230525183624-798ad6b0ce9f linux/amd64, linux/amd64/v2, linux/amd64/v3, linux/amd64/v4, linux/386, linux/arm64, linux/riscv64, linux/ppc64le, linux/s390x, linux/mips64le, linux/mips64, linux/arm/v7, linux/arm/v6

Configuration

ARG GO_DEPENDENCY_LABEL_BASE_JAVA_17
FROM eu.gcr.io/at-artefacts/platform-base-java-17:$GO_DEPENDENCY_LABEL_BASE_JAVA_17 as build

# Set an env-var that lets our settings.gradle file know that it's running on our CI server & so should publish to our shared build cache
ARG GO_PIPELINE_NAME
ENV GO_PIPELINE_NAME $GO_PIPELINE_NAME

COPY --chown=atcloud:atcloud ./gradle/wrapper ./gradle/wrapper
COPY --chown=atcloud:atcloud ./gradlew .
COPY --chown=atcloud:atcloud ./settings.gradle.kts .

RUN ./gradlew

COPY --chown=atcloud:atcloud . .

RUN ./gradlew build --info

FROM eu.gcr.io/at-artefacts/platform-base-java-17:$GO_DEPENDENCY_LABEL_BASE_JAVA_17

USER root

RUN dnf -y -q install postgresql

USER atcloud

COPY --from=build /usr/local/autotrader/app/build/libs/monkeynetes-0.0.1-SNAPSHOT.jar /usr/local/autotrader/app/app.jar

COPY echo-status.sh /usr/local/echo-status.sh

version: "3.8"
services:
  app:
    image: "eu.gcr.io/at-artefacts/at-monkeynetes"
    network_mode: bridge
    build:
      context: .
      args:
        GO_DEPENDENCY_LABEL_BASE_JAVA_17: "${GO_DEPENDENCY_LABEL_BASE_JAVA_17:-latest}"
    ports:
      - "8080:8080"
      - "9090:9090"
    environment:
      ENABLE_GC_LOGS: "true"
      JVM_MAX_HEAP_MB: 200
      ENABLE_JFR: "true"
    mem_limit: 512m

docker buildx bake --no-cache --sbom=false --provenance=false --builder gocd-multiarch  --set '*.tags=eu.gcr.io/at-artefacts/at-monkeynetes:latest' --set '*.tags=eu.gcr.io/at-artefacts/at-monkeynetes:2' --set '*.args.BUILDKIT_INLINE_CACHE=1' --set '*.platform=linux/arm64,linux/amd64' app

Build logs

See above

Additional info

I think we've only observed this for multi-arch builds; however, that could just be down to the fact that those builds generally take a lot longer than single-arch builds.

crazy-max commented 1 year ago

What does your bake definition look like? Are you exporting cache?

Evesy commented 1 year ago

Sorry for not replying; I must have missed the response. The bake definition uses docker-compose.yml, which looks like the below:

version: "3.8"
services:
  app:
    image: "eu.gcr.io/at-artefacts/at-monkeynetes"
    network_mode: bridge
    build:
      context: .
      args:
        GO_DEPENDENCY_LABEL_BASE_JAVA_17: "${GO_DEPENDENCY_LABEL_BASE_JAVA_17:-latest}"
    ports:
      - "8080:8080"
      - "9090:9090"
    environment:
      ENABLE_GC_LOGS: "true"
      JVM_MAX_HEAP_MB: 200
      ENABLE_JFR: "true"
    mem_limit: 512m

I observed it both with no cache export and with inline cache export. I can't say I've run into the issue recently, so perhaps it was inadvertently fixed in an upgrade somewhere along the way. Given that no one else seems to have observed the same, I'd be happy for this to be closed off unless you want to keep it open?