Closed: lifeofguenter closed this issue 8 months ago.
BuildKit 0.8 will default to using an OCI media type for its caches (see https://github.com/moby/buildkit/pull/1746) which I assume should make this work, but I haven't tested it myself.
It still doesn't work with the recently released BuildKit 0.8.0. It can write the layers and config, but it is unable to upload the manifest to ECR:
```
=> ERROR exporting cache 5.4s
=> => preparing build cache for export 0.2s
=> => writing layer sha256:0d48cc65d93fe2ee9877959ff98ebc98b95fe4b2fc467ff50f27103c1c5d6973 0.3s
=> => writing layer sha256:2ade286d53f2e045413601ca0e3790de3792ea34abd3d025cd2cd9c3cb5231de 0.3s
=> => writing layer sha256:64befcf53942ba04c144cde468548885d497e238001e965e983e39eb947860c2 0.3s
=> => writing layer sha256:7415f0cbea8739c1bf353568b16ac74a9cfbc0b36327602e3a025abf919a38a6 0.3s
=> => writing layer sha256:76a1f73c618c30eb1b1d90cf043fe3f855a1cce922d1fb47458defd3dbe1c783 0.3s
=> => writing layer sha256:8674739c0ada3e834b816667d26dd185aa5ea089f33701f11a05b7be03f43026 0.3s
=> => writing layer sha256:9dc80bcd2805b2a441bd69bc9468df2e81994239e34879567bed7bdef6cb605d 0.3s
=> => writing layer sha256:cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08 0.3s
=> => writing layer sha256:ce4e6de84945ab498f65d16920c9b801dfea3792871e44f89e6438e232a690b3 0.3s
=> => writing layer sha256:d46583c5d4c69b34cb46866838d68f53a38686dc7f2d1347ae0f252e8eb0ed4c 0.2s
=> => writing config sha256:33c76a0f8a74a06e461926d8a8d1845371c0cf9e86753db2483a4873aede8889 2.0s
=> => writing manifest sha256:0f69a7e6626f6a24a0a95ed915613ebdf9459280d4986879480d87e34849aea8 0.6s
------
 > importing cache manifest from XXXXXXXXXXXX.dkr.ecr.us-west-2.amazonaws.com/test-repo:buildcache:
------
------
 > exporting cache:
------
error: failed to solve: rpc error: code = Unknown desc = error writing manifest blob: failed commit on ref "sha256:0f69a7e6626f6a24a0a95ed915613ebdf9459280d4986879480d87e34849aea8": unexpected status: 400 Bad Request
```
I am seeing the same error on BuildKit 0.8.0, even when setting `oci-mediatypes` explicitly to true: `--export-cache type=registry,ref=${REPO}:buildcache,oci-mediatypes=true`
```
=> ERROR exporting cache 1.4s
=> => preparing build cache for export 0.0s
=> => writing layer sha256:757d39990544d20fbebf7a88e29a5dd2bb6a4fdb116d67df9fe8056843da794d 0.1s
=> => writing layer sha256:7597eaba0060104f2bd4f3c46f0050fcf6df83066870767af41c2d7696bb33b2 0.1s
=> => writing config sha256:0e308fd4eee4cae672eee133cbd77ef7c197fa5d587110b59350a99b289f7000 0.8s
=> => writing manifest sha256:8eb142b16e0ec25db4517f2aecff795cca2b1adbe07c32f5c571efc5c808cbcd 0.3s
------
 > importing cache manifest from xxx.dkr.ecr.us-east-1.amazonaws.com/errm/test:buildcache:
------
------
 > exporting cache:
------
error: failed to solve: rpc error: code = Unknown desc = error writing manifest blob: failed commit on ref "sha256:8eb142b16e0ec25db4517f2aecff795cca2b1adbe07c32f5c571efc5c808cbcd": unexpected status: 400 Bad Request
```
Daemon logs:
```
time="2020-12-09T13:42:48Z" level=info msg="running server on /run/buildkit/buildkitd.sock"
time="2020-12-09T13:44:09Z" level=warning msg="reference for unknown type: application/vnd.buildkit.cacheconfig.v0"
time="2020-12-09T13:44:10Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Unknown desc = error writing manifest blob: failed commit on ref \"sha256:8eb142b16e0ec25db4517f2aecff795cca2b1adbe07c32f5c571efc5c808cbcd\": unexpected status: 400 Bad Request\n"
```
Also seeing this for private repos, although it doesn't seem to be an issue with public ECR repos.
Is there a timeframe available for this feature request? It could tremendously speed up CI builds.
We have been experimenting with this BuildKit feature for some time now and it works wonders. Currently we are still dependent on Docker Hub, so having this functionality in private ECR would greatly benefit our CI/CD workflow.
Any indication as to if/when this will ever be available? Using buildkit would really improve our CI build times
One year passed and still nothing 😔
We'd really like to see support of this with ECR private repos 🙏
As of today, it still does not work:
```
error: failed to solve: rpc error: code = Unknown desc = error writing manifest blob: failed commit on ref "sha256:75f32e1bb4df7c6333dc352ea3ea9d04d1e04e4a14ba79b59daa019074166519": unexpected status: 400 Bad Request
```
Yes please!
Can we get any kind of communications on this?
Is there any workaround available?
For teams using GitHub but wishing to keep images in ECR, it is possible to leverage the cache manifest support in GitHub Container Registry (GHCR) and push the image to ECR at the same time. When pushing to ECR, only new layers get pushed.
GitHub Actions workflow example:
```yaml
jobs:
  docker_build:
    strategy:
      matrix:
        name:
          - my-image
        include:
          - name: my-image
            registry_ecr: my-aws-account-id.dkr.ecr.us-east-1.amazonaws.com
            registry_ghcr: ghcr.io/my-github-org-name
            dockerfile: ./path/to/Dockerfile
            context: .
            extra_args: ''
    steps:
      - uses: actions/checkout@v2
      - name: Install Buildkit
        uses: docker/setup-buildx-action@v1
        id: buildx
        with:
          install: true
      - name: Login to GitHub Container Registry
        uses: docker/login-action@v1
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
          role-skip-session-tagging: true
          role-duration-seconds: 1800
          role-session-name: GithubActionsBuildDockerImages
      - name: Login to Amazon ECR
        uses: aws-actions/amazon-ecr-login@v1
      - name: Build & Push (ECR)
        # - https://docs.docker.com/engine/reference/commandline/buildx_build/
        # - https://github.com/moby/buildkit#export-cache
        run: |
          docker buildx build \
            --cache-from=type=registry,ref=${{ matrix.registry_ghcr }}/${{ matrix.name }}:cache \
            --cache-to=type=registry,ref=${{ matrix.registry_ghcr }}/${{ matrix.name }}:cache,mode=max \
            --push \
            ${{ matrix.extra_args }} \
            -f ${{ matrix.dockerfile }} \
            -t ${{ matrix.registry_ecr }}/${{ matrix.name }}:${{ github.sha }} \
            ${{ matrix.context }}
```
@ynouri Just be careful of your storage costs in GHCR. It's oddly expensive. I found https://github.com/snok/container-retention-policy to help solve that use case for me.
ECR still not supporting this is unbelievably amateurish; it doesn't suit AWS...
> Is there any workaround available?

Use another Docker registry: Docker Hub, or perhaps your own tiny EC2 with some fat storage. Sucks, but AWS doesn't seem interested.
This seems to have started working unannounced, at least when using docker 20.10.11 to build
> This seems to have started working unannounced, at least when using docker 20.10.11 to build

I'm still seeing `error writing manifest blob` with `400 Bad Request` on Docker `5:20.10.12~3-0~ubuntu-focal`, at least in us-west-2.
> This seems to have started working unannounced, at least when using docker 20.10.11 to build

is this confirmed?
> This seems to have started working unannounced, at least when using docker 20.10.11 to build
>
> is this confirmed?

I'm wondering the same thing. Could you share some more info @poldridge?
I've just faced the same issue with Docker version 20.10.12, build e91ed57. Would appreciate any hints or workarounds.
> This seems to have started working unannounced, at least when using docker 20.10.11 to build

Did not work for me using `docker:20.10.11-dind` and ECR us-west-2.
Can we get any kind of communication on this? Being able to use remote cache is a major benefit to all our build pipelines.
I would also be very interested in a field report of what progress has occurred and which aspects of OCI layer caching are supported in ECR right now.
Do we have any update on this? Can we get any kind of response from AWS?
Would also like to know when this will be available
Just stumbled on this during our migration to AWS. This kind of sucks, as it breaks our pipeline logic... Would also be interested in an ETA for this feature.
We gave up a while back and threw up our own (ALB + EC2 + S3) registry - setup was pretty quick. We finally got around to trying it and it appears to work great. We're still storing/pulling images in ECR for ECS; we only use our registry for the cache.
https://docs.docker.com/registry/deploying/
Still need to look into automatically cleaning up old images...
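For anyone taking the same route, here is a minimal sketch of running the open-source Distribution registry with an S3 storage backend. The bucket name, region, and port are placeholders, not details from the setup described above; see the linked deployment guide for TLS and auth:

```shell
# Sketch only: bucket, region, and port are placeholder values.
# The registry image reads overrides from REGISTRY_* environment
# variables; AWS credentials come from the instance role or env vars.
docker run -d --name registry -p 5000:5000 \
  -e REGISTRY_STORAGE=s3 \
  -e REGISTRY_STORAGE_S3_BUCKET=my-build-cache-bucket \
  -e REGISTRY_STORAGE_S3_REGION=us-east-1 \
  registry:2
```

A plain `registry:2` accepts the BuildKit cache manifest media types that ECR rejects, so it can serve as the `--cache-to type=registry,...` target while images themselves stay in ECR.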
Waiting on this very important feature request. It is blocking our migration to Graviton instances, as multi-arch build caching does not work without it, causing builds to take far too long to complete.
We have started looking into the technical approaches & feasibility to support this.
@arunsollet would you have an ETA on this?
Just in case: BuildKit's inline cache works as expected on AWS ECR. And yes, if you want to use layer caching for multi-stage builds, you need a separate registry cache, which doesn't work on AWS ECR.
@rstanevich Inline Cache has been very inconsistent with us, especially with multistage images, it gets cache misses when it shouldn't be.
> @rstanevich Inline Cache has been very inconsistent with us, especially with multistage images, it gets cache misses when it shouldn't be.

Yep, inline cache is not designed for multistage builds since it stores only layers from the final stage. In other cases I found inline to be the better solution.
For multistage build cache we can still use Sonatype Nexus, Artifactory, Docker Hub, etc., and set up a strict retention policy there.
> @rstanevich Inline Cache has been very inconsistent with us, especially with multistage images, it gets cache misses when it shouldn't be.
>
> Yep, inline cache is not designed for multistage builds since it stores only layers from the final stage. In other cases I found inline to be a better solution. For multistage build cache we still can use sonatype nexus, artifactory, dockerhub, etc, and setting up a strict retention policy there.
Let's please stop talking about alternatives; we need AWS ECR to support multistage layer cache. I don't care if Docker Hub does it.
> We have started looking into the technical approaches & feasibility to support this.

Hi @arunsollet, any progress regarding this issue?
We continue to investigate and will provide an update as soon as more information is available.
Until AWS fixes this, I found this solution. There's quite a lot of subtlety that took me a while to figure out, but now it works! This is specifically for (1) caching multi-stage builds (2) on fresh instances (with no instance-local cache). Basically, do this:
```shell
export DOCKER_BUILDKIT=1

docker pull 000000000000.dkr.ecr.us-east-1.amazonaws.com/my-project:build || true
docker build . -f docker/Dockerfile --target build --build-arg BUILDKIT_INLINE_CACHE=1 \
  -t 000000000000.dkr.ecr.us-east-1.amazonaws.com/my-project:build \
  --cache-from 000000000000.dkr.ecr.us-east-1.amazonaws.com/my-project:build

docker pull 000000000000.dkr.ecr.us-east-1.amazonaws.com/my-project:latest || true
docker build . -f docker/Dockerfile --build-arg BUILDKIT_INLINE_CACHE=1 \
  -t 000000000000.dkr.ecr.us-east-1.amazonaws.com/my-project:latest \
  --cache-from 000000000000.dkr.ecr.us-east-1.amazonaws.com/my-project:build \
  --cache-from 000000000000.dkr.ecr.us-east-1.amazonaws.com/my-project:latest

docker push 000000000000.dkr.ecr.us-east-1.amazonaws.com/my-project:build
docker push 000000000000.dkr.ecr.us-east-1.amazonaws.com/my-project:latest
```
if anyone has issues, feel free to contact me!
Hi everyone! I'm Rafa, a new PM in the team, and I'm taking over investigating this issue. Would love to chat with some of the people in here that have expressed an interest in it - we want to make sure that this is well built in a way that is useful and doesn't introduce any problems or issues.
I'm getting up to speed, so some education would definitely help. I know everyone has been quite patient, so I apologize in advance for asking for more.
You can DM me at twitter.com/rafavallina or also reach out to your account team. If that doesn't work, please leave a comment here with your preferred way to get in touch and we'll find time!
Use case is: GitHub Actions Docker layer caching sucks and this feature would be a better alternative
A fair number of "hosted CI" platforms (similar to GHA) have this problem with utilizing Docker layer caching... most are built around extremely ephemeral runtimes (which is a good thing), so it would be really advantageous to have Docker layer caching supported at the registry/repo layer, because it is most durable there.
As others have mentioned, we'd like to use the feature to improve caching of builds in GitLab CI pipelines. This particular feature is required to cache the intermediary stages in multi-stage Dockerfiles. Without it, there is no way to cache those layers, even when nothing has changed. The ability to do this would significantly speed up the build time for a few applications in our stack. Saving 5 mins on every build pipeline really adds up after a while.
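One way to approximate this in GitLab CI today is to export the cache to GitLab's own container registry (which accepts BuildKit cache manifests) while still pushing the image itself to ECR. A rough sketch, assuming the standard GitLab CI variables and a placeholder `$ECR_REPO` for the ECR repository URI:

```shell
# Sketch only: $ECR_REPO is a placeholder; assumes `docker login` has
# already been run against both the GitLab registry and ECR, and that
# a docker-container buildx builder is active.
docker buildx build . \
  --cache-from type=registry,ref=$CI_REGISTRY_IMAGE:buildcache \
  --cache-to type=registry,ref=$CI_REGISTRY_IMAGE:buildcache,mode=max \
  --push \
  -t $ECR_REPO:$CI_COMMIT_SHORT_SHA
```

`mode=max` is what makes the intermediary stages cacheable; the default `mode=min` only exports layers from the final stage.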
A not-ideal-but-workable workaround for GitHub Actions at the moment is to use docker/build-push-action with the buildkit engine and utilise the experimental s3 cache on buildkit's master branch, like so (extracted from a private action, so apologies for all the variables, but hopefully still gives an idea of how to implement it):
```yaml
runs:
  using: composite
  steps:
    - name: Set up Docker Buildx
      id: buildx
      uses: docker/setup-buildx-action@v2
      with:
        # We're using an experimental version of buildx for s3 support
        driver-opts: image=moby/buildkit:master
    - name: Configure AWS Credentials
      uses: aws-actions/configure-aws-credentials@v1
      with:
        aws-region: eu-west-2
        aws-access-key-id: ${{ inputs.AWS_ECR_ACCESS_KEY_ID }}
        aws-secret-access-key: ${{ inputs.AWS_ECR_SECRET_ACCESS_KEY }}
        role-session-name: "${{ github.event.repository.name }}-GithubActions-build.yml"
    - name: Login to AWS ECR
      if: success()
      uses: aws-actions/amazon-ecr-login@v1
    - name: Docker metadata (${{ inputs.AWS_ECR_REPOSITORY }})
      if: success()
      id: meta-conf
      uses: docker/metadata-action@v4.0.1
      with:
        images: ${{ inputs.AWS_ECR_REPOSITORY }}
        tags: |
          # Rather than using type=schedule which only creates a tag on a schedule, we want date tags on every build
          type=raw,value={{date 'YYYYMMDD'}}${{ inputs.TAG_SUFFIX }}
          # refs
          type=ref,event=branch,suffix=${{ inputs.TAG_SUFFIX }}
          type=ref,event=tag,suffix=${{ inputs.TAG_SUFFIX }}
          type=ref,event=pr,suffix=${{ inputs.TAG_SUFFIX }}
          # sha's
          type=sha,prefix=,suffix=${{ inputs.TAG_SUFFIX }}
          type=sha,prefix=,format=long,suffix=${{ inputs.TAG_SUFFIX }}
          # set latest tag for master branch
          type=raw,value=latest,enable=${{ github.ref_name == 'master' }},suffix=${{ inputs.TAG_SUFFIX }}
    - name: Push Container (${{ inputs.AWS_ECR_REPOSITORY }})
      if: success()
      id: build
      uses: docker/build-push-action@v3.1.1
      with:
        file: ${{ steps.dockerfile.outputs.result }}
        context: ${{ steps.build-context-dirname.outputs.value }}
        tags: ${{ steps.meta-conf.outputs.tags }}
        labels: ${{ steps.meta-conf.outputs.labels }}
        platforms: ${{ inputs.PLATFORMS }}
        builder: ${{ steps.buildx.outputs.name }}
        push: true
        pull: true
        # ECR _still_ doesn't support inline-cache manifests, so we use the experimental s3 cache instead
        # https://github.com/aws/containers-roadmap/issues/876
        # https://github.com/moby/buildkit#s3-cache-experimental
        # Use a separate scope per dockerfile to avoid conflicts
        cache-from: type=s3,region=eu-west-2,bucket=<BUCKET NAME>,prefix=gha_docker_build_cache/${{ github.event.repository.name }}/${{ steps.dockerfile_underscored.outputs.value }}
        cache-to: type=s3,region=eu-west-2,bucket=<BUCKET NAME>,prefix=gha_docker_build_cache/${{ github.event.repository.name }}/${{ steps.dockerfile_underscored.outputs.value }},mode=max
        build-args: ${{ inputs.BUILD_ARGS }}
```
You can also do it without docker/build-push-action and just invoke the build via the CLI with the corresponding `cache-from` and `cache-to` args, but I don't have an example of that handy.
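For reference, a CLI-only sketch under the same assumptions as the action above (an experimental `moby/buildkit:master` builder, with `<BUCKET NAME>` and the image reference as placeholders):

```shell
# Sketch only: bucket, prefix, and image reference are placeholders.
# Assumes AWS credentials are available via the standard environment
# variables and ECR login has already been performed.
docker buildx build . \
  --cache-from type=s3,region=eu-west-2,bucket=<BUCKET NAME>,prefix=gha_docker_build_cache/my-repo/ \
  --cache-to type=s3,region=eu-west-2,bucket=<BUCKET NAME>,prefix=gha_docker_build_cache/my-repo/,mode=max \
  --push \
  -t <account>.dkr.ecr.eu-west-2.amazonaws.com/my-repo:latest
```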
Yes, @Makeshift, experimental remote S3 cache seems to be the best option for AWS now. I see one point that needs to be improved - currently, we must push the same blobs from the final image to both ECR and S3. When ECR supports the cache manifest, I expect cache and image pushing to be faster.
Thanks so much to everyone that provided more info here and on Twitter.
We are investigating the best way to proceed. BuildKit relies on a relatively unorthodox use of the OCI specification to enable this feature (see here), and we want to make sure we do our best to continue adhering to the standard while we support customers. OCI Artifacts (see here) seem to be the more adequate tool for the cache manifest. I've asked for more information on the BuildKit repo to see the progress being made there, and we are also looking at the effort on our side.
Is anyone in this thread using, or intending to use, tools other than Docker `buildx` to build their images using cache manifests? Want to make sure we are as thorough as possible and support customers across the board.
Would also love to hear how people are working around this limitation today, as it provides additional context about the underlying customer need
> Would also love to hear how people are working around this limitation today, as it provides additional context about the underlying customer need

In our case, before Buildx had support for S3 caches: since we host an S3-backed registry as a pull-through cache for our ECS clusters/build workers anyway (to cache base images pulled from Docker Hub, as they implemented pull restrictions), it was pretty easy to configure that as a build cache store as well. I can understand why people are frustrated at having to host an additional service, though, as for some it can mean quite a lot of additional infrastructure to support it, if they're not able to just tack it on to their existing setup.
Thank you for the links you provided, by the way. It really helped my understanding of how ECR (and apparently most suppliers who follow the spec!) came to not support caches produced by buildx.
FWIW, my team uses buildctl directly, instead of the buildx wrapper.
> FWIW, my team uses buildctl directly, instead of the buildx wrapper.

Do you have an example of that?
For those looking for guidance on the S3 build cache with buildx, this worked for me:
```yaml
- uses: actions/checkout@v3
- name: configure aws credentials
- name: Set up QEMU
  uses: docker/setup-qemu-action@v2
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v2
- name: Buildx master
  run: docker buildx create --bootstrap --driver docker-container --driver-opt image=moby/buildkit:master --use
```
and
```shell
docker buildx build . \
  --cache-from=type=s3,region=eu-west-1,bucket=bucket-name,name=docker-cache/myapp,access_key_id=$(AWS_ACCESS_KEY_ID),secret_access_key=$(AWS_SECRET_ACCESS_KEY),session_token=$(AWS_SESSION_TOKEN) \
  --cache-to=type=s3,region=eu-west-1,bucket=bucket-name,name=docker-cache/myapp,access_key_id=$(AWS_ACCESS_KEY_ID),secret_access_key=$(AWS_SECRET_ACCESS_KEY),session_token=$(AWS_SESSION_TOKEN)
```
Will probably wait until S3 cache is released before using in production.
Would be great if ECR could support cache-manifest (see: https://medium.com/titansoft-engineering/docker-build-cache-sharing-on-multi-hosts-with-buildkit-and-buildx-eb8f7005918e)
NOTE FROM AWS: We shipped this on BuildKit 0.12, see here for details - https://aws.amazon.com/blogs/containers/announcing-remote-cache-support-in-amazon-ecr-for-buildkit-clients/. We are keeping this issue open for the time being to allow the community to discuss and gather further feedback
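Per the linked announcement, the registry cache exporter works against ECR when the client asks BuildKit 0.12+ to store the cache as an OCI image manifest. A minimal sketch, with the account ID, region, and repository names as placeholders:

```shell
# Sketch only: <account>, region, and repository names are placeholders.
# image-manifest=true makes BuildKit write the cache as an OCI image
# manifest, which ECR accepts; requires BuildKit >= 0.12 (e.g. a recent
# docker-container buildx builder).
docker buildx build . \
  --cache-to type=registry,ref=<account>.dkr.ecr.us-east-1.amazonaws.com/my-repo:buildcache,mode=max,image-manifest=true,oci-mediatypes=true \
  --cache-from type=registry,ref=<account>.dkr.ecr.us-east-1.amazonaws.com/my-repo:buildcache \
  --push \
  -t <account>.dkr.ecr.us-east-1.amazonaws.com/my-repo:latest
```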