estesp / manifest-tool

Command line tool to create and query container image manifest list/indexes
Apache License 2.0

Error pushing manifest list/index #191

Open aamato80 opened 1 year ago

aamato80 commented 1 year ago

Hi all,

I am trying to use manifest-tool to create a multi-architecture Docker image. As explained in your guide, I built the individual images (in my case for arm64 and amd64, with kaniko) and then tried to run manifest-tool. I tried both modes, YAML and spec, without success. Either way I get an error like this one: Error pushing manifest list/index to registry: sha256:0c91a4e37f4765d431b50d62439ba660b8b57ae75412fd45c371d9174c38e3df: manifest list/index references to blobs and/or manifests are missing in your target registry...

Here is an example of the YAML I used:

image: myrepo.com/myservice:latest
manifests:
  - image: myrepo.com/myservice-arm64:latest
    platform:
      architecture: arm64
      os: linux
  - image: myrepo.com/myservice-amd64:latest
    platform:
      architecture: amd64
      os: linux

Any suggestions? Is there something wrong in my configuration? Many thanks!

estesp commented 1 year ago

Can you run the push with --debug and provide the output? It sounds like it thinks a required component of the full tree of contained images (configs, manifests, and layers) does not exist in the target repo.

It would also be good to understand which registry you are pushing to (self-hosted? based on distribution/distribution? which version?)

estesp commented 1 year ago

Hi @aamato80 have you been able to try the command with --debug so I can help figure out your issue?

b-morgenthaler commented 1 year ago

Hi @estesp, I am taking over here since I think I am seeing the same issue as the OP and no error/debug output has been posted so far. Here's the error/debug output I am seeing:

level=debug msg="do request" digest="sha256:1d40802bba338d4abdc4ef8395f8827a04fc5b9591e941c52eae27382f5f1192" mediatype=application/vnd.docker.distribution.manifest.list.v2+json request.header.content-type=application/vnd.docker.distribution.manifest.list.v2+json request.header.user-agent=containerd/1.6.18+unknown request.method=PUT size=699 url="https://self_hosted:5001/v2/image_name/manifests/v1.5.0"
level=debug msg="fetch response received" digest="sha256:1d40802bba338d4abdc4ef8395f8827a04fc5b9591e941c52eae27382f5f1192" mediatype=application/vnd.docker.distribution.manifest.list.v2+json response.header.content-length=156 response.header.content-security-policy="sandbox allow-forms allow-modals allow-popups allow-presentation allow-scripts allow-top-navigation" response.header.content-type=application/json response.header.date="Thu, 20 Jul 2023 13:44:25 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.server="Nexus/3.41.0-01 (OSS)" response.header.strict-transport-security="max-age=7776000" response.header.x-content-type-options=nosniff response.header.x-xss-protection="1; mode=block" response.status="400 Bad Request" size=699 url="https://self_hosted:5001/v2/image_name/manifests/v1.5.0"
level=debug msg="unexpected response" body="{\"errors\":[{\"code\":\"BLOB_UNKNOWN\",\"message\":\"blob unknown to registry\",\"detail\":\"sha256:f5ef2458f9f1d711e98db59f2239e1171bc9ecf2c442a068e11fc62e105d2b0c\"}]}" digest="sha256:1d40802bba338d4abdc4ef8395f8827a04fc5b9591e941c52eae27382f5f1192" mediatype=application/vnd.docker.distribution.manifest.list.v2+json resp="&{400 Bad Request 400 HTTP/1.1 1 1 map[Content-Length:[156] Content-Security-Policy:[sandbox allow-forms allow-modals allow-popups allow-presentation allow-scripts allow-top-navigation] Content-Type:[application/json] Date:[Thu, 20 Jul 2023 13:44:25 GMT] Docker-Distribution-Api-Version:[registry/2.0] Server:[Nexus/3.41.0-01 (OSS)] Strict-Transport-Security:[max-age=7776000] X-Content-Type-Options:[nosniff] X-Xss-Protection:[1; mode=block]] 0xc0003a8700 156 [] false false map[] 0xc0000aed00 0xc00039c210}" size=699
level=fatal msg="Error pushing manifest list/index to registry: sha256:1d40802bba338d4abdc4ef8395f8827a04fc5b9591e941c52eae27382f5f1192: manifest list/index references to blobs and/or manifests are missing in your target registry: failed commit on ref \"index-self_hosted:5001/image_name:v1.5.0@sha256:1d40802bba338d4abdc4ef8395f8827a04fc5b9591e941c52eae27382f5f1192\": unexpected status: 400 Bad Request"

It's worth noting that

A subsequent, successful call to your tool (without rebuilding or re-pushing the images) produces this debug log:

level=debug msg="do request" digest="sha256:1d40802bba338d4abdc4ef8395f8827a04fc5b9591e941c52eae27382f5f1192" mediatype=application/vnd.docker.distribution.manifest.list.v2+json request.header.content-type=application/vnd.docker.distribution.manifest.list.v2+json request.header.user-agent=containerd/1.6.18+unknown request.method=PUT size=699 url="https://self_hosted:5001/v2/image_name/manifests/v1.5.0"
level=debug msg="fetch response received" digest="sha256:1d40802bba338d4abdc4ef8395f8827a04fc5b9591e941c52eae27382f5f1192" mediatype=application/vnd.docker.distribution.manifest.list.v2+json response.header.content-length=699 response.header.content-security-policy="sandbox allow-forms allow-modals allow-popups allow-presentation allow-scripts allow-top-navigation" response.header.content-type=application/vnd.docker.distribution.manifest.list.v2+json response.header.date="Thu, 20 Jul 2023 13:49:34 GMT" response.header.docker-content-digest="sha256:1d40802bba338d4abdc4ef8395f8827a04fc5b9591e941c52eae27382f5f1192" response.header.docker-distribution-api-version=registry/2.0 response.header.last-modified="Thu, 20 Jul 2023 13:49:34 GMT" response.header.server="Nexus/3.41.0-01 (OSS)" response.header.strict-transport-security="max-age=7776000" response.header.x-content-type-options=nosniff response.header.x-xss-protection="1; mode=block" response.status="201 Created" size=699 url="https://self_hosted:5001/v2/image_name/manifests/v1.5.0"
Digest: sha256:1d40802bba338d4abdc4ef8395f8827a04fc5b9591e941c52eae27382f5f1192 699

b-morgenthaler commented 1 year ago

Interesting data point: I ran an inspect on both images prior to pushing the multi-arch index with your tool:

`$ ./manifest-tool --insecure inspect ${TARGET_DOCKER_REGISTRY}/${TARGET_DOCKER_GROUP}/arm64v8/${IMAGE_NAME}:${CI_COMMIT_TAG}
Name: self_hosted:5001/image_name:v1.5.0 (Type: application/vnd.docker.distribution.manifest.v2+json)
Digest: sha256:4b06cb1d532a350ee47dca2766a5940af4111fcbef40ab412ad6ecf399ad1af6
Size: 1364
OS: linux
Arch: arm64

Layers: 5

  layer 01: digest = sha256:5af00eab97847634d0b3b8a5933f52ca8378f5f30a2949279d682de1e210d78b
  layer 02: digest = sha256:7e982ec86ba103af9415fb80b62fb4d3b7256fe818db532dd6cc41bb337d182f
  layer 03: digest = sha256:0c7b7546e0fe2ad9e0aa6b3d3b05ed37bcb158441cb0e95713dd3b76c8f35800
  layer 04: digest = sha256:8e49e2b7b3144d6e26befd9944c0c2fca9d0cb0ce47959961e68828152f07eec
  layer 05: digest = sha256:4fd84ae77ddde0a7bd16ee010d145da9f1ca4b3192eee7f973e5a083610fee76

$ ./manifest-tool --insecure inspect ${TARGET_DOCKER_REGISTRY}/${TARGET_DOCKER_GROUP}/amd64/${IMAGE_NAME}:${CI_COMMIT_TAG}
Name: self_hosted:5001/image_name:v1.5.0 (Type: application/vnd.oci.image.manifest.v1+json)
Digest: sha256:f8ef6f81748de0da98d902eb7da4aafc36590ed8bcd8cb15806c79453b0dd2bc
Size: 893
OS: linux
Arch: amd64

Layers: 4

  layer 01: digest = sha256:01085d60b3a624c06a7132ff0749efc6e6565d9f2531d7685ff559fb5d0f669f
  layer 02: digest = sha256:f597caf2f79756536e25d4ff08317f77b988e141a34b41b525fde27cf84e9f76
  layer 03: digest = sha256:1cca692a2d6413570af817aa136807de39b1b34bd5621c67758eb6c7188ffef6
  layer 04: digest = sha256:b9829aedd8e18c8f886495b60d79db107335928f8952e412b20c184e4219bcd3`

But pushing still reports that something is missing:

time="2023-07-21T11:43:57Z" level=debug msg="do request" digest="sha256:ab9b3b1d72688fbbe37cf9bd5233ea351049f860b9012b2e9023f782899835f1" mediatype=application/vnd.docker.distribution.manifest.list.v2+json request.header.content-type=application/vnd.docker.distribution.manifest.list.v2+json request.header.user-agent=containerd/1.6.18+unknown request.method=PUT size=699 url="https://self_hosted:5001/v2/image_name/manifests/v1.5.0"
time="2023-07-21T11:43:57Z" level=debug msg="fetch response received" digest="sha256:ab9b3b1d72688fbbe37cf9bd5233ea351049f860b9012b2e9023f782899835f1" mediatype=application/vnd.docker.distribution.manifest.list.v2+json response.header.content-length=156 response.header.content-security-policy="sandbox allow-forms allow-modals allow-popups allow-presentation allow-scripts allow-top-navigation" response.header.content-type=application/json response.header.date="Fri, 21 Jul 2023 11:43:57 GMT" response.header.docker-distribution-api-version=registry/2.0 response.header.server="Nexus/3.41.0-01 (OSS)" response.header.strict-transport-security="max-age=7776000" response.header.x-content-type-options=nosniff response.header.x-xss-protection="1; mode=block" response.status="400 Bad Request" size=699 url="https://self_hosted:5001/v2/image_name/manifests/v1.5.0"
time="2023-07-21T11:43:57Z" level=debug msg="unexpected response" body="{\"errors\":[{\"code\":\"BLOB_UNKNOWN\",\"message\":\"blob unknown to registry\",\"detail\":\"sha256:4b06cb1d532a350ee47dca2766a5940af4111fcbef40ab412ad6ecf399ad1af6\"}]}" digest="sha256:ab9b3b1d72688fbbe37cf9bd5233ea351049f860b9012b2e9023f782899835f1" mediatype=application/vnd.docker.distribution.manifest.list.v2+json resp="&{400 Bad Request 400 HTTP/1.1 1 1 map[Content-Length:[156] Content-Security-Policy:[sandbox allow-forms allow-modals allow-popups allow-presentation allow-scripts allow-top-navigation] Content-Type:[application/json] Date:[Fri, 21 Jul 2023 11:43:57 GMT] Docker-Distribution-Api-Version:[registry/2.0] Server:[Nexus/3.41.0-01 (OSS)] Strict-Transport-Security:[max-age=7776000] X-Content-Type-Options:[nosniff] X-Xss-Protection:[1; mode=block]] 0xc0002bd400 156 [] false false map[] 0xc0001a9100 0xc0002c1080}" size=699
time="2023-07-21T11:43:57Z" level=fatal msg="Error pushing manifest list/index to registry: sha256:ab9b3b1d72688fbbe37cf9bd5233ea351049f860b9012b2e9023f782899835f1: manifest list/index references to blobs and/or manifests are missing in your target registry: failed commit on ref \"index-self_hosted:5001/image_name:v1.5.0@sha256:ab9b3b1d72688fbbe37cf9bd5233ea351049f860b9012b2e9023f782899835f1\": unexpected status: 400 Bad Request"

estesp commented 1 year ago

@b-morgenthaler interesting; so the missing content reported in the first error you included is specifically:

{
   "errors":
       [
           { "code": "BLOB_UNKNOWN",
             "message": "blob unknown to registry",
             "detail": "sha256:f5ef2458f9f1d711e98db59f2239e1171bc9ecf2c442a068e11fc62e105d2b0c"
           }
       ]
}
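To pull the missing digest out of a debug log without reading the escaped JSON by eye, something like the following could be used. It is only a sketch: the function name `missing_digests` is made up for illustration, and it assumes the registry's error body has the shape shown above.

```python
import json


def missing_digests(body: str) -> list[str]:
    """Return the digests reported as BLOB_UNKNOWN in a registry error body."""
    payload = json.loads(body)
    return [
        err["detail"]
        for err in payload.get("errors", [])
        # Only keep entries that carry a digest string in "detail".
        if err.get("code") == "BLOB_UNKNOWN" and isinstance(err.get("detail"), str)
    ]


# Error body copied from the first debug log in this thread.
body = (
    '{"errors":[{"code":"BLOB_UNKNOWN","message":"blob unknown to registry",'
    '"detail":"sha256:f5ef2458f9f1d711e98db59f2239e1171bc9ecf2c442a068e11fc62e105d2b0c"}]}'
)
print(missing_digests(body))
```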

The flow of creating the manifest list/index is to first make sure that all referenced content is in the target repo, using either a cross-repo blob mount (for blobs) or by pushing the actual ref into the target repo (without a tag). After those steps are complete, the manifest list/index referring to all that content is pushed. It seems there may be a timing issue: the content is not yet fully committed in your chosen registry implementation, so the registry throws a "missing content" error when the manifest list is pushed immediately after all the dependent content within the "tree" of member images. I can only assume it works when you run it again because by then the content is stably in the registry's data store/index, however Nexus has implemented that.
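The ordering described here can be illustrated with a toy in-memory model. This is not manifest-tool's code and says nothing about how Nexus stores content; `ToyRegistry`, `push_manifest_list`, and the short placeholder digests are all hypothetical names chosen for the sketch.

```python
class BlobUnknown(Exception):
    """Raised when an index references content the target repo does not hold."""


class ToyRegistry:
    def __init__(self):
        self.content = {}  # repo name -> set of digests stored in that repo
        self.indexes = {}  # (repo, tag) -> list of referenced manifest digests

    def push_content(self, repo, digest):
        self.content.setdefault(repo, set()).add(digest)

    def put_index(self, repo, tag, manifest_digests):
        present = self.content.get(repo, set())
        for digest in manifest_digests:
            if digest not in present:
                # Mirrors the BLOB_UNKNOWN answer seen in the debug logs.
                raise BlobUnknown(digest)
        self.indexes[(repo, tag)] = list(manifest_digests)


def push_manifest_list(registry, target_repo, tag, members):
    """members: list of (manifest_digest, [blob digests]) for each source image."""
    # Step 1: make every piece of referenced content available in the target repo.
    for manifest_digest, blobs in members:
        for blob in blobs:
            registry.push_content(target_repo, blob)
        registry.push_content(target_repo, manifest_digest)
    # Step 2: only then push the index that refers to the member manifests.
    registry.put_index(target_repo, tag, [m for m, _ in members])


members = [
    ("sha256:arm64-manifest", ["sha256:arm64-layer"]),  # placeholder digests
    ("sha256:amd64-manifest", ["sha256:amd64-layer"]),
]
registry = ToyRegistry()
push_manifest_list(registry, "image_name", "v1.5.0", members)
```

In this model, calling `put_index` before step 1 has taken effect fails in the same way the real push does, which is the situation a not-yet-committed reference would produce.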

Your second example shows that the missing content is the first manifest object (the linux/arm64 manifest digest), which would possibly be the last thing pushed in the ordering of steps that manifest-tool takes. Curious if you have any way to confirm that with other examples (e.g. that it's always a manifest and not a blob ref) which might confirm the timing issue.
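One way to check this across more runs is to compare the digest from each error against the manifest digests that manifest-tool inspect printed. A minimal sketch, assuming the digests are collected by hand; the helper name `classify` is hypothetical, and the two manifest digests are the ones from the inspect output earlier in this thread.

```python
# Manifest digests taken from the `manifest-tool inspect` output above.
MANIFESTS = {
    "linux/arm64": "sha256:4b06cb1d532a350ee47dca2766a5940af4111fcbef40ab412ad6ecf399ad1af6",
    "linux/amd64": "sha256:f8ef6f81748de0da98d902eb7da4aafc36590ed8bcd8cb15806c79453b0dd2bc",
}


def classify(digest: str, manifests: dict[str, str]) -> str:
    """Say whether a digest from a BLOB_UNKNOWN error is one of the known manifests."""
    for platform, manifest_digest in manifests.items():
        if digest == manifest_digest:
            return f"manifest for {platform}"
    return "not a known manifest digest"


# Digest reported in the second failing push.
print(classify(
    "sha256:4b06cb1d532a350ee47dca2766a5940af4111fcbef40ab412ad6ecf399ad1af6",
    MANIFESTS,
))
```

The digest from the first failing push (sha256:f5ef2458...) comes from a different build, so it is not expected to match either entry here.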

I would prefer not to introduce any artificial delays in manifest-tool, and I'm not sure exactly what the OCI distribution spec states about the consistency of the registry's content following the return of a POST/push operation. But it might be worth raising with Nexus, as this is the first I've heard of any issues with the flow of operations in manifest-tool with a registry implementation.

b-morgenthaler commented 1 year ago

@estesp

> It seems there is a possible timing issue that the content is not fully committed in your chosen registry implementation such that the registry throws a "missing content" error when it is pushed immediately after all the dependent content within the "tree" of member images.

My first thought was also a timing issue, but I ruled it out in the end after verifying the following, none of which had any positive effect on the error:

On a side note: I activated --ignore-missing for pushing the multi-arch manifest. Shouldn't this switch prevent the error?

Interesting as well: the time between pushing the images and pushing the multi-arch manifest does not seem to matter at all. No matter how quickly or slowly one follows the other, a subsequent call to the manifest-tool push command always succeeds without any additional changes.
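Since a second push reliably succeeds, a CI job could simply re-run the push once as a stopgap. The wrapper below is a sketch of that idea, not a fix for the underlying cause; `push_with_retry` is a hypothetical helper, and the exact manifest-tool command line is left to the caller.

```python
import subprocess
import time
from typing import Callable, Sequence


def push_with_retry(
    cmd: Sequence[str],
    attempts: int = 2,
    delay_seconds: float = 0.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> int:
    """Run cmd until it exits with status 0; return the attempt number that succeeded."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = run(list(cmd))
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"command failed after {attempts} attempts: {' '.join(cmd)}")


# Example call (assumes manifest-tool is on PATH and spec.yaml exists):
# push_with_retry(["manifest-tool", "push", "from-spec", "spec.yaml"])
```

The `run` parameter exists only so the wrapper can be exercised without a registry; in a pipeline the default `subprocess.run` is used.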

> Curious if you have any way to confirm that with other examples (e.g. that it's always a manifest and not a blob ref) which might confirm the timing issue

It is always the same type of error I am seeing. Regarding the order of the pushed images and the error: the images are built and pushed in parallel on dedicated GitLab runners, so I would have to go through the logs to see which one finished first.

estesp commented 1 year ago

> verifying that the pushed images are "existing" by calling manifest-tool inspect prior to manifest-tool push for all images. manifest-tool inspect returned no errors (see my log above in the first post) but manifest-tool push still did

I think I didn't do a great job separating the concepts of pushed content (as standalone images) and pushed content that gets created during the manifest list creation steps. You are correct that there is no issue with the standalone image content you created, nor any timing issue there. The --ignore-missing flag is about source content being missing, not target content.

To be clearer, when you assemble a target manifest list from multiple source images, the target (final) repository must contain references to any source content that is outside that specific repository reference. Those references are pushed during the operations that manifest-tool performs before that final PUT operation that you included the debug output for a few comments ago. If you look at the debug logs that come before it, you will see several additional HTTP transactions with the registry. Those transactions are pushing additional content references based on your source images into the target repo so that the registry will "find" all the right pieces of the DAG (content tree) when it creates that final manifest list entry.

In your case the target repo is something like myrepo/image_name:some_version, and the source images are coming from myrepo/arm64v8/image_name:some_version and myrepo/amd64/image_name:some_version. If we want to test this theory about the commit state of the pushes of content references into the target, one option is to temporarily try using the same repo for source and target by using tags instead of distinct image repo names. That way these extra content references won't need to be pushed into the target repo, and if the commit state of these extra reference pushes is the problem with Nexus, then you won't see it anymore when using the tag method on the same repo—because they won't be performed at all.

For example, you can try creating source images named myrepo/image_name:some_version_arm64v8 and myrepo/image_name:some_version_amd64 and keep the target as myrepo/image_name:some_version. This keeps sources and target all in the same repo (myrepo/image_name*) and won't be doing any additional content reference pushes before doing the PUT of the final manifest list.
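A spec file for this same-repo layout could be generated along these lines. This is a sketch: the function name `same_repo_spec` and the `<version>_<suffix>` tag pattern are illustrative choices that follow the naming in the example above, and the output mirrors the YAML structure from the original post.

```python
def same_repo_spec(repo: str, version: str, platforms: dict[str, str]) -> str:
    """Build a manifest-tool YAML spec whose sources are tags in the target repo.

    platforms maps a tag suffix to an architecture, e.g. {"arm64v8": "arm64"}.
    """
    lines = [f"image: {repo}:{version}", "manifests:"]
    for suffix, arch in platforms.items():
        lines += [
            f"  - image: {repo}:{version}_{suffix}",
            "    platform:",
            f"      architecture: {arch}",
            "      os: linux",
        ]
    return "\n".join(lines) + "\n"


print(same_repo_spec(
    "myrepo/image_name",
    "some_version",
    {"arm64v8": "arm64", "amd64": "amd64"},
))
```

Because every source tag lives in myrepo/image_name, the target repo already holds all referenced content when the final manifest list is pushed.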

b-morgenthaler commented 1 year ago

@estesp

> For example, you can try creating source images named myrepo/image_name:some_version_arm64v8 and myrepo/image_name:some_version_amd64 and keep the target as myrepo/image_name:some_version. This keeps sources and target all in the same repo (myrepo/image_name*) and won't be doing any additional content reference pushes before doing the PUT of the final manifest list.

This seems to work. I didn't deploy the multi-arch manifest with manifest-tool often enough to tell if it's working for good. But so far, I did not see the error (not even once when I decreased the artificial delay to a minimum of 5 seconds). Thanks for this suggestion, I may use this as a work-around for now.

EDIT: after a few more build/push and deployments with manifest-tool (even removing the artificial delay completely), I am fairly sure that having everything within the same repo is properly working for a stable build pipeline.

How should we move on from here? Is this a registry issue, or an issue with the combination of manifest-tool and the registry we use (Nexus)?