ECR [request]: expose image labels in DescribeImages response

alexrudd commented 5 years ago

Hi!

Tell us about your request

I would like image labels to be exposed through the DescribeImages API. Docker labels are the only way to add immutable metadata to an image, so having a way to view that data without pulling the image down would be really useful!

Which service(s) is this request for?

ECR

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

We'd like to add metadata about a particular Docker image to the image itself (git hash, build date, repository, branch, etc.) and then be able to look up that data remotely using the ECR API.

This data is useful for working backwards from a container running in ECS, to the exact source that it was built from. Tags can serve a similar purpose but they're mutable and not connected directly to the image itself.

Thank you!

lloydpick commented 5 years ago

We are also starting to go down this path, and need access to the Docker labels without having to pull the image from ECR. OCI is also now publishing a standard for the labels here: https://github.com/opencontainers/image-spec/blob/master/annotations.md. Obviously there can be labels beyond the ones they define, but it would be nice to see these in the interface and the API.

tomwillfixit commented 4 years ago

We are using OCI labelling throughout the CI/CD pipeline and being able to query these labels using awscli or through the ECR API would be incredibly useful. Is there any type of workaround to access the labels without pulling the entire image? Thanks :)

samuelkarp commented 4 years ago

You can access the manifest of an image stored in ECR without pulling the entire image today. The manifest can be retrieved through the BatchGetImage API, and you can then parse the manifest to find annotations that are stored there. While this isn't the friendliest way to get at the annotations, it might be a reasonable work-around for you if you already know which image(s) you want to examine.
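A minimal Python sketch of the parsing step described above, assuming a response shaped like the one BatchGetImage returns: an `images` list in which each `imageManifest` is a JSON document embedded as a string. The helper name `manifest_annotations` and the sample response are illustrative only and are not part of any AWS SDK.

```python
import json


def manifest_annotations(response):
    """Map each image's tag (or digest) to the annotations found in its manifest."""
    result = {}
    for image in response.get("images", []):
        # imageManifest is a JSON string, so it needs its own decode step.
        manifest = json.loads(image["imageManifest"])
        image_id = image.get("imageId", {})
        key = image_id.get("imageTag") or image_id.get("imageDigest")
        # Manifests without annotations simply yield an empty dict.
        result[key] = manifest.get("annotations", {})
    return result


# Made-up response for illustration; a real one would come from the ECR API.
sample_response = {
    "images": [
        {
            "imageId": {"imageTag": "v1"},
            "imageManifest": json.dumps(
                {
                    "schemaVersion": 2,
                    "annotations": {"org.opencontainers.image.revision": "123456"},
                }
            ),
        }
    ]
}

print(manifest_annotations(sample_response))
```

An empty result for an image means its manifest carries no annotations, which is a different thing from the image having no Docker labels.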

tomwillfixit commented 4 years ago

Thanks Samuel. That is useful to know. I was able to get the manifest but the labels which I see when running docker inspect against an image are not in the manifest. For example I'm looking to get at these labels :

"Labels": {
  "com.image.created": "Wed 30 Oct 2019 17:11:16 GMT",
  "com.image.maintainer": "tom",
  "com.image.revision": "123456",
  "com.image.source": "blah",
  "com.image.title": "blah v2.0.1"
}

TheTweak commented 4 years ago

@tomwillfixit try passing "application/vnd.docker.distribution.manifest.v1+json" in the acceptedMediaTypes parameter of BatchGetImage. The labels should be in the "history" array in the response.

tomwillfixit commented 4 years ago

Thank you. This works.

aws ecr batch-get-image --repository-name --image-id imageTag= --accepted-media-types "application/vnd.docker.distribution.manifest.v1+json" --output text

tomwillfixit commented 4 years ago

One liner to return labels :

aws ecr batch-get-image --repository-name --image-id imageTag= --accepted-media-types "application/vnd.docker.distribution.manifest.v1+json" --output json |jq -r '.images[].imageManifest' |jq -r '.history[0].v1Compatibility' |jq -r '.config.Labels'

Dejulia489 commented 4 years ago

Thanks, this works great! Added the placeholders to the command, plus a PowerShell version:

aws ecr batch-get-image --repository-name <repo_name> --image-id imageTag=<tag_name> --accepted-media-types "application/vnd.docker.distribution.manifest.v1+json" --output json |jq -r '.images[].imageManifest' |jq -r '.history[0].v1Compatibility' |jq -r '.config.Labels'

(((aws ecr batch-get-image --repository-name <repo_name> --image-id imageTag=<tag_name> --accepted-media-types "application/vnd.docker.distribution.manifest.v1+json" --output json | convertFrom-Json -ov a).images.imageManifest | ConvertFrom-Json).history.v1Compatibility | convertFrom-Json).Config.Labels
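Both one-liners above decode JSON three times because the schema 1 response nests JSON strings inside JSON. A Python sketch of the same unwrapping, assuming the response shape implied by the jq filters (`images[].imageManifest`, then `history[0].v1Compatibility`, then `config.Labels`); the function name and the sample data are hypothetical:

```python
import json


def labels_from_schema1(response):
    """Unwrap the nested JSON strings of a schema 1 manifest and return the labels."""
    # Level 1: the manifest itself is a string inside the API response.
    manifest = json.loads(response["images"][0]["imageManifest"])
    # Level 2: each history entry holds another JSON string; index 0 is the one read above.
    v1_compat = json.loads(manifest["history"][0]["v1Compatibility"])
    # Level 3: the labels sit under config.Labels and may be missing or null.
    return (v1_compat.get("config") or {}).get("Labels") or {}


# Made-up data mirroring that nesting, for illustration only.
_v1 = json.dumps({"config": {"Labels": {"com.image.maintainer": "tom"}}})
_manifest = json.dumps({"schemaVersion": 1, "history": [{"v1Compatibility": _v1}]})
sample_response = {"images": [{"imageManifest": _manifest}]}

print(labels_from_schema1(sample_response))
```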

CONJAUMCGCG commented 4 years ago

It does work with --image-id imageTag=, but it does not work if I pass --image-id imageDigest= instead of the imageTag:

aws ecr batch-get-image --repository-name --image-id imageDigest= --accepted-media-types "application/vnd.docker.distribution.manifest.v1+json" --output json |jq -r '.images[].imageManifest' |jq -r '.history[0].v1Compatibility' |jq -r '.config.Labels'

priyendra commented 3 years ago

Any idea why this command works with tags but does not return the same information when we pass in the image digest? Is there an active open issue where this is being tracked?

marcus-crane commented 2 years ago

@priyendra A bit late but here's an answer for anyone else coming across this thread: https://docs.docker.com/registry/compatibility/#registry-v23

When the manifest is pulled by tag with Docker Engine 1.9 and older, the manifest is converted on-the-fly to Schema 1 and sent in the response. The Docker Engine 1.9 is compatible with this older format.

When the manifest is pulled by digest with Docker Engine 1.9 and older, the same rewriting process does not happen in the registry. If it did, the digest would no longer match the hash of the manifest and would violate the constraints of CAS.

In short, the digest is a hash of the manifest body. When you request a v1 manifest, the registry has to rewrite the body into the v1 schema, and the translated body no longer hashes to the digest you asked for. That is why the rewrite happens for pulls by tag but not for pulls by digest.
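To make the digest argument concrete, here is a small Python sketch. It assumes only that a digest is the SHA-256 of the exact manifest bytes, written in the `sha256:<hex>` form seen throughout this thread. The two manifest bodies are made-up stand-ins, not real registry output.

```python
import hashlib


def manifest_digest(body: bytes) -> str:
    """Content-addressable digest of a manifest body: SHA-256 over the raw bytes."""
    return "sha256:" + hashlib.sha256(body).hexdigest()


# Hypothetical bodies: the stored manifest and a version rewritten to another schema.
original = b'{"schemaVersion": 2, "config": {"digest": "sha256:aaaa"}}'
rewritten = b'{"schemaVersion": 1, "history": []}'

requested = manifest_digest(original)

# Serving the rewritten body for the requested digest would break the
# "digest == hash of body" guarantee, so a registry cannot do it.
print(requested == manifest_digest(original))   # same bytes, same digest
print(requested == manifest_digest(rewritten))  # different bytes, different digest
```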

Since this thread, it doesn't seem that AWS serves manifest.v1 as a media type anymore but here's a command for getting labels from a multi-arch image. For those not retrieving from multi-arch images, you can just remove the second ecr batch-get-image command and it's probably about the same.

export REPOSITORY_NAME=blah
export IMAGE_TAG=abc123
aws ecr batch-get-image --repository-name=$REPOSITORY_NAME --image-id imageTag=$IMAGE_TAG --region=us-east-1 --output json |
  jq -r '.images[].imageManifest' |
  jq '.manifests[0].digest' |
  xargs -I{} aws ecr batch-get-image --repository-name=$REPOSITORY_NAME --image-id imageDigest={} |
  jq -r '.images[].imageManifest' |
  jq '.config.digest' |
  xargs -I{} aws ecr get-download-url-for-layer --repository-name=$REPOSITORY_NAME --layer-digest={} |
  jq '.downloadUrl' |
  xargs curl -s |
  jq '.config.Labels'

Output

```json
{
  "org.opencontainers.image.created": "2021-12-15T02:57:56+0000",
  "org.opencontainers.image.revision": "abc123",
  "org.whatever.else": "blah"
}
```

Note that this just indexes into the first image manifest, since a "fat manifest" contains several. All of the images should be labelled exactly the same, so it doesn't really matter which one you pick.

priyendra commented 2 years ago

That worked very nicely for us. Thanks @marcus-crane!

erikogan commented 2 years ago

@marcus-crane Thank you for this! You’ve gotten me most of the way there, I think.

I had to make some minor modifications, but I’m correctly (maybe) extracting a downloadUrl for an image layer.

Unfortunately, the last step is failing because the response to a GET request on that URL is the gzip’ed tarball for the layer, not JSON that I can extract labels from.

Here’s my modified command line (running aws v2.4.15):

aws ecr batch-get-image --repository-name=$REPOSITORY_NAME --image-id imageTag=$IMAGE_TAG --region=us-east-1 --output json |
  jq -r '.images[].imageManifest' |
  jq '.layers[0].digest' |
  xargs -I{} aws ecr get-download-url-for-layer --repository-name=$REPOSITORY_NAME --layer-digest={} |
  jq '.downloadUrl' |
  xargs curl -s |
  jq '.config.Labels'

(Originally I was building this into a python script where I already have the imageDigest from a previous API call, but when I couldn’t get it working I backed off to try to reproduce it from your command line)

Any ideas? Did they change the API, or do you see what I’m doing wrong?

marcus-crane commented 2 years ago

@erikogan Ah yep, it depends on the type of Docker manifest you're fetching. When I was doing the experiment which led to my comment above, I was dealing with a "fat manifest", also known as a manifest list.

In short, we have two image manifests (one x86 and one arm) which are themselves part of a manifest list so the above command doesn't work for single architecture images as it tries to dive one step too deep.

In your case, I take it that you aren't dealing with a multi-architecture image, so you basically just need to drop the manifest-list steps from the original command: the jq '.manifests[0].digest' lookup and the second batch-get-image call that follows it (along with its jq -r '.images[].imageManifest'). The nice thing about that is you can operate on both kinds of image with the same logic; you just execute those extra steps if the manifest type is application/vnd.docker.distribution.manifest.list.v2+json. In fact, we do just that in our build pipelines in order to inspect labels regardless of the image type (single-arch or multi-arch).

Anyway, here's the command modified for single-arch images:

export REPOSITORY_NAME=blah
export IMAGE_TAG=abc123
aws ecr batch-get-image --repository-name=$REPOSITORY_NAME --image-id imageTag=$IMAGE_TAG --region=us-east-1 --output json |
  jq -r '.images[].imageManifest' |
  jq '.config.digest' |
  xargs -I{} aws ecr get-download-url-for-layer --repository-name=$REPOSITORY_NAME --layer-digest={} |
  jq '.downloadUrl' |
  xargs curl -s |
  jq '.config.Labels'

It's almost exactly what you had. Where you ran into trouble is that you retrieved the digest of the first layer rather than the config digest, i.e. jq '.layers[0].digest' vs jq '.config.digest'. The config digest points at the image config blob, which is the bit that contains all the labels 🙂
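The branching described in this comment can be sketched in Python as a single decision: given a manifest, which digest should be followed next? This is an illustrative sketch rather than the pipeline code mentioned above. The function name is hypothetical, the sample manifests are made up, and the OCI index media type is an addition that the thread does not mention.

```python
import json

MANIFEST_LIST_TYPES = {
    "application/vnd.docker.distribution.manifest.list.v2+json",
    # OCI equivalent of a manifest list; an assumption added here, not from the thread.
    "application/vnd.oci.image.index.v1+json",
}


def digest_to_follow(manifest_json: str):
    """Return ("manifest", digest) for a manifest list, else ("config", digest)."""
    manifest = json.loads(manifest_json)
    if manifest.get("mediaType") in MANIFEST_LIST_TYPES:
        # Multi-arch: step into the first per-architecture manifest.
        return ("manifest", manifest["manifests"][0]["digest"])
    # Single-arch: the labels live in the config blob, not in layers[0].
    return ("config", manifest["config"]["digest"])


# Made-up manifests for illustration.
multi_arch = json.dumps({
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [{"digest": "sha256:amd64"}, {"digest": "sha256:arm64"}],
})
single_arch = json.dumps({
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {"digest": "sha256:cfg"},
    "layers": [{"digest": "sha256:layer0"}],
})

print(digest_to_follow(multi_arch))
print(digest_to_follow(single_arch))
```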

marcus-crane commented 2 years ago

@erikogan Here's a quick bash script that combines the two commands and dynamically makes the extra requests required for manifest lists compared to single-arch images:

#!/bin/bash

# Usage:         ./inspect-labels repository tag
# Example usage: ./inspect-labels anyservice abc1234

REPOSITORY_NAME=$1
IMAGE_TAG=$2

if [[ $REPOSITORY_NAME == "" || $IMAGE_TAG == "" ]]; then
  echo "Please enter both a repository name and an image tag"
  echo "Usage: ./inspect-labels myservice abc1234"
  exit 1
fi

# Fetch the manifest for the tag. imageManifest is a JSON document embedded as a string.
MANIFEST=$(aws ecr batch-get-image --repository-name="$REPOSITORY_NAME" --image-id imageTag="$IMAGE_TAG" --region=us-west-2 --output json | jq -r '.images[].imageManifest')

if [[ $MANIFEST == "" ]]; then
  echo "No results found for that Docker manifest"
  exit 1
fi

MEDIA_TYPE=$(jq -r '.mediaType' <<< "${MANIFEST}")

# A manifest list ("fat manifest") points at per-architecture manifests; follow the first one.
if [[ $MEDIA_TYPE == "application/vnd.docker.distribution.manifest.list.v2+json" ]]; then
  INNER_DIGEST=$(jq -r '.manifests[0].digest' <<< "${MANIFEST}")
  MANIFEST=$(aws ecr batch-get-image --repository-name="$REPOSITORY_NAME" --image-id imageDigest="$INNER_DIGEST" --region=us-west-2 --output json | jq -r '.images[].imageManifest')
fi

# The labels live in the config blob, which is addressed by the config digest (not a layer digest).
CONFIG_DIGEST=$(jq -r '.config.digest' <<< "${MANIFEST}")
DOWNLOAD_URL=$(aws ecr get-download-url-for-layer --repository-name="$REPOSITORY_NAME" --layer-digest="$CONFIG_DIGEST" --region=us-west-2 --output json | jq -r '.downloadUrl')
CONTENT=$(curl -s "$DOWNLOAD_URL")

jq '.config.Labels' <<< "$CONTENT"

$CONTENT holds the config of one of the images (they are all the same, just built for different architectures). You can remove the config.Labels selector to see the entire image config, which can be handy as well.

erikogan commented 2 years ago

WOW! Thank you!

I did think it was strange that we were downloading whole layers to get the labels. In my script I sorted them by size to hopefully download a smaller one. 🤣 In hindsight, it makes sense that the configuration is stored in a special layer.

Now I just need to port this back to my boto3 python script, but that’s trivial now that you’ve done the hard work!

(Also: I should probably look into fat manifest images, I suspect it might help the folks on my team with arm-based machines having performance issues. So thank you for that, TOO!)

marcus-crane commented 2 years ago

@erikogan No prob, the multi-arch stuff is kinda weird but you basically want to look at buildx: https://docs.docker.com/buildx/working-with-buildx. You don't really handle the manifests directly, buildx just kinda does it all for you.

aws / containers-roadmap

ECR [request]: expose image labels in DescribeImages response #178