containerd / nerdctl

contaiNERD CTL - Docker-compatible CLI for containerd, with support for Compose, Rootless, eStargz, OCIcrypt, IPFS, ...
Apache License 2.0

Cannot commit and push container image from Kubernetes pods #827

Closed nettoclaudio closed 2 months ago

nettoclaudio commented 2 years ago

Hello folks! :wave:

I'm seeing strange behavior when I commit a container (from a Kubernetes pod) and then push the resulting image to a container registry. The push command always fails with the same error: content digest sha256:898c46f3b1a1f39827ed135f020c32e2038c87ae0690a8fe73d94e5df9e6a2d6: not found.

I'm filing this issue here, but the problem may be on the containerd side, because I've seen this work in an earlier version. I'm trying to find out in which version it stopped working.

I'm still investigating and have no leads so far; suggestions are welcome.

Steps to reproduce:

  1. Create an arbitrary Kubernetes pod (for simplicity w/ just one container).

    $ kubectl run --image tsuru/go:latest my-app -- sleep Inf
  2. Execute arbitrary commands to generate changes in the container image (to be committed):

    $ kubectl exec my-app -- mkdir -p /tmp/foo/bar
  3. Commit the container to a local image:

    $ CONTAINER_ID=$(kubectl get pods my-app -o 'jsonpath={ .status.containerStatuses[0].containerID }' | sed -E 's|(.+)://||g')
    $ nerdctl --namespace k8s.io commit ${CONTAINER_ID} registry.example.com/my-app:v1
  4. Push to local registry (no credentials needed):

    $ nerdctl --namespace k8s.io push registry.example.com/my-app:v1
    FATA[0000] failed to create a tmp single-platform image "registry.example.com/my-app:v1-tmp-reduced-platform": content digest sha256:898c46f3b1a1f39827ed135f020c32e2038c87ae0690a8fe73d94e5df9e6a2d6: not found
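
The steps above can be collected into one small script. This is a minimal sketch, not something posted in the thread: the helper names `strip_runtime_prefix` and `commit_and_push` are hypothetical, and it assumes `kubectl` and `nerdctl` are on the PATH of the node that hosts the pod. The `sed` call from step 3 is replaced by an equivalent shell parameter expansion.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the commands are the ones
# used in the reproduction steps above.

# Strip the runtime scheme (e.g. "containerd://") from a Kubernetes containerID.
strip_runtime_prefix() {
    printf '%s\n' "${1#*://}"
}

# Commit the first container of a pod and push the resulting image.
# $1 = pod name, $2 = target image reference.
commit_and_push() {
    pod=$1
    image=$2
    raw_id=$(kubectl get pods "$pod" -o 'jsonpath={ .status.containerStatuses[0].containerID }') || return 1
    container_id=$(strip_runtime_prefix "$raw_id")
    nerdctl --namespace k8s.io commit "$container_id" "$image" || return 1
    nerdctl --namespace k8s.io push "$image"
}

# Usage (on the node that hosts the pod):
#   commit_and_push my-app registry.example.com/my-app:v1
```

Nothing runs until `commit_and_push` is called, so the file can be sourced safely before trying the reproduction.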

Environment information:

junnplus commented 2 years ago

I can't reproduce this problem with containerd latest or with version 1.5.10.

jun@lima-k8s-test:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.6", GitCommit:"8a62859e515889f07e3e3be6a1080413f17cf2c3", GitTreeState:"clean", BuildDate:"2021-04-15T03:28:42Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.6", GitCommit:"8a62859e515889f07e3e3be6a1080413f17cf2c3", GitTreeState:"clean", BuildDate:"2021-04-15T03:19:55Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}
jun@lima-k8s-test:~$ containerd -v
containerd containerd.io 1.5.10 2a1d4dbdb2a1030dc5b01e96fb110a9d9f150ecc

In addition, I can't start the corresponding containerd version through minikube.

PS: containerd v1.4.x reached EOL.

junnplus commented 2 years ago

I closed it because containerd v1.4.x reached EOL.

Feel free to reopen it if you have any other questions.

Nirmit1995 commented 1 year ago

I am also getting the same error, but only for large images (mine was 7 GB; small images were pushed without trouble).

FATA[0000] failed to create a tmp single-platform image "registry.example.com/my-app:0086bf16-21cb-477b-952c-13780780f597-tmp-reduced-platform": content digest sha256:5bed26d33875e6da1d9ff9a1054c5fef3bbeb22ee979e14b72acf72528de007b: not found

I have the following config:

[ec2-user@ip]$ sudo /usr/local/bin/nerdctl version
WARN[0000] unable to determine buildctl version: exec: "buildctl": executable file not found in $PATH
Client:
 Version:       v1.4.0
 OS/Arch:       linux/amd64
 Git commit:    7e8114a82da342cdbec9a518c5c6a1cce58105e9
 buildctl:
  Version:

Server:
 containerd:
  Version:      1.6.19
  GitCommit:    1e1ea6e986c6c86565bc33d52e34b81b3e2bc71f
 runc:
  Version:      1.1.4
  GitCommit:    5fd4c4d144137e991c4acebb2146ab1483a97925

Please help in resolving this.

HFfleming commented 1 year ago

Could you please reopen this issue, @junnplus? Thanks. I have the same problem: I commit an image from a running container and then need to push the new image to my repo. When I run nerdctl -n k8s.io push xxx, an error occurs:

FATA[0000] failed to create a tmp single-platform image "xxx.com/hjmtest/nginx:hjm-tmp-reduced-platform": content digest sha256:550fe1bea624a5c62551cf09f3aa10886eed133794844af1dfb775118309387e: not found

Here is my env info:

nerdctl -v
nerdctl version 1.4.0
containerd -v
containerd github.com/containerd/containerd v1.6.14-42-g21f32b2c3 21f32b2c394bc1ccfbf1744876a0fcdd4ef2390d

In addition: if I pull an image from elsewhere instead of committing a new one, nerdctl push xxx works.

ggerogery commented 1 year ago

I got the same issue but with finch. Solved with --platform arg: finch push xxx.dkr.ecr.us-east-1.amazonaws.com/infra/calico/apiserver:v3.25.1 --platform linux/amd64

baozaolaoba-top commented 1 year ago

In addition:

nerdctl version 1.5.0
containerd -v
containerd github.com/containerd/containerd v1.7.1 1677a17964311325ed1c31e2c0a3589ce6d5c30d

harbor server with port: hub.local:5443

Zheaoli commented 1 year ago

WARN[0000] unable to determine buildctl version: exec: "buildctl": executable file not found in $PATH
Client:
 Version:       v1.5.0
 OS/Arch:       linux/amd64
 Git commit:    b33a58f288bc42351404a016e694190b897cd252
 buildctl:
  Version:

Server:
 containerd:
  Version:      1.7.1+azure-1
  GitCommit:    1677a17964311325ed1c31e2c0a3589ce6d5c30d
 runc:
  Version:      1.1.7
  GitCommit:    860f061b76bb4fc671f0f9e900f7d80ff93d4eb7

I could not reproduce this bug. Would you mind giving me more detailed reproduction steps? For example, which registry do you use?

JCereal commented 1 year ago

Recently, my GKE cluster's worker node upgraded to v1.27.3-gke.100 which also upgraded containerd from 1.6.18 to 1.7.0. After the upgrade, I encountered the exact same issue.

Client:
 Version:   v1.7.0
 OS/Arch:   linux/amd64
 Git commit:    e674fe7ba6e49f12e88cd9c6c442e7ea5232502c
 buildctl:
  Version:  v0.12.3
  GitCommit:    438f47256f0decd64cc96084e22d3357da494c27

Server:
 containerd:
  Version:  1.7.0
  GitCommit:    1fbd70374134b891f97ce19c70b6e50c7b9f4e0d
 runc:
  Version:  1.1.10
  GitCommit:    v1.1.10-0-g18a0cb0f

Reproduce

The commit command: nerdctl -n k8s.io commit bbe68406a258 some.registry.com/my-app:0.1

The push command like: nerdctl -n k8s.io push some.registry.com/my-app:0.1

Output: FATA[0000] failed to create a tmp single-platform image "some.registry.com/my-app:0.1-tmp-reduced-platform": content digest sha256:c95ff01263cbbb536e71f8ae823d3e63f15f7a0f1ba9ecb7b3126a63654d2b23: not found.
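
When comparing reports like this one, it can help to pull the missing digest out of the error message so it can be looked up in the content store. The sketch below is an illustration only: the helper name `missing_digest` is hypothetical, and it assumes a `grep` that supports `-o` (GNU and BSD grep both do).

```shell
#!/bin/sh
# Sketch only: `missing_digest` is a hypothetical helper, not part of nerdctl.

# Read an error message on stdin and print the first sha256 digest it contains.
missing_digest() {
    grep -oE 'sha256:[0-9a-f]{64}' | head -n 1
}

# Usage:
#   nerdctl -n k8s.io push some.registry.com/my-app:0.1 2>&1 | missing_digest
# The printed digest can then be searched for in the output of
#   ctr -n k8s.io content ls
```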

Work around

Add --all-platforms option.

If I run the push command like: nerdctl -n k8s.io push --all-platforms some.registry.com/my-app:0.1, the image can be pushed if the registry is valid (btw I pushed to JFrog).

The option is added to avoid converter.Convert. Related code: https://github.com/containerd/nerdctl/blob/ce2f63d275c80f1ba2a3e2d9bfdc15ded3ff73c7/pkg/cmd/image/push.go#L98-L111
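
The workaround can be wrapped so that `--all-platforms` is only used when the default push fails. This is a hedged sketch: `push_with_fallback` and the `NERDCTL_NAMESPACE` variable are hypothetical names introduced here, and other reports in this thread say the flag does not help in every environment.

```shell
#!/bin/sh
# Sketch only: `push_with_fallback` and NERDCTL_NAMESPACE are hypothetical names.

# Try a normal push first; if it fails, retry with --all-platforms.
# $1 = image reference. The namespace defaults to k8s.io (assumption).
push_with_fallback() {
    image=$1
    ns=${NERDCTL_NAMESPACE:-k8s.io}
    if nerdctl -n "$ns" push "$image"; then
        return 0
    fi
    echo "default push failed, retrying with --all-platforms" >&2
    nerdctl -n "$ns" push --all-platforms "$image"
}

# Usage:
#   push_with_fallback some.registry.com/my-app:0.1
```

Trying the default path first keeps the reduced-platform behavior for images that already push correctly.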

Other info

The issue is not registry related. I tried different registries, the same error is thrown, even with random invalid registry.

The container I am committing from is built on an Ubuntu machine with docker build . command, without specifying the platform as linux/amd64.

When it worked with containerd 1.6.18, I could see the log line level=info msg="pushing as a reduced-platform image...", so the conversion presumably succeeded at that time.

beyou923 commented 12 months ago

I started a container using the nydus format and then packaged the container into an image. However, I encountered the same problem when I tried to push this image.

Steps to reproduce:

Does the container launched by on-demand loading of images support commit and push operations?

iholo commented 10 months ago

I have the same problem on Azure.

env info:
kubernetes: v1.27.7
Kernel: 5.15.0-1053-azure
nerdctl version 1.7.3
containerd: 1.7.1+azure-1

michaelmalice commented 5 months ago

Hi I'm having the same issue. Is there a specific version of containerd that I can use that doesn't have this problem?

hj1801 commented 5 months ago

I am encountering the same issue. It seems to occur starting with Kubernetes 1.27, and it remains unresolved even with the latest version of nerdctl (1.7.6). The --all-platforms and --platform flags suggested by others do not work when pushing. Looking at the issue, it seems to start with a failure to save the committed image.

This has been an ongoing issue for a long time, and I am wondering whether there have been any follow-ups.

zgfh commented 4 months ago

This works for me: use nerdctl tag to give the image a new name, then push again.

lingdie commented 4 months ago

Maybe the config descriptor is missing.

https://github.com/containerd/nerdctl/blob/main/pkg/imgutil/commit/commit.go#L245

    err = content.WriteBlob(ctx, cs, configDesc.Digest.String(), bytes.NewReader(newConfigJSON), configDesc)
    if err != nil {
        return ocispec.Descriptor{}, emptyDigest, err
    }

lingdie commented 4 months ago

By default, a committed image is not unpacked into snapshots.

ctr --namespace=k8s.io i check
REF                                                                     TYPE                                                 DIGEST                                                                  STATUS           SIZE                UNPACKED
docker.io/lingdie/busybox:commit                                        application/vnd.docker.distribution.manifest.v2+json sha256:a7dce96e72a59c80479569d48a1376a36a2c2b0e9d2ba0e3dca707e2021d5e72 complete (4/4)   28.3 MiB/28.3 MiB   false

Unpacking the image may solve the problem:

ctr --namespace=k8s.io snapshots unpack sha256:a7dce96e72a59c80479569d48a1376a36a2c2b0e9d2ba0e3dca707e2021d5e72
unpacking sha256:a7dce96e72a59c80479569d48a1376a36a2c2b0e9d2ba0e3dca707e2021d5e72 (application/vnd.docker.distribution.manifest.v2+json)...done

nerdctl --namespace k8s.io image convert lingdie/busybox:commit lingdie/busybox:tmp
sha256:a7dce96e72a59c80479569d48a1376a36a2c2b0e9d2ba0e3dca707e2021d5e72

nerdctl --namespace k8s.io image push lingdie/busybox:tmp
INFO[0000] pushing as a reduced-platform image (application/vnd.docker.distribution.manifest.v2+json, sha256:a7dce96e72a59c80479569d48a1376a36a2c2b0e9d2ba0e3dca707e2021d5e72)
manifest-sha256:a7dce96e72a59c80479569d48a1376a36a2c2b0e9d2ba0e3dca707e2021d5e72: done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:55fefd844bcb12134766ab042567efccb311ad9f051e68497c6a207dbdbe7afc:   done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 5.0 s                                                                    total:  2.5 Ki (509.0 B/s)

lingdie commented 4 months ago

This PR may fix this:

https://github.com/containerd/nerdctl/pull/3268

apostasie commented 3 months ago

@lingdie testing on main from today, the issue is still there (as described by OP).

Wondering what we are missing here...

lingdie commented 3 months ago

Here is my testing process. Could you please provide your testing?

apostasie commented 3 months ago

@lingdie

kind version 0.23.0

nerdctl latest main

ubuntu 24.04 (in lima on a mac M1 with Sonoma 14.5)

kind.yaml:

# https://pkg.go.dev/sigs.k8s.io/kind/pkg/apis/config/v1alpha4#Cluster
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraMounts:
      - hostPath: _output/nerdctl
        containerPath: /usr/local/bin/nerdctl
      - hostPath: /tmp/go
        containerPath: /usr/local/go
      - hostPath: .
        containerPath: /nerdctl-source

Create the cluster:

make binaries

KIND_EXPERIMENTAL_PROVIDER=nerdctl kind create cluster --config=./kind.yaml

nerdctl exec -ti kind-control-plane bash

Inside the control plane from above:

kubectl run --image debian my-app -- sleep Inf
kubectl exec my-app -- mkdir -p /tmp/foo/bar
CONTAINER_ID=$(kubectl get pods my-app -o "jsonpath={ .status.containerStatuses[0].containerID }" | sed -E "s|(.+)://||g")

nerdctl --namespace k8s.io commit ${CONTAINER_ID} registry.example.com/my-app:v1
nerdctl --namespace k8s.io push registry.example.com/my-app:v1

More info:

root@kind-control-plane:/# nerdctl info
Client:
 Namespace: default
 Debug Mode:    false

Server:
 Server Version: v1.7.15
 Storage Driver: overlayfs
 Logging Driver: json-file
  Cgroup Driver:  : systemd
  Cgroup Version: : 2
 Plugins:
  Log:     fluentd journald json-file syslog
  Storage: native overlayfs fuse-overlayfs
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version:   6.8.0-31-generic
 Operating System: Debian GNU/Linux 12 (bookworm)
 OSType:           linux
 Architecture:     aarch64
 CPUs:             4
 Total Memory:     3.814GiB
 Name:             kind-control-plane
 ID:               c74d79a4-7d13-4eca-ae50-6e2e95ae5043

WARNING: No swap limit support

@lingdie let me know if I am missing something here, or if you need anything else.

Also, when we get #3296 in we will be able to test for Kube on the CI (in the same conditions, using Kind).

lingdie commented 3 months ago

I encountered the same issue when using the same test plan as you.😥

I also found that the missing digest never existed in the image. The same issue occurred when I was implementing the layer-squash project. I fixed it there by writing the config into the content store as well: https://github.com/labring/layer-squash/blob/main/pkg/runtime/runtime.go#L256

apostasie commented 3 months ago

@lingdie there is clearly something wrong going on with this.

This issue keeps cropping up on a variety of operations - save, commit, with or without kube - https://github.com/containerd/nerdctl/pull/3179

I believe these all have the same root cause, but I cannot pinpoint it...

ouyangningdong commented 3 months ago

Has the issue of saving snapshots into tar files been resolved? I am using the latest version, 2.0.0-rc1, but it still reports that some sha256 layers cannot be found. Are they lost due to garbage collection?

apostasie commented 3 months ago

@ouyangningdong if your issue is the same that is being discussed here, it is not resolved - and the ticket is still open.

If your issue is different, I suggest you open a new ticket.

Also, it does not seem like a garbage collection problem.

apostasie commented 3 months ago

@lingdie I did add the scenario as a test (https://github.com/containerd/nerdctl/blob/main/cmd/nerdctl/container_commit_linux_test.go#L28) - right now, the test expectation is to fail (until we fix the problem).

Interestingly, the test failed once on an unrelated PR: https://github.com/containerd/nerdctl/actions/runs/10497512059/job/29080522608

... meaning the problem might be racy?

lingdie commented 3 months ago

The only thing I can think of is that containerd might have garbage collected some essential layers...

lingdie commented 2 months ago

I believe I have found the source of the problem. Below is my reproduction process:

  1. Start the pod

    kubectl run busybox --image busybox -- sleep 12000
  2. Check the original image, find it's in an incomplete state

    ctr --namespace=k8s.io i check | grep busybox
    docker.io/library/busybox:latest                                                                  application/vnd.oci.image.index.v1+json                   sha256:34b191d63fbc93e25e275bfccf1b5365664e5ac28f06d974e8d50090fbb49f41 incomplete (1/2) 390.0 B/1.8 MiB     true
    docker.io/library/busybox@sha256:34b191d63fbc93e25e275bfccf1b5365664e5ac28f06d974e8d50090fbb49f41 application/vnd.oci.image.index.v1+json                   sha256:34b191d63fbc93e25e275bfccf1b5365664e5ac28f06d974e8d50090fbb49f41 incomplete (1/2) 390.0 B/1.8 MiB     true
  3. Commit the container

    containerID=$(kubectl get pods -o go-template='{{range .items}}{{range .status.containerStatuses}}{{if eq .name "busybox"}}{{.containerID}}{{"\n"}}{{end}}{{end}}{{end}}' | grep -oP '(?<=containerd://)\w+')
    nerdctl commit $containerID docker.io/lingdie/busybox:dev
  4. Check the committed image, it's also in an incomplete state

    ctr --namespace=k8s.io i check | grep busybox
    docker.io/library/busybox:latest                                                                  application/vnd.oci.image.index.v1+json                   sha256:34b191d63fbc93e25e275bfccf1b5365664e5ac28f06d974e8d50090fbb49f41 incomplete (1/2) 390.0 B/1.8 MiB     true
    docker.io/library/busybox@sha256:34b191d63fbc93e25e275bfccf1b5365664e5ac28f06d974e8d50090fbb49f41 application/vnd.oci.image.index.v1+json                   sha256:34b191d63fbc93e25e275bfccf1b5365664e5ac28f06d974e8d50090fbb49f41 incomplete (1/2) 390.0 B/1.8 MiB     true
    docker.io/lingdie/busybox:dev                                                                     application/vnd.docker.distribution.manifest.v2+json      sha256:66b5ca932bd733887d3c4c4d808b6816f9bafed9ea5fb1d350d0a1e84f86781e incomplete (2/3) 791.0 B/1.8 MiB     true
  5. Try to push the image using nerdctl, which of course fails

    nerdctl push docker.io/lingdie/busybox:dev
    FATA[0000] failed to create a tmp single-platform image "docker.io/lingdie/busybox:dev-tmp-reduced-platform": content digest sha256:75e8ca8f509fb8f7ee74430edbdc5b78fd863a4f08e06c53828e9a996a79f642: not found
  6. Use ctr to pull the image again

    ctr --namespace k8s.io image pull docker.io/library/busybox:latest
    docker.io/library/busybox:latest:                                                 resolved       |++++++++++++++++++++++++++++++++++++++|
    index-sha256:34b191d63fbc93e25e275bfccf1b5365664e5ac28f06d974e8d50090fbb49f41:    exists         |++++++++++++++++++++++++++++++++++++++|
    manifest-sha256:e7e097403ca9266ab43ca83b3b55c272c248cf46489a40bcc4864bd7dd945f18: exists         |++++++++++++++++++++++++++++++++++++++|
    layer-sha256:75e8ca8f509fb8f7ee74430edbdc5b78fd863a4f08e06c53828e9a996a79f642:    done           |++++++++++++++++++++++++++++++++++++++|
    config-sha256:fd633c23ab56c682e69aacf7fea8d0eaa8d18bd72c4b1c6d82eb5eaace658f1a:   exists         |++++++++++++++++++++++++++++++++++++++|
    elapsed: 8.0 s                                                                    total:  1.0 Mi (128.0 KiB/s)
    unpacking linux/arm64/v8 sha256:34b191d63fbc93e25e275bfccf1b5365664e5ac28f06d974e8d50090fbb49f41...
    done: 9.078793ms
  7. Check the image

    ctr --namespace=k8s.io i check | grep busybox
    docker.io/library/busybox:latest                                                                  application/vnd.oci.image.index.v1+json                   sha256:34b191d63fbc93e25e275bfccf1b5365664e5ac28f06d974e8d50090fbb49f41 complete (2/2)   1.8 MiB/1.8 MiB     true
    docker.io/library/busybox@sha256:34b191d63fbc93e25e275bfccf1b5365664e5ac28f06d974e8d50090fbb49f41 application/vnd.oci.image.index.v1+json                   sha256:34b191d63fbc93e25e275bfccf1b5365664e5ac28f06d974e8d50090fbb49f41 complete (2/2)   1.8 MiB/1.8 MiB     true
    docker.io/lingdie/busybox:dev                                                                     application/vnd.docker.distribution.manifest.v2+json      sha256:66b5ca932bd733887d3c4c4d808b6816f9bafed9ea5fb1d350d0a1e84f86781e complete (3/3)   1.8 MiB/1.8 MiB     true
  8. Push the image using nerdctl, success...

    nerdctl push docker.io/lingdie/busybox:dev
    INFO[0000] pushing as a reduced-platform image (application/vnd.docker.distribution.manifest.v2+json, sha256:66b5ca932bd733887d3c4c4d808b6816f9bafed9ea5fb1d350d0a1e84f86781e)
    manifest-sha256:66b5ca932bd733887d3c4c4d808b6816f9bafed9ea5fb1d350d0a1e84f86781e: done           |++++++++++++++++++++++++++++++++++++++|
    layer-sha256:eb86545fdbaa77abb944627619e66fb8c1853b427fc930493c561988674054be:    done           |++++++++++++++++++++++++++++++++++++++|
    config-sha256:f78592bf867943251d217918b2299bbd7191970ae83dc52a8afc5e7ad98bf71d:   done
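
The check in the reproduction above can be automated by filtering the `ctr i check` output for images whose status is incomplete. This is a minimal sketch under stated assumptions: `incomplete_refs` is a hypothetical helper name, and it assumes the whitespace-separated column layout shown in the output above (REF, TYPE, DIGEST, STATUS, ...).

```shell
#!/bin/sh
# Sketch only: `incomplete_refs` is a hypothetical helper.

# Read `ctr images check` output on stdin and print every ref whose
# STATUS column (fourth whitespace-separated field) is "incomplete".
incomplete_refs() {
    awk '$4 == "incomplete" { print $1 }'
}

# Usage:
#   ctr --namespace=k8s.io i check | incomplete_refs
# An incomplete base image can then be pulled again, as in the steps above:
#   ctr --namespace k8s.io image pull docker.io/library/busybox:latest
```

A committed image that shows up as incomplete cannot be pulled from anywhere; in the reproduction above it became complete once its base image was pulled again.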

lingdie commented 2 months ago

see: https://github.com/containerd/containerd/issues/8973