containerd / nerdctl


content digest not found: platform-reduced push, follow-up to #3425 #3509

Open apostasie opened 2 weeks ago

apostasie commented 2 weeks ago

Description

Similarly to #3489, it seems there may be another case that was not listed in #3425, or something with side effects there, when we rely on a temp-reduced-platform image to push.

It is not entirely clear to me what is going on in there, especially with the whole reduced-platform contortions (and why we even need them...), so I do not have a quick fix that I can send immediately.

While this was somewhat expected... https://github.com/containerd/nerdctl/pull/3435#discussion_r1766200030 - we should have a hard look at this code.

Steps to reproduce the issue

Hammer the tests.

It took about 20 iterations of

go test ./cmd/nerdctl/image -count 1

to get there.

Note that this was produced with the rewritten tests in #3492.
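
Since the failure only shows up after many runs, a small retry loop saves some typing. This is a minimal sketch, not part of the nerdctl test tooling: `run_until_fail` is a hypothetical helper name, and it simply re-runs whatever command it is given until that command exits non-zero or the attempt budget runs out.

```shell
#!/usr/bin/env bash

# Hypothetical helper: re-run a command until it fails or the budget is spent.
# Usage: run_until_fail MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 1 as soon as one run fails (flake reproduced), 0 if every run passed.
run_until_fail() {
  local max="$1"
  shift
  local i=1
  while [ "$i" -le "$max" ]; do
    if ! "$@"; then
      echo "failed on iteration $i" >&2
      return 1
    fi
    i=$((i + 1))
  done
  echo "no failure in $max iterations" >&2
  return 0
}
```

For example, `run_until_fail 50` followed by the go test command quoted above stops at the first failing run and reports which iteration hit it.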

Describe the results you received and expected

--- FAIL: TestPush (14.40s)
    image_push_linux_test.go:253: ======================== Pre-test cleanup ========================
    image_push_linux_test.go:253: ======================== Test setup ========================
    image_push_linux_test.go:253: ======================== Test Run ========================
    image_push_linux_test.go:253: ======================== Processing subtests ========================
    --- FAIL: TestPush/with_hosts_dir,_with_login (0.88s)
        image_push_linux_test.go:253: ======================== Pre-test cleanup ========================
        command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test rmi
        image_push_linux_test.go:253: ======================== Test setup ========================
        command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test pull ghcr.io/stargz-containers/alpine:3.13-org
        command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test tag ghcr.io/stargz-containers/alpine:3.13-org 192.168.5.15:5002/testpush-with-hosts-dir-with-login-a6c55caa:3.13-org
        command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test --hosts-dir /tmp/TestPush2005531285/007/certs.d2208102383 login -u admin -p badmin 192.168.5.15:5002
        image_push_linux_test.go:253: ======================== Test Run ========================
        command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test push --hosts-dir /tmp/TestPush2005531285/007/certs.d2208102383 192.168.5.15:5002/testpush-with-hosts-dir-with-login-a6c55caa:3.13-org
        command.go:112: assertion failed: expect.ExitCode is not result.ExitCode: Expected exit code: 0

            Command:  /usr/local/bin/nerdctl --namespace=nerdctl-test push --hosts-dir /tmp/TestPush2005531285/007/certs.d2208102383 192.168.5.15:5002/testpush-with-hosts-dir-with-login-a6c55caa:3.13-org
            ExitCode: 1
            Error:    exit status 1
            Stdout:   index-sha256:d13219399e61ee5d3c2b411e758d38cf1e1fef0185c74f2ce682205dededc8e0: waiting        |--------------------------------------|
            elapsed: 0.1 s                                                                 total:   0.0 B (0.0 B/s)

            Stderr:   time="2024-10-07T00:59:49-07:00" level=info msg="pushing as a reduced-platform image (application/vnd.docker.distribution.manifest.list.v2+json, sha256:d13219399e61ee5d3c2b411e758d38cf1e1fef0185c74f2ce682205dededc8e0)"
            time="2024-10-07T00:59:49-07:00" level=fatal msg="content digest sha256:d13219399e61ee5d3c2b411e758d38cf1e1fef0185c74f2ce682205dededc8e0: not found"

            Env:
            SHELL=/bin/bash
            LOGNAME=dmp
            XDG_SESSION_TYPE=tty
            HOME=/home/dmp.linux
            LANG=C.UTF-8
            SSH_CONNECTION=192.168.5.2 51297 192.168.5.15 22
            LESSCLOSE=/usr/bin/lesspipe %s %s
            XDG_SESSION_CLASS=user
            TERM=xterm-256color
            LESSOPEN=| /usr/bin/lesspipe %s
            USER=dmp
            SHLVL=2
            XDG_SESSION_ID=2
            XDG_RUNTIME_DIR=/run/user/501
            SSH_CLIENT=192.168.5.2 51297 22
            XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop
            DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/501/bus
            SSH_TTY=/dev/pts/1
            OLDPWD=/Users/dmp
            _=/usr/local/go/bin/go
            PATH=/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/sbin:/sbin:/usr/sbin:/sbin:/usr/sbin:/sbin:/usr/local/go/bin
            PWD=/Users/dmp/Projects/go/nerd/nerdctl/cmd/nerdctl/image
            DOCKER_CONFIG=/tmp/TestPushwith_hosts_dir,_with_login3884284200/001
            NERDCTL_TOML=/tmp/TestPushwith_hosts_dir,_with_login3884284200/001/nerdctl.toml
        case.go:164: ======================== Post-test cleanup ========================
        command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test rmi 192.168.5.15:5002/testpush-with-hosts-dir-with-login-a6c55caa:3.13-org
    case.go:164: ======================== Post-test cleanup ========================
FAIL
FAIL    github.com/containerd/nerdctl/v2/cmd/nerdctl/image  34.020s
FAIL
FAIL

What version of nerdctl are you using?

main

Are you using a variant of nerdctl? (e.g., Rancher Desktop)

None

Host information

No response

apostasie commented 2 weeks ago

With some additional EnsureContent and instrumentation.

This is totally 🤯.

Note that we DO download d13219399e61ee5d3c2b411e758d38cf1e1fef0185c74f2ce682205dededc8e0, which was missing.

Then the reduced-platform push tries to push exactly that content ^ (the reduced index) and fails, complaining that it is not there...

        image_push_linux_test.go:272: ======================== Test Run ========================
        command.go:112: /usr/local/bin/nerdctl --namespace=nerdctl-test push 127.0.0.1:5000/testpush-plain-http-with-localhost-0bf13f12:3.13-org
        command.go:112: assertion failed: expect.ExitCode is not result.ExitCode: Expected exit code: 0

            Command:  /usr/local/bin/nerdctl --namespace=nerdctl-test push 127.0.0.1:5000/testpush-plain-http-with-localhost-0bf13f12:3.13-org
            ExitCode: 1
            Error:    exit status 1
            Stdout:   index-sha256:d13219399e61ee5d3c2b411e758d38cf1e1fef0185c74f2ce682205dededc8e0: waiting        |--------------------------------------|
            elapsed: 0.1 s                                                                 total:   0.0 B (0.0 B/s)

            Stderr:   time="2024-10-07T16:45:51-07:00" level=error msg="Ensure all 127.0.0.1:5000/testpush-plain-http-with-localhost-0bf13f12:3.13-org"
            time="2024-10-07T16:45:51-07:00" level=error msg="name 127.0.0.1:5000/testpush-plain-http-with-localhost-0bf13f12:3.13-org"
            time="2024-10-07T16:45:51-07:00" level=error msg="target {application/vnd.docker.distribution.manifest.list.v2+json sha256:ec14c7992a97fc11425907e908340c6c3d6ff602f5f13d899e6b7027c9b4133a %!s(int64=1638) [] map[]  %!s(*v1.Platform=<nil>) }"
            time="2024-10-07T16:45:51-07:00" level=error msg="plt %!s(*v1.Platform=<nil>)"
            time="2024-10-07T16:45:51-07:00" level=error msg="Ensuring {arm64 linux  [] }"
            time="2024-10-07T16:45:51-07:00" level=error msg="done %!s(*v1.Platform=<nil>)"
            time="2024-10-07T16:45:51-07:00" level=error msg="Ensure all 127.0.0.1:5000/testpush-plain-http-with-localhost-0bf13f12:3.13-org-tmp-reduced-platform"
            time="2024-10-07T16:45:51-07:00" level=error msg="name 127.0.0.1:5000/testpush-plain-http-with-localhost-0bf13f12:3.13-org-tmp-reduced-platform"
            time="2024-10-07T16:45:51-07:00" level=error msg="target {application/vnd.docker.distribution.manifest.list.v2+json sha256:d13219399e61ee5d3c2b411e758d38cf1e1fef0185c74f2ce682205dededc8e0 %!s(int64=332) [] map[]  %!s(*v1.Platform=<nil>) }"
            time="2024-10-07T16:45:51-07:00" level=error msg="plt %!s(*v1.Platform=<nil>)"
            time="2024-10-07T16:45:51-07:00" level=error msg="done %!s(*v1.Platform=<nil>)"
            time="2024-10-07T16:45:51-07:00" level=info msg="pushing as a reduced-platform image (127.0.0.1:5000/testpush-plain-http-with-localhost-0bf13f12:3.13-org-tmp-reduced-platform, %!s(*v1.Platform=<nil>), application/vnd.docker.distribution.manifest.list.v2+json, sha256:d13219399e61ee5d3c2b411e758d38cf1e1fef0185c74f2ce682205dededc8e0)"
            time="2024-10-07T16:45:51-07:00" level=fatal msg="content digest sha256:d13219399e61ee5d3c2b411e758d38cf1e1fef0185c74f2ce682205dededc8e0: not found"

            Env:
            SHELL=/bin/bash
            LOGNAME=dmp
            XDG_SESSION_TYPE=tty
            HOME=/home/dmp.linux
            LANG=C.UTF-8
            SSH_CONNECTION=192.168.5.2 51297 192.168.5.15 22
            LESSCLOSE=/usr/bin/lesspipe %s %s
            XDG_SESSION_CLASS=user
            TERM=xterm-256color
            LESSOPEN=| /usr/bin/lesspipe %s
            USER=dmp
            SHLVL=2
            XDG_SESSION_ID=2
            XDG_RUNTIME_DIR=/run/user/501
            SSH_CLIENT=192.168.5.2 51297 22
            XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop
            DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/501/bus
            SSH_TTY=/dev/pts/1
            OLDPWD=/Users/dmp
            _=/usr/local/go/bin/go
            PATH=/usr/local/go/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/sbin:/sbin:/usr/sbin:/sbin:/usr/sbin:/sbin:/usr/local/go/bin:/usr/sbin:/sbin:/usr/local/go/bin:/home/dmp.linux/go/bin
            PWD=/Users/dmp/Projects/go/nerd/nerdctl/cmd/nerdctl/image
            DOCKER_CONFIG=/tmp/TestPushplain_http_with_localhost767453805/001
            NERDCTL_TOML=/tmp/TestPushplain_http_with_localhost767453805/001/nerdctl.toml
        case.go:164: ======================== Post-test cleanup ========================
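
To cross-check a log like the one above, it can help to ask the content store directly whether the digest is there at the moment the push fails. The following is a rough sketch under assumptions: `digest_in_store` is a hypothetical helper, and it expects a listing on stdin with one blob per line and the digest in the first column, which is the layout `ctr content ls` prints.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read a content listing on stdin (digest in the first
# column) and succeed only if the requested digest appears in it.
# Usage: <listing command> | digest_in_store sha256:<hex>
digest_in_store() {
  awk -v want="$1" '$1 == want { found = 1 } END { exit !found }'
}
```

Assuming `ctr` talks to the same containerd instance and namespace as the tests, something like `ctr -n nerdctl-test content ls | digest_in_store sha256:d13219399e61ee5d3c2b411e758d38cf1e1fef0185c74f2ce682205dededc8e0` exits 0 when the blob is present and 1 when it is not.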

apostasie commented 2 weeks ago

Here is a reproducer.

Be sure to replace amd64 with whatever is NOT your native platform.

#!/usr/bin/env bash

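# Target platform; pass one that is NOT your native platform.
platform="${1:-linux/amd64}"
# Start from a clean slate: remove all containers and images.
nerdctl rm -f $(nerdctl ps -aq) >/dev/null 2>&1
nerdctl rmi -f $(nerdctl images -q) >/dev/null 2>&1
# Pull the non-native image, start a container from it, then remove the image.
nerdctl image pull ubuntu --platform "$platform" >/dev/null 2>&1
nerdctl run -d ubuntu >/dev/null 2>&1
nerdctl image rm -f ubuntu >/dev/null 2>&1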

echo "Pulling again"
nerdctl pull ubuntu --platform "$platform" 2>&1 >/dev/null
echo "Tagging"
nerdctl tag ubuntu myregistry:5000/test
echo "Pushing - should fail to contact the listed registry - is failing with digest not found"
nerdctl push myregistry:5000/test
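
The reproducer has two possible failure modes (the expected registry connection error and the unexpected digest error), so it may be handy to tell them apart automatically when running it in a loop. A minimal sketch, assuming the fatal message keeps the wording seen in the logs in this issue; `classify_push_output` is a hypothetical helper, not something nerdctl provides.

```shell
#!/usr/bin/env bash

# Hypothetical helper: classify captured push output read from stdin.
# Prints "digest-not-found" when the fatal message from this report appears,
# and "other" for anything else (for example a registry connection error).
classify_push_output() {
  if grep -Eq 'content digest sha256:[0-9a-f]{64}: not found'; then
    echo "digest-not-found"
  else
    echo "other"
  fi
}
```

Piping the last step through it, as in `nerdctl push myregistry:5000/test 2>&1 | classify_push_output`, prints `digest-not-found` only when the bug is hit.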

apostasie commented 2 weeks ago

@lingdie I am back with this %&$!@#&****

I opened a bunch of new tickets. The previous fix did correctly address the issue for a bunch of cases, but there are clearly more (even something as simple as nerdctl images --filter will break, damn it). It looks like anything trying to ReadBlob, touching containerd convert.Convert, or dealing with multi-platform images will also break.

This one with the temp-reduced platform is especially baffling.

Either there is something really bizarre going on in containerd, or nobody here actually understands how we should manipulate images and our implementation is totally wrong.

If you have any insight for me, hit me up - this is driving me bonkers.

lingdie commented 2 weeks ago

I believe we've done all we can at the nerdctl level. A bug in containerd is causing nerdctl to fail in pulling certain layers when calling image pull. At the nerdctl level, given that we can only call the containerd SDK, we are unable to fix these issues.

apostasie commented 2 weeks ago

(╯°□°)╯︵ ┻━┻