dragonflyoss / nydus

Nydus - the Dragonfly image service, providing fast, secure and easy access to container images.
https://nydus.dev/
Apache License 2.0

When using Kubernetes and nydus, container fails to start with the error “target snapshot already exists” #1527

Open cl0udee opened 8 months ago

cl0udee commented 8 months ago

Additional Information

The following information is very important in order to help us to help you. Omission of the following details may delay your support request or receive no attention at all.

Version of nydus being used (nydusd --version)

Version:    v2.1.1
Git Commit:     2fd7070bf7c08ba8667a375ecf5ab4ca3963a184
Build Time:     2022-11-06T11:14:20.450697142Z
Profile:    release
Rustc:      rustc 1.61.0 (fe5b13d68 2022-05-18)

Version of nydus-snapshotter being used (containerd-nydus-grpc --version)

Version:     v0.4.0
Revision:    1e18acbf9d39588d39d0276a423e33ebeeb3462b
Go version:  go1.18.6
Build time:  2022-11-30T11:40:06

Kernel information (uname -r)

4.19.90-2102.2.0.0062.ctl2.x86_64

GNU/Linux Distribution, if applicable (cat /etc/os-release)

command result: cat /etc/os-release

containerd-nydus-grpc command line used, if applicable (ps aux | grep containerd-nydus-grpc)

/usr/bin/containerd-nydus-grpc --config-path /etc/nydus/nydusd-config.json --address /run/containerd/containerd-nydus-grpc.sock --nydusd-path /usr/bin/nydusd --nydusimg-path /usr/bin/nydus-image --log-to-stdout

client command line used, if applicable (such as: nerdctl, docker, kubectl, ctr)

kubectl apply -f test-pod.yaml

Screenshots (if applicable)

Details about issue

When I use nerdctl, for example

nerdctl --snapshotter nydus run --rm -it centos:v1 bash

the container can be successfully created and run normally. However, when I switch to Kubernetes, it gives me the following error:

Warning  FailedCreatePodSandBox  <invalid>  kubelet  Failed to create pod sandbox: rpc error: code = AlreadyExists desc = failed to get sandbox image "xxx/pause:3.6": failed to pull image "xxx/pause:3.6": failed to pull and unpack image "xxx/pause:3.6": unable to prepare extraction snapshot: target snapshot "sha256:xxx": already exists

I suspect this issue is related to the pause image, because nerdctl, which doesn't involve the pause image, starts the container successfully. I have also tried

ctr -n k8s.io content fetch $pause-image-name

but that did not help.

This is confusing, especially because I was able to launch pods with Kubernetes a while ago. After some time passed, pods can no longer start.

imeoer commented 8 months ago

Could you try ctr images delete xxx/pause:3.6 --sync?

Need to ensure the pod using the image has been deleted first.

cl0udee commented 8 months ago

Could you try ctr images delete xxx/pause:3.6 --sync?

Need to ensure the pod using the image has been deleted first.

I have already deleted all the pause images, but I still receive the same error: target snapshot "sha256:xxx": already exists

kinderyj commented 8 months ago

Is this a work in progress (WIP)? I'm experiencing the same issue in kata 3.2.0. @imeoer

imeoer commented 2 months ago

This appears to be an inconsistency in the containerd snapshot metadata. Try the following commands:

ctr -n k8s.io content ls | grep sha256:xxx
ctr -n k8s.io content rm $blob_id

But we still don't have a way to reproduce it.
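
The two commands suggested above (list content, then remove the matching blob) could be chained roughly as follows. This is only a sketch, not a confirmed fix: the helper names match_digests and plan_removals are hypothetical, it assumes the digest is the first whitespace-separated column of the ctr content ls output, and it only prints the removal commands so they can be reviewed before anything is deleted.

```shell
#!/bin/sh
# Hypothetical helper: read `ctr content ls` output on stdin and print the
# digests (assumed to be the first column) that start with the given prefix.
match_digests() {
    prefix="$1"
    awk -v p="$prefix" 'index($1, p) == 1 { print $1 }'
}

# Hypothetical helper: read digests on stdin and print (not run) the
# matching removal command for each one.
plan_removals() {
    while read -r blob_id; do
        printf 'ctr -n k8s.io content rm %s\n' "$blob_id"
    done
}

# Intended usage, with sha256:xxx replaced by the digest from the error:
#   ctr -n k8s.io content ls | match_digests sha256:xxx | plan_removals
```

Printing the commands first keeps the step reversible: nothing is removed until the printed lines are checked and run by hand.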