containerd / nydus-snapshotter

A containerd snapshotter with data deduplication and lazy loading in P2P fashion
https://nydus.dev/
Apache License 2.0

Pod panic due to invalid argument when using rafs version6 #594

Open zewenying opened 1 month ago

zewenying commented 1 month ago

Problem Description

Hi, I ran into a problem when using a Nydus-format image to create a Pod: the Pod panics with the following logs.

Pod logs:

panic: read /usr/share/mime/globs2: invalid argument

goroutine 1 [running]:
mime.loadMimeGlobsFile({0x1fbb3db?, 0x384a920?})
    GOROOT/src/mime/type_unix.go:74 +0x265
mime.initMimeUnix()
    GOROOT/src/mime/type_unix.go:107 +0x4e
mime.initMime()
    GOROOT/src/mime/type.go:88 +0x3d
sync.(*Once).doSlow(0x13?, 0x1cb68a0?)
    GOROOT/src/sync/once.go:74 +0xc2
sync.(*Once).Do(...)
    GOROOT/src/sync/once.go:65
mime.AddExtensionType({0x1f966c5, 0x5}, {0x1faea9b, 0x10})
    GOROOT/src/mime/type.go:171 +0x65
k8s.io/kube-openapi/pkg/handler3.init.0()
    external/io_k8s_kube_openapi/pkg/handler3/handler.go:88 +0x2b

Nydus-snapshotter logs: error: failed to get chunk information

time="2024-05-10T04:11:47.958422844Z" level=debug msg="[Prepare] snapshot with key k8s.io/455/333e128284a64a49e6c59759f344d9a01758ce4a93e8df21133c0523d3dbdc9f, parent k8s.io/2/sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4"
time="2024-05-10T04:11:47.959171647Z" level=debug msg="[Prepare] snapshot with labels map[]" key=k8s.io/455/333e128284a64a49e6c59759f344d9a01758ce4a93e8df21133c0523d3dbdc9f parent="k8s.io/2/sha256:e17133b79956ad6f69ae7f775badd1c11bad2fc64f0529cab863b9d12fbaa5c4"
time="2024-05-10T04:11:47.959217839Z" level=debug msg="continue to check snapshot 450 parent"
time="2024-05-10T04:11:47.959244457Z" level=debug msg="continue to check snapshot 1 parent"
time="2024-05-10T04:11:47.959398559Z" level=debug msg="overlayfs mount options [workdir=/var/lib/containerd-nydus/snapshots/450/work upperdir=/var/lib/containerd-nydus/snapshots/450/fs lowerdir=/var/lib/containerd-nydus/snapshots/1/fs]"
time="2024-05-10T04:12:03.730236689Z" level=debug msg="[Mounts] snapshot k8s.io/455/333e128284a64a49e6c59759f344d9a01758ce4a93e8df21133c0523d3dbdc9f"
time="2024-05-10T04:12:03.730282359Z" level=info msg="[Mounts] snapshot k8s.io/455/333e128284a64a49e6c59759f344d9a01758ce4a93e8df21133c0523d3dbdc9f ID 450 Kind Active"
time="2024-05-10T04:12:03.730359974Z" level=debug msg="overlayfs mount options [workdir=/var/lib/containerd-nydus/snapshots/450/work upperdir=/var/lib/containerd-nydus/snapshots/450/fs lowerdir=/var/lib/containerd-nydus/snapshots/1/fs]"
time="2024-05-10T04:12:04.392817303Z" level=debug msg="[Prepare] snapshot with key k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541, parent k8s.io/424/sha256:c6a2acc64ef312c34cdb430df67018c8b4d9a7603c164687dd46b8add268125f"
time="2024-05-10T04:12:04.393599760Z" level=debug msg="[Prepare] snapshot with labels map[]" key=k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 parent="k8s.io/424/sha256:c6a2acc64ef312c34cdb430df67018c8b4d9a7603c164687dd46b8add268125f"
time="2024-05-10T04:12:04.393649764Z" level=debug msg="continue to check snapshot 451 parent"
time="2024-05-10T04:12:04.393696728Z" level=info msg="Prepares active snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541, nydusd should start afterwards" key=k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 parent="k8s.io/424/sha256:c6a2acc64ef312c34cdb430df67018c8b4d9a7603c164687dd46b8add268125f"
time="2024-05-10T04:12:04.393709642Z" level=debug msg="Found nydus meta layer id 419" key=k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 parent="k8s.io/424/sha256:c6a2acc64ef312c34cdb430df67018c8b4d9a7603c164687dd46b8add268125f"
time="2024-05-10T04:12:04.393722593Z" level=debug msg="Nydus remote snapshot 419 is ready"
time="2024-05-10T04:12:04.393728039Z" level=info msg="Nydus remote snapshot 419 is ready"
time="2024-05-10T04:12:04.393741228Z" level=info msg="remote mount options [workdir=/var/lib/containerd-nydus/snapshots/451/work upperdir=/var/lib/containerd-nydus/snapshots/451/fs lowerdir=/var/lib/containerd-nydus/snapshots/419/mnt]"
time="2024-05-10T04:12:04.394829703Z" level=debug msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541"
time="2024-05-10T04:12:04.394853820Z" level=info msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 ID 451 Kind Active"
time="2024-05-10T04:12:04.394872248Z" level=debug msg="Nydus remote snapshot 419 is ready"
time="2024-05-10T04:12:04.394929091Z" level=info msg="remote mount options [workdir=/var/lib/containerd-nydus/snapshots/451/work upperdir=/var/lib/containerd-nydus/snapshots/451/fs lowerdir=/var/lib/containerd-nydus/snapshots/419/mnt]"
time="2024-05-10T04:12:04.407118785Z" level=debug msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541"
time="2024-05-10T04:12:04.407196424Z" level=info msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 ID 451 Kind Active"
time="2024-05-10T04:12:04.407235229Z" level=debug msg="Nydus remote snapshot 419 is ready"
time="2024-05-10T04:12:04.407334540Z" level=info msg="remote mount options [workdir=/var/lib/containerd-nydus/snapshots/451/work upperdir=/var/lib/containerd-nydus/snapshots/451/fs lowerdir=/var/lib/containerd-nydus/snapshots/419/mnt]"
time="2024-05-10T04:12:04.409115598Z" level=debug msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541"
time="2024-05-10T04:12:04.409154210Z" level=info msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 ID 451 Kind Active"
time="2024-05-10T04:12:04.409175676Z" level=debug msg="Nydus remote snapshot 419 is ready"
time="2024-05-10T04:12:04.409232543Z" level=info msg="remote mount options [workdir=/var/lib/containerd-nydus/snapshots/451/work upperdir=/var/lib/containerd-nydus/snapshots/451/fs lowerdir=/var/lib/containerd-nydus/snapshots/419/mnt]"
time="2024-05-10T04:12:04.435037247Z" level=debug msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541"
time="2024-05-10T04:12:04.435096518Z" level=info msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 ID 451 Kind Active"
time="2024-05-10T04:12:04.435153471Z" level=debug msg="Nydus remote snapshot 419 is ready"
time="2024-05-10T04:12:04.435290250Z" level=info msg="remote mount options [workdir=/var/lib/containerd-nydus/snapshots/451/work upperdir=/var/lib/containerd-nydus/snapshots/451/fs lowerdir=/var/lib/containerd-nydus/snapshots/419/mnt]"
[2024-05-10 04:12:04.594260 +00:00] ERROR [/src/error.rs:22] Error:
    "failed to get chunk information"
    at rafs/src/metadata/direct_v6.rs:752
    note: enable `RUST_BACKTRACE=1` env to display a backtrace
time="2024-05-10T04:13:03.588204445Z" level=debug msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541"
time="2024-05-10T04:13:03.588756976Z" level=info msg="[Mounts] snapshot k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 ID 451 Kind Active"
time="2024-05-10T04:13:03.588824124Z" level=debug msg="Nydus remote snapshot 419 is ready"
time="2024-05-10T04:13:03.588967684Z" level=info msg="remote mount options [workdir=/var/lib/containerd-nydus/snapshots/451/work upperdir=/var/lib/containerd-nydus/snapshots/451/fs lowerdir=/var/lib/containerd-nydus/snapshots/419/mnt]"
time="2024-05-10T04:13:04.604445896Z" level=debug msg="[Remove] snapshot with key k8s.io/455/333e128284a64a49e6c59759f344d9a01758ce4a93e8df21133c0523d3dbdc9f snapshot id 450"
time="2024-05-10T04:13:04.605409593Z" level=debug msg="[Remove] snapshot with key k8s.io/456/48976ac68e1758545da3b320cc56abb09c4908f789de8a34785d06cfddf5e541 snapshot id 451"
time="2024-05-10T04:13:04.606052374Z" level=debug msg="[Cleanup] snapshots"
time="2024-05-10T04:13:04.606533640Z" level=info msg="[Cleanup] orphan directories [/var/lib/containerd-nydus/snapshots/450 /var/lib/containerd-nydus/snapshots/451]"
time="2024-05-10T04:13:04.606557097Z" level=debug msg="no RAFS filesystem instance associated with snapshot 450"
time="2024-05-10T04:13:04.607088316Z" level=debug msg="no RAFS filesystem instance associated with snapshot 451"

Version Information

nydus-snapshotter v0.10.0

nydusify

Nydusify version
Version : v2.2.2
Revision    : 19d5b12bb0ca58d0474861416e91961169235114
Go version  : go1.18.10
Build time  : 2023-07-17T04:03:45

nydus-image

Version:    v2.2.2
Git Commit:     19d5b12bb0ca58d0474861416e91961169235114
Build Time:     2023-07-17T04:12:08.369576349Z
Profile:    release
Rustc:      rustc 1.66.1 (90743e729 2023-01-10)

nydusd

Version:    v2.2.2
Git Commit:     19d5b12bb0ca58d0474861416e91961169235114
Build Time:     2023-07-17T04:12:08.369576349Z
Profile:    release
Rustc:      rustc 1.66.1 (90743e729 2023-01-10)

Workaround

Solution 1: change the RAFS version from 6 to 5

nydusify (v2.2.2) convert uses version 6 as the Nydus image format. When the version is set to 5, the panic does not occur:

nydusify convert --fs-version 5 --source $sourceImage --target $targetImage

Solution 2: remove the Nydus image from the machine and recreate the Pod

Remove the image, for example with crictl rmi {image}, and then recreate the Pod.

Question

Apart from the two solutions above, is there any other way to solve the problem? Ideally, every Pod using a fixed-version Nydus image would run normally on the machine. Would it be possible to patch nydusify (v2.2.2) to achieve this?

Other Useful Information

Some other Nydus images converted with fs-version 6 run normally on the same machine. Meanwhile, Pods using the affected image run normally on other machines.

imeoer commented 1 month ago

@zewenying Thanks for the details. It seems to be related to the fs version v6 format. Could you describe the image content a little more?

zewenying commented 1 month ago

Thanks for your reply. Could you point me to some tools for describing the image content? I don't know what kind of information about the image would help you track down the problem.

imeoer commented 1 month ago

@zewenying Try validating your RAFS v6 image with nydusify check --source $oci_image --target $nydus_image first (this requires nydusify, nydus-image, and nydusd to be installed on your node). :)

zewenying commented 1 month ago

Hi, here is the log (attached as an image). I also tried setting RUST_BACKTRACE=1 to get more output, but no additional logs appeared.

Tips:

  1. I tried another Nydus image that has the same problem as this one, but it passes the check, so the registry hosting the source image seems to be working well.
    INFO[2024-05-13T12:00:05+08:00] Verifying filesystem for source and Nydus image
    INFO[2024-05-13T12:01:03+08:00] Verified Nydus image $TARGETIMAGE

imeoer commented 1 month ago

@zewenying It looks like RAFS v6 has some filesystem issues. Does RAFS v5 always pass the nydusify check?

zewenying commented 1 month ago

No. It also fails the check, with the same error.

imeoer commented 1 month ago

@zewenying Sorry for the misunderstanding. The TimeOut keyword appears in your log, so maybe it's a registry/network issue.

zewenying commented 1 month ago

I tried another problematic image that passed the check three days ago, and it fails today. It seems the registry is not working reliably.

zewenying commented 1 month ago

Hi @imeoer, I will be out of the office starting tomorrow. Another colleague will follow up on this issue.