dragonflyoss / nydus

Nydus - the Dragonfly image service, providing fast, secure and easy access to container images.
https://nydus.dev/
Apache License 2.0

api.sock: not found #1600

Closed by jonoirwinrsa 4 months ago

jonoirwinrsa commented 4 months ago

Hi all

I'm getting the following error when running Nydus. It works for some time and then fails. Any help debugging would be appreciated.

Pod event error:

Error: failed to create containerd container: wait until daemon is RUNNING: get daemon state: daemon socket /var/lib/containerd-nydus/socket/cq81pj7bnr6vri01jgig/api.sock: not found

Snapshotter service status

● nydus-snapshotter.service - nydus snapshotter
   Loaded: loaded (/etc/systemd/system/nydus-snapshotter.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2024-07-10 22:42:10 UTC; 19h ago
 Main PID: 16006 (containerd-nydu)
   CGroup: /system.slice/nydus-snapshotter.service
           ├─  16006 /usr/local/bin/containerd-nydus-grpc --config /etc/nydus/config.toml
           ├─1117361 /usr/local/bin/nydusd fuse --thread-num 38 --config /var/lib/containerd-nydus/config/cq805vv157qfq2jtaieg/config.json --bootstrap /var/lib/containerd-ny...
           ├─1216142 /usr/local/bin/nydusd fuse --thread-num 38 --config /var/lib/containerd-nydus/config/cq81mln157qfq2jtaif0/config.json --bootstrap /var/lib/containerd-ny...
           ├─1216903 /usr/local/bin/nydusd fuse --thread-num 38 --config /var/lib/containerd-nydus/config/cq81mp7157qfq2jtaifg/config.json --bootstrap /var/lib/containerd-ny...
           ├─1218879 /usr/local/bin/nydusd fuse --thread-num 38 --config /var/lib/containerd-nydus/config/cq81n4v157qfq2jtaig0/config.json --bootstrap /var/lib/containerd-ny...
           └─1219302 /usr/local/bin/nydusd fuse --thread-num 38 --config /var/lib/containerd-nydus/config/cq81n7f157qfq2jtaigg/config.json --bootstrap /var/lib/containerd-ny...

Jul 10 22:42:10 ip-10-0-2-156.ec2.internal systemd[1]: Started nydus snapshotter.
Jul 11 17:53:08 ip-10-0-2-156.ec2.internal containerd-nydus-grpc[16006]: Error: Custom { kind: Other, error: "" }
Jul 11 18:08:09 ip-10-0-2-156.ec2.internal containerd-nydus-grpc[16006]: Error: Custom { kind: Other, error: "" }

In the containerd-nydus logs I see:

time="2024-07-11T18:04:26.812068159Z" level=error msg="Process 1234601 has been a zombie"
time="2024-07-11T18:04:26.977522651Z" level=error msg="Process 1234601 has been a zombie"
time="2024-07-11T18:04:27.101894185Z" level=error msg="Process 1234601 has been a zombie"
time="2024-07-11T18:04:27.278224861Z" level=error msg="Process 1234601 has been a zombie"
time="2024-07-11T18:04:27.407554869Z" level=error msg="Process 1234601 has been a zombie"
time="2024-07-11T18:04:27.508847801Z" level=error msg="Process 1234601 has been a zombie"
time="2024-07-11T18:04:27.681321785Z" level=error msg="Process 1234601 has been a zombie"

and

time="2024-07-11T17:56:33.868050605Z" level=error msg="Nydusd cq81pj7bnr6vri01jgig probably not started"
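
One way to narrow down which daemons are affected is to compare the per-daemon directories that appear in this report. The sketch below is only an illustration and assumes the layout seen in the paths above: a directory per daemon ID under /var/lib/containerd-nydus/config and a matching /var/lib/containerd-nydus/socket/<id>/api.sock. The function name daemons_missing_socket is hypothetical and is not part of nydus-snapshotter.

```python
from pathlib import Path


def daemons_missing_socket(root):
    """Return daemon IDs that have a config directory but no api.sock.

    Assumed layout (taken from the paths in this report, not from the
    nydus-snapshotter source):
        <root>/config/<daemon-id>/config.json
        <root>/socket/<daemon-id>/api.sock
    """
    root = Path(root)
    config_dir = root / "config"
    socket_dir = root / "socket"
    missing = []
    if not config_dir.is_dir():
        return missing
    for entry in sorted(config_dir.iterdir()):
        if not entry.is_dir():
            continue
        # Path.exists() is true for UNIX sockets as well as regular files.
        if not (socket_dir / entry.name / "api.sock").exists():
            missing.append(entry.name)
    return missing


if __name__ == "__main__":
    for daemon_id in daemons_missing_socket("/var/lib/containerd-nydus"):
        print(daemon_id)
```

Any ID printed here has a config but no API socket, which would be consistent with the "probably not started" message for that daemon.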

Additional Information

Version of nydus being used (nydusd --version)

Version:        v2.2.5
Git Commit:     2225560e3e7d47eb8a82406e39dcda2bbab279ff
Build Time:     2024-06-10T18:24:59.840014736Z
Profile:        release
Rustc:          rustc 1.72.1 (d5c2e9c34 2023-09-13)

Version of nydus-snapshotter being used (containerd-nydus-grpc --version)

Version:     v0.13.13
Revision:    e9d1bb738f778a2e30b8284ca4f479fa2517456c
Go version:  go1.19.6
Build time:  2024-05-15T03:57:13

Kernel information (uname -r)

5.10.219-208.866.amzn2.x86_64

GNU/Linux Distribution, if applicable (cat /etc/os-release)

NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"

containerd-nydus-grpc command line used, if applicable (ps aux | grep containerd-nydus-grpc)

/usr/local/bin/containerd-nydus-grpc --config /etc/nydus/config.toml

imeoer commented 4 months ago

It seems some nydusd processes have exited unexpectedly. Can we find the nydusd command line and run it manually? Were there any other errors before Error: Custom { kind: Other, error: "" }?
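
The systemctl status output in the report truncates the nydusd command lines, so a rough sketch like the following could recover them in full from /proc, along with each process state. It relies only on the standard Linux /proc/<pid>/cmdline and /proc/<pid>/stat files; the helper names are hypothetical. Note that a zombie process has an empty cmdline, so the sketch falls back to the command name recorded in stat.

```python
import os


def parse_cmdline(raw):
    """Split the NUL-separated bytes of /proc/<pid>/cmdline into arguments."""
    return [a.decode("utf-8", errors="replace") for a in raw.split(b"\x00") if a]


def parse_stat(stat_text):
    """Return (comm, state) from /proc/<pid>/stat; state 'Z' means zombie."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')'.
    start = stat_text.index("(")
    end = stat_text.rindex(")")
    return stat_text[start + 1:end], stat_text[end + 1:].split()[0]


def nydusd_processes(proc_root="/proc"):
    """Yield (pid, state, argv) for every process named nydusd."""
    if not os.path.isdir(proc_root):
        return
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat"), errors="replace") as f:
                comm, state = parse_stat(f.read())
            with open(os.path.join(proc_root, name, "cmdline"), "rb") as f:
                argv = parse_cmdline(f.read())
        except (OSError, ValueError):
            continue  # process exited or stat was unreadable mid-scan
        if comm == "nydusd":
            yield int(name), state, argv


if __name__ == "__main__":
    for pid, state, argv in nydusd_processes():
        print(pid, state, " ".join(argv) or "(no cmdline, likely zombie)")
```

The printed argument list for a healthy daemon can then be copied and run by hand to see whether nydusd reports an error on startup.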

imeoer commented 4 months ago

Maybe it is related to some specific images? Let's look at the /var/lib/containerd-nydus/config/xxx/config.json file to find the image repo.
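
A small sketch of how the image repo could be pulled out of such a config.json. The schema of the file is not shown in this thread, so rather than hard-coding a path the code searches the whole JSON document for keys named "host" and "repo"; those key names are an assumption, as are the helper names find_keys and image_hints.

```python
import json


def find_keys(node, wanted, path=""):
    """Recursively collect (dotted_path, value) pairs for scalar keys in `wanted`."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key in wanted and not isinstance(value, (dict, list)):
                found.append((child, value))
            found.extend(find_keys(value, wanted, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.extend(find_keys(item, wanted, f"{path}[{i}]"))
    return found


def image_hints(config_path):
    """Load a daemon config.json and return any host/repo entries it contains.

    The key names "host" and "repo" are assumed, not confirmed by the thread.
    """
    with open(config_path) as f:
        return find_keys(json.load(f), {"host", "repo"})
```

Calling image_hints on each daemon's config.json and comparing the results for failing and healthy daemons may show whether the failures cluster on particular images.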

jonoirwinrsa commented 4 months ago

Thanks @imeoer for the help! I think the error comes from having chunk deduplication enabled (based on https://github.com/dragonflyoss/nydus/pull/1507). We're testing to confirm this is the case.