containerd / runwasi

Facilitates running Wasm / WASI workloads managed by containerd
Apache License 2.0

Shim cannot connect to runtime daemon? #167

Open jglogan opened 1 year ago

jglogan commented 1 year ago

Hi, I'm playing with runwasi in kind by adapting the integration test Dockerfile. I see that the wasmtime shim works for running the docker.io/wasmedge/example-wasi:latest test image, but I cannot run the same workload when using a node image that configures daemon mode. Is there something else that I need to do to get daemon mode working?

Here's the error I see (both wasmedge and wasmtime fail in the same way):

Events:
  Type     Reason                  Age   From               Message
  ----     ------                  ----  ----               -------
  Normal   Scheduled               15s   default-scheduler  Successfully assigned default/wasi-job-demo-wm4cj to kind-worker
  Warning  FailedCreatePodSandBox  14s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to start shim: start failed: containerd-shim-wasmedged-v1: Ttrpc(RpcStatus(Status { code: NOT_FOUND, message: "/runwasi.services.sandbox.v1.Manager/Connect is not supported", details: [], special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }))
: exit status 1: unknown

I configured the daemon as a part of the containerd systemd service and do see that it is running, and the unix socket is present as well:

root@kind-worker:/# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 20:22 ?        00:00:00 /sbin/init
root          79       1  0 20:22 ?        00:00:00 /lib/systemd/systemd-journald
message+      90       1  0 20:22 ?        00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
root         113       1  0 20:22 ?        00:00:00 /usr/local/bin/containerd-wasmedged
root         117       1  1 20:22 ?        00:00:05 /usr/local/bin/containerd
root         201       1  1 20:23 ?        00:00:06 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///run/containerd
root         254       1  0 20:23 ?        00:00:00 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id a63b62567b06b0cd4d17f8c3ba7b870bb9f98d86df803216f26a9df57c88a327 -address /run/containerd/containerd.sock
root         255       1  0 20:23 ?        00:00:00 /usr/local/bin/containerd-shim-runc-v2 -namespace k8s.io -id 10ae9a17d1bbe7a0098adb1e27fc296cfe0eaafacf26ba83fc71472aad92cef0 -address /run/containerd/containerd.sock
65535        295     255  0 20:23 ?        00:00:00 /pause
65535        297     254  0 20:23 ?        00:00:00 /pause
root         362     255  0 20:23 ?        00:00:00 /bin/kindnetd
root         387     254  0 20:23 ?        00:00:00 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=kind-worker
root@kind-worker:/# ls -l /var/run/io.containerd.wasmwasi.v1 
total 0
srwxr-xr-x 1 root root 0 Jul  4 20:22 manager.sock

journalctl -u wasmedged.service shows nothing interesting.
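
The process list and `ls` output above show that the daemon is running and that the socket file exists, but not that anything accepts connections on it. Here is a minimal Python sketch of that extra check, assuming it is run as root on the node. The helper name `can_connect` is hypothetical and not part of runwasi; it only opens and closes a connection and sends no ttrpc traffic:

```python
import socket

# Socket path taken from the `ls` output above.
MANAGER_SOCK = "/var/run/io.containerd.wasmwasi.v1/manager.sock"


def can_connect(path: str, timeout: float = 2.0) -> bool:
    """Return True if a listener accepts a connection on the Unix socket at `path`.

    Hypothetical helper for illustration; it does not speak ttrpc, it only
    checks that connect() succeeds.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(path)
        except OSError:
            # Covers a missing file, a stale socket with no listener
            # (connection refused), permission errors, and timeouts.
            return False
        return True


if __name__ == "__main__":
    print(MANAGER_SOCK, "accepts connections:", can_connect(MANAGER_SOCK))
```

A `False` result would point at the daemon not listening on that path. A `True` result only shows that the socket is reachable, not that the daemon serves the method the shim calls.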

containerd config:

root@kind-worker:/# more /etc/containerd/config.toml 
version = 2

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    restrict_oom_score_adj = false
    sandbox_image = "registry.k8s.io/pause:3.7"
    tolerate_missing_hugepages_controller = true
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "runc"
      discard_unpacked_layers = true
      snapshotter = "overlayfs"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          base_runtime_spec = "/etc/containerd/cri-base.json"
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            SystemdCgroup = true
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.test-handler]
          base_runtime_spec = "/etc/containerd/cri-base.json"
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.test-handler.options]
            SystemdCgroup = true
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.wasm]
          runtime_type = "io.containerd.wasmedged.v1"
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/containerd/certs.d"

[proxy_plugins]
  [proxy_plugins.fuse-overlayfs]
    address = "/run/containerd-fuse-overlayfs.sock"
    type = "snapshot"
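
For reference, containerd derives the shim binary it executes from `runtime_type`: under the shim v2 naming convention, `io.containerd.runc.v2` resolves to `containerd-shim-runc-v2`, which matches the binary names in the `ps` output and in the error message above. Here is a small Python sketch of that mapping, applied to the runtimes in this config. `shim_binary_name` is a hypothetical helper, and the `shutil.which` lookup on `PATH` only approximates how containerd locates the binary:

```python
import shutil


def shim_binary_name(runtime_type: str) -> str:
    """Map a runtime_type to the shim binary name under the shim v2 convention.

    Hypothetical helper: the last two dot-separated parts are the shim name
    and version, e.g. "io.containerd.runc.v2" -> "containerd-shim-runc-v2".
    """
    parts = runtime_type.split(".")
    if len(parts) < 2:
        raise ValueError(f"not a dotted runtime type: {runtime_type!r}")
    name, version = parts[-2], parts[-1]
    return f"containerd-shim-{name}-{version}"


# Handler name -> runtime_type, copied from the config above.
RUNTIMES = {
    "runc": "io.containerd.runc.v2",
    "test-handler": "io.containerd.runc.v2",
    "wasm": "io.containerd.wasmedged.v1",
}

if __name__ == "__main__":
    for handler, runtime_type in RUNTIMES.items():
        binary = shim_binary_name(runtime_type)
        # shutil.which returns None when the binary is not on PATH.
        print(f"{handler}: {binary} -> {shutil.which(binary)}")
```

With this config, the `wasm` handler maps to `containerd-shim-wasmedged-v1`, the same binary named in the failure.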

cuisongliu commented 10 months ago

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wasm
handler: wasm

Apply this RuntimeClass to your Kubernetes cluster, then use it as the runtime class for your pod.

jglogan commented 10 months ago

This is what I am using, and it doesn't work for me with wasmtimed. It works OK with wasmtime.

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wasm
  labels:
    app: wasi-job-demo
handler: wasm
---
apiVersion: batch/v1
kind: Job
metadata:
  name: wasi-job-demo
spec:
  template:
    spec:
      runtimeClassName: wasm
      restartPolicy: Never
      containers:
      - name: wasi-job-demo
        image: docker.io/wasmedge/example-wasi:latest

I've rebuilt my image using the runwasi repo head, and this is what I now see in the pod events:

Events:
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Normal   Scheduled               26s                default-scheduler  Successfully assigned default/wasi-job-demo-mrw25 to kind-worker
  Warning  FailedCreatePodSandBox  12s (x2 over 24s)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to start shim: start failed: containerd-shim-wasmtimed-v1: Ttrpc(RpcStatus(Status { code: NOT_FOUND, message: "/runwasi.services.sandbox.v1.Manager/Connect is not supported", details: [], special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }))

cuisongliu commented 10 months ago

How did you install it? It looks like there are some issues with the installation.

wget https://github.com/containerd/runwasi/releases/download/containerd-shim-wasmedge/v0.3.0/containerd-shim-wasmedge-x86_64.tar.gz
tar -zxvf containerd-shim-wasmedge-x86_64.tar.gz -C /opt/containerd/bin/
cat <<EOF >> /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.wasm]
runtime_type = "io.containerd.wasmedge.v1"
EOF

Mossaka commented 10 months ago

wasmedged is definitely a weak point: we don't have extensive tests for it, so it could be broken.

Is there something else that I need to do to get daemon mode working?

Unfortunately, I don't have many ideas on this at the moment. I will take a look shortly.

jglogan commented 9 months ago

How did you install it? It looks like there are some issues with the installation.

When I was playing with it back in July I set up the attached Dockerfile to build everything. I rebuilt images for wasmtime, wasmedge, wasmtimed, and wasmedged just now, and what I see when I submit a docker.io/wasmedge/example-wasi job to a kind cluster for each is:

So there's progress: wasmedge now works, where it didn't for me before!

As before, I see that the wasm runtime daemon is running and that /var/run/io.containerd.wasmwasi.v1/manager.sock is present, but for some reason communication through that socket isn't working.

# systemctl list-units --type=service --state=running 
  UNIT                     LOAD   ACTIVE SUB     DESCRIPTION
  containerd.service       loaded active running containerd container runtime
  dbus.service             loaded active running D-Bus System Message Bus
  kubelet.service          loaded active running kubelet: The Kubernetes Node Agent
  systemd-journald.service loaded active running Journal Service
  wasmedged.service        loaded active running wasmedged: runwasi daemon

When I look at the containerd journal in the worker node, I see:

Nov 14 15:58:45 kind-worker containerd[117]: time="2023-11-14T15:58:45.738464661Z" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:wasi-job-demo-fndd2,Uid:013d5a36-3359-4707-892f-5cc9810ca341,Namespace:default,Attempt:0,}"
Nov 14 15:58:47 kind-worker containerd[117]: time="2023-11-14T15:58:47.359606419Z" level=error msg="copy shim log" error="read /proc/self/fd/42: file already closed"
Nov 14 15:58:47 kind-worker containerd[117]: time="2023-11-14T15:58:47.596678497Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:wasi-job-demo-fndd2,Uid:013d5a36-3359-4707-892f-5cc9810ca341,Namespace:default,Attempt:0,} failed, error" error="failed to create containerd task: failed to start shim: start failed: containerd-shim-wasmedged-v1: Ttrpc(RpcStatus(Status { code: NOT_FOUND, message: \"/runwasi.services.sandbox.v1.Manager/Connect is not supported\", details: [], special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }))\n: exit status 1: unknown"

Nothing else in the journal looks different from that for a typical kind cluster.
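
The error line in the journal carries two useful fields: the ttrpc status code and the full method path that the shim called. Here is a minimal Python sketch that pulls them out of such a line so that reports from different shims can be compared. The regular expression is an assumption based only on the message format shown in this thread, and `parse_ttrpc_error` is a hypothetical helper:

```python
import re
from typing import NamedTuple, Optional


class TtrpcError(NamedTuple):
    code: str
    service: str
    method: str


# Matches e.g.: code: NOT_FOUND, message: "/pkg.Service/Method is not supported"
# The optional backslash handles the escaped quotes seen in the containerd journal.
_PATTERN = re.compile(r'code: (\w+), message: \\?"/([\w.]+)/(\w+) ')


def parse_ttrpc_error(line: str) -> Optional[TtrpcError]:
    """Extract status code, service, and method from a shim error line, if present."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return TtrpcError(*match.groups())


if __name__ == "__main__":
    # Sample shortened from the pod events in this thread.
    sample = (
        'containerd-shim-wasmedged-v1: Ttrpc(RpcStatus(Status { code: NOT_FOUND, '
        'message: "/runwasi.services.sandbox.v1.Manager/Connect is not supported", '
        'details: [] }))'
    )
    print(parse_ttrpc_error(sample))
```

Lines that do not contain a ttrpc status in this format return `None`.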

docker-build.zip

jprendes commented 9 months ago

This is something we don't currently test. It wouldn't surprise me if it is broken.

I can try to debug it tomorrow.

jglogan commented 9 months ago

Cool. LMK if there's anything you'd like me to check out on my system.

Just using this to build the image:

docker buildx build --platform linux/amd64,linux/arm64 --build-arg SHIM=${shim} --ssh default --push -t "${tag}" -f docker/Dockerfile .

Then, just using that image when creating a kind cluster.

jglogan commented 8 months ago

@jprendes just checking in on this, any thoughts on why connecting to the shared-mode daemon doesn't work?

erkules commented 5 months ago

An update: "shared-mode" is still not working.

  Warning  FailedCreatePodSandBox  12s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to start shim: start failed: containerd-shim-wasmtimed-v1: Ttrpc(Nix(ENOENT))

This could be a killer feature.

macko99 commented 2 months ago

@jprendes any update on shared-mode?

jprendes commented 2 months ago

There hasn't been any progress on this front. The first 2 steps would be:

@macko99 @erkules would you be interested in contributing?

devigned commented 2 months ago

See also: https://github.com/containerd/runwasi/issues/218#issuecomment-2220523767