containerd / runwasi

Facilitates running Wasm / WASI workloads managed by containerd
Apache License 2.0

containerd shim wasmtime issue #722

Open matsbror opened 1 week ago

matsbror commented 1 week ago

I am trying to set up a multi-architecture system for container execution with amd64, arm64, and riscv64 nodes. For the moment I am trying to make sure I can execute both native and WebAssembly containers, the latter using the wasmtime shim from runwasi.

Here are two containers, one native and one wasm:

mats@k3s-x86-1:~/wasm$ sudo ctr image ls
ERRO[0000] failed calculating size for image ttl.sh/fd-wasm:48h  error="no match for platform in manifest: not found"
REF                  TYPE                                    DIGEST                                                                  SIZE    PLATFORMS               LABELS
ttl.sh/fd-native:48h application/vnd.oci.image.index.v1+json sha256:40a7a7a643be2dbca68197d3bb380ce59f12ffa315eab8bc10b4320c469eef53 1.6 MiB linux/amd64,linux/arm64 -     
ttl.sh/fd-wasm:48h   application/vnd.oci.image.index.v1+json sha256:7371a6e30d195a69fdaf209e6ae10bad6d7a7a9026d0ee392e8aa7a65f682998 854.0 B linux/wasm              -   

I can run the native container fine on both amd64 and arm64, but I get a problem when trying to run the wasm container with containerd:

$ sudo ctr  run --rm --runtime io.containerd.wasmtime.v1 --platform linux/wasm ttl.sh/fd-wasm:48h ctr1
INFO[0000] apply failure, attempting cleanup             error="failed to extract layer sha256:22e2d605d8e21fcc75b332066254b97f543a33d2acfa44ab668627e23c36c63b: failed to get reader from content store: content digest sha256:82cc3c1443ff0472266de1f56b501b8dba520dd9bc7b03b49022a128294f0ed9: not found" key="extract-892851035-QSbS sha256:22e2d605d8e21fcc75b332066254b97f543a33d2acfa44ab668627e23c36c63b"
ctr: apply layer error for "ttl.sh/fd-wasm:48h": failed to extract layer sha256:22e2d605d8e21fcc75b332066254b97f543a33d2acfa44ab668627e23c36c63b: failed to get reader from content store: content digest sha256:82cc3c1443ff0472266de1f56b501b8dba520dd9bc7b03b49022a128294f0ed9: not found

I get exactly the same error on amd64 and arm64, and I just tested on riscv64 with the same result.

Running the same container using docker (and the wasmtime containerd shim) works fine on arm64, but on the amd64 node I get:

$ docker run --rm --runtime io.containerd.wasmtime.v1 --platform linux/wasm ttl.sh/fd-wasm:48h
docker: Error response from daemon: failed to create task for container: failed to start shim: failed to resolve runtime path: runtime "io.containerd.wasmtime.v1" binary not installed "containerd-shim-wasmtime-v1": file does not exist: unknown.

The shim is installed and configured in /etc/containerd/config.toml.

Any help is greatly appreciated.

cpuguy83 commented 1 week ago

You probably need to tell ctr image pull to pull the specific platform or use the --all-platforms flag. Otherwise the content will be gc'd.
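A minimal sketch of the first option, pulling only the `linux/wasm` variant. The `pull_wasm_image` helper and its `DRY_RUN` switch are hypothetical names introduced for illustration; the `--platform` flag of `ctr image pull` and the image reference are the ones already used in this thread.

```shell
# Hypothetical helper: pull a single platform variant of an image with ctr.
# Set DRY_RUN=1 to print the command instead of running it.
pull_wasm_image() {
    image="$1"
    platform="${2:-linux/wasm}"   # default matches the platform listed for fd-wasm
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ctr image pull --platform $platform $image"
    else
        sudo ctr image pull --platform "$platform" "$image"
    fi
}
```

Calling `pull_wasm_image ttl.sh/fd-wasm:48h` should fetch just that one variant, whereas `--all-platforms` fetches every entry in the index.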

jprendes commented 1 week ago

As for the issue when using docker, it looks like the shim binary (containerd-shim-wasmtime-v1) is not visible to (i.e., not in the PATH of) the containerd used by docker.
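To test that hypothesis, one could check whether the shim name resolves against a given PATH string. The following is a rough sketch only: `find_in_path` is a hypothetical helper, and the PATH value passed to it has to be the one the daemon actually sees, which may differ from the interactive shell's.

```shell
# Hypothetical helper: print where a binary would be found in a given
# colon-separated PATH string, or fail if no directory contains it.
# The body runs in a subshell so the IFS and globbing changes do not leak out.
find_in_path() (
    binary="$1"
    IFS=:
    set -f                              # do not glob-expand PATH entries
    for dir in $2; do
        [ -n "$dir" ] || dir=.          # an empty entry means the current directory
        if [ -x "$dir/$binary" ] && [ ! -d "$dir/$binary" ]; then
            echo "$dir/$binary"
            exit 0
        fi
    done
    exit 1
)
```

For example, `find_in_path containerd-shim-wasmtime-v1 "$PATH"` checks the current shell's PATH; a non-zero exit status means the binary was not found in any listed directory.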

matsbror commented 1 week ago

@cpuguy83 that indeed resolved the issue. Can you point me to something that explains what is happening here and why? There is only one platform in the image, so why is there a difference? Where does the gc kick in?

I am not very happy that the pull time increases:

ubuntu@vf2:~$ time sudo ctr  image pull ttl.sh/fd-wasm:48h
ttl.sh/fd wasm:48h                              saved
└──index (7371a6e30d19)                         complete        |++++++++++++++++++++++++++++++++++++++|
   ├──manifest (6336db3c947d)                   complete        |++++++++++++++++++++++++++++++++++++++|
   │  └──config (bc9c2208b4fa)                  waiting         |--------------------------------------|
   └──manifest (768d2baec79d)                   complete        |++++++++++++++++++++++++++++++++++++++|
      └──config (912b4a2c9134)                  complete        |++++++++++++++++++++++++++++++++++++++|
application/vnd.oci.image.index.v1+json sha256:7371a6e30d195a69fdaf209e6ae10bad6d7a7a9026d0ee392e8aa7a65f682998
Pulling from OCI Registry (ttl.sh/fd-wasm:48h)  elapsed: 1.9 s  total:  2.9 Ki  (1.5 KiB/s)

real    0m2.030s
user    0m0.011s
sys     0m0.019s
ubuntu@vf2:~$ sudo ctr image prune --all
INFO[0000] deleted image: ttl.sh/fd-wasm:48h
ubuntu@vf2:~$ time sudo ctr  image pull --all-platforms ttl.sh/fd-wasm:48h
ttl.sh/fd wasm:48h                              saved
└──index (7371a6e30d19)                         complete        |++++++++++++++++++++++++++++++++++++++|
   ├──manifest (6336db3c947d)                   complete        |++++++++++++++++++++++++++++++++++++++|
   │  ├──config (bc9c2208b4fa)                  complete        |++++++++++++++++++++++++++++++++++++++|
   │  └──unknown (5be30eca2335)                 waiting         |--------------------------------------|
   └──manifest (768d2baec79d)                   complete        |++++++++++++++++++++++++++++++++++++++|
      ├──layer (82cc3c1443ff)                   complete        |++++++++++++++++++++++++++++++++++++++|
      ├──config (912b4a2c9134)                  waiting         |--------------------------------------|
      └──layer (3c4d64c1ee5a)                   complete        |++++++++++++++++++++++++++++++++++++++|
application/vnd.oci.image.index.v1+json sha256:7371a6e30d195a69fdaf209e6ae10bad6d7a7a9026d0ee392e8aa7a65f682998
Pulling from OCI Registry (ttl.sh/fd-wasm:48h)  elapsed: 2.6 s  total:  818.2   (312.3 KiB/s)

real    0m2.711s
user    0m0.008s
sys     0m0.024s

matsbror commented 1 week ago

@jprendes Yes, it looks like that, but it is odd since the shim executable is in the same directory as containerd, ctr, and the runc shim, which all work. How else would I find out the PATH that a running service uses?
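On Linux, one way to answer this is to read the process's environment from `/proc`. This is a sketch under the assumption that the daemon's PID is known and that `/proc` is mounted; `service_path` is a hypothetical helper name. The file reflects the environment the process was started with, and reading it for another user's process typically requires root.

```shell
# Hypothetical helper: print the PATH a running process was started with.
# /proc/<pid>/environ holds NUL-separated KEY=VALUE entries, so convert the
# NULs to newlines and keep only the value of the PATH entry.
service_path() {
    pid="$1"
    tr '\0' '\n' < "/proc/$pid/environ" | sed -n 's/^PATH=//p'
}
```

With root privileges, something like `service_path "$(pidof dockerd)"` would show the PATH of the Docker daemon, assuming `pidof` is available and returns a single PID.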

cpuguy83 commented 1 week ago

@matsbror I said GC, but it is probably not even involved here. I'm sure ctr is defaulting to the machine's platform and not fetching anything beyond the image index, because the system platform is not in the index.
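The explanation above can be illustrated with a deliberately simplified sketch. `index_has_platform` is a hypothetical helper that compares platform strings verbatim; it is not containerd's actual matcher, which also normalizes names and handles variants. The platform lists are the ones shown by `ctr image ls` earlier in the thread.

```shell
# Simplified stand-in for platform selection: succeed only if the wanted
# platform string appears verbatim among the platforms listed in the index.
index_has_platform() {
    wanted="$1"
    shift
    for platform in "$@"; do
        if [ "$platform" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# fd-wasm lists only linux/wasm, so an amd64 host's default finds no entry.
if index_has_platform linux/amd64 linux/wasm; then
    echo "fd-wasm: index has an entry for linux/amd64"
else
    echo "fd-wasm: no entry for linux/amd64"
fi
```

Under this model the native image (linux/amd64, linux/arm64) matches the host default, while the wasm image only matches when `linux/wasm` is requested explicitly.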

matsbror commented 1 week ago

@cpuguy83 that might make sense. Now maybe you could help me understand a tangential issue: I have noticed that pulling wasm images is a lot slower (in terms of data rate) than pulling native images. Would you have any hunch on why?

cpuguy83 commented 6 days ago

> @cpuguy83 that might make sense. Now maybe you could help me understand a tangential issue: I have noticed that pulling wasm images is a lot slower (in terms of data rate) than pulling native images. Would you have any hunch on why?

I don't think the slowness has anything to do with containerd.