containerd / runwasi

Facilitates running Wasm / WASI workloads managed by containerd
Apache License 2.0
1.06k stars 88 forks source link

Store Compiled Modules when using OCI Wasm Images #405

Closed jsturtevant closed 7 months ago

jsturtevant commented 10 months ago

this would replace #44

jsturtevant commented 10 months ago

fyi @cpuguy83

jsturtevant commented 10 months ago

I think we need to add some GC labels so containerd does not collect this content. See: https://github.com/containerd/containerd/blob/8459273f806e068e1a6bacfaf1355bbbad738d5e/docs/garbage-collection.md#garbage-collection-labels

From my experimenting, the GC is looks to be a top down model. So I've put the label that retains on the image level: https://github.com/containerd/runwasi/pull/405/files#diff-74fa6f2b9f387ae0683336f3c04b7e15658dbb2c2820f74bd143894009593d1bR365-R368. If the image gets removed this will be removed as well.

log::debug!("updating content with precompile digest to avoid garbage collection");
let mut image_content = self.get_info(digest.clone())?;
image_content.labels.insert("containerd.io/gc.ref.content.precompile".to_string(),precompile_digest.clone(),);

How do we invalidate (and de-reference in containerd) pre-compiled modules? As I understand it a pre-compiled module will not work across different versions of a runtime.

Also we likely need to annotate what runtime the content is for.

I did this here, but as you point out might need to do this per version as well. That way if a runtime is upgraded it would re-compute.

https://github.com/containerd/runwasi/pull/405/files#diff-74fa6f2b9f387ae0683336f3c04b7e15658dbb2c2820f74bd143894009593d1bR331

let label = format!("runwasi.io/precompiled/{}", T::name());

For dereferencing, do you think it is enough to wait for the image to be removed, which would drop the couple versions (if there are any)? I don't think it will be too much space to have a couple versions of it sitting around vs complexity to dereference them on an ongoing basis. For example, during the upgrade of the shim I think there might be alot of edge cases.

Also, when writing the content we'll need to attach a lease to it until gc labels are applied otherwise containerd will collect it.

Ah, I was wondering about that! There did seem to be a small moment where it could be cleaned up before I put the label on the image. I will take a look at putting a lease on it.

Mossaka commented 10 months ago

@jsturtevant Started reviewing this PR. How do I run this to see if the modules are stored?

jsturtevant commented 10 months ago

The shim logs have will have statements and you will also be able to verify in containerd using ctr:

Find the compiled info:

sudo ctr i ls | grep "runwasi.io"
ghcr.io/containerd/runwasi/wasi-demo-oci:latest                                                             application/vnd.oci.image.manifest.v1+json
               sha256:60fccd77070dfeb682a1ebc742e9d677fc452b30a6b99188b081c968992394ce 2.4 MiB   wasi/wasm                                                                                                                           
runwasi.io/precompiled=sha256:b36753ab5a46f26f6bedb81b8a7b489cede8fc7386f1398706782e225fd0a98e

See it in the content store:

sudo ctr content ls | grep "b36753ab5a46f26f6bedb81b8a7b489cede8fc7386f139870"
sha256:60fccd77070dfeb682a1ebc742e9d677fc452b30a6b99188b081c968992394ce 561B    2 months        containerd.io/gc.ref.content.0=sha256:a3c18cd551d54d3cfbf67acc9e8f7ef5761e76827fe7c1ae163fca0193be88b3,containerd.io/gc.ref.content.config=sha256:85b7f2b562fe8665ec9d9e6d47ab0b24e2315627f5f558d298475c4038d71e8b,containerd.io/gc.ref.content.precompile=sha256:b36753ab5a46f26f6bedb81b8a7b489cede8fc7386f1398706782e225fd0a98e
sha256:b36753ab5a46f26f6bedb81b8a7b489cede8fc7386f1398706782e225fd0a98e 626.4kB 3 days          runwasi.io/precompiled=sha256:60fccd77070dfeb682a1ebc742e9d677fc452b30a6b99188b081c968992394ce

save it to local file:

ctr content get sha256:b36753ab5a46f26f6bedb81b8a7b489cede8fc7386f1398706782e225fd0a98e > precompiled.wasm
jsturtevant commented 9 months ago

Also, when writing the content we'll need to attach a lease to it until gc labels are applied otherwise containerd will collect it.

Ah, I was wondering about that! There did seem to be a small moment where it could be cleaned up before I put the label on the image. I will take a look at putting a lease on it.

Added a lease during new content creation

jsturtevant commented 9 months ago

Another question I had was if this is something we want to turn on for all images (as the code is now) or if we want to put an annotation on the image that would enable this feature.

Mossaka commented 9 months ago

@CaptainVincent does WasmEdge Rust SDK have an API to AOT compile a Wasm module to bytes that's similar to https://github.com/bytecodealliance/wasmtime/blob/7690c500228ae005fc70ee3b6297bb4b403686d5/crates/wasmtime/src/engine.rs#L222-L223?

jsturtevant commented 9 months ago

Another question I had was if this is something we want to turn on for all images (as the code is now) or if we want to put an annotation on the image that would enable this feature.

I've thought on this a bit more and I think it should be opt in feature so will add support for an annotation

cpuguy83 commented 9 months ago

I don't think an annotation on the image would work well. A runtime annotation could work but... I'm curious why do you think it should be opt in? I can see the desire to opt in for now because its kind of experimental, but then the whole stack is.

We must compile the wasm to run it at all, so we already need to store it. Keeping it cached so the same wasm isn't compiled more than once is just an optimization.

CaptainVincent commented 8 months ago

Yes. This lines of the code provides two methods about bytes to file and file to file. If needed, I can also add bytes to bytes inside wasmedge-rust-sdk or integration layer.

@CaptainVincent does WasmEdge Rust SDK have an API to AOT compile a Wasm module to bytes that's similar to https://github.com/bytecodealliance/wasmtime/blob/7690c500228ae005fc70ee3b6297bb4b403686d5/crates/wasmtime/src/engine.rs#L222-L223?

jsturtevant commented 8 months ago

I can see the desire to opt in for now because its kind of experimental, but then the whole stack is.

We must compile the wasm to run it at all, so we already need to store it. Keeping it cached so the same wasm isn't compiled more than once is just an optimization.

I was leaning towards it being experimental but I can keep it as.

jsturtevant commented 8 months ago

Honestly it took me some time to grasp the interaction between containerd and the sandbox for saving / retriving precompiled content using leases. I am still not 100% sure the full picture of the interactions. Would be appreciate if you clarify the logic here (maybe document it somewhere?)

good idea! I've added some documentation, you can see the rendered page by viewing the markdown: i.e: https://github.com/containerd/runwasi/blob/0135abace198e39a1f4f6aae6ae74605619e42b2/docs/oci-descision-flow.md

cpuguy83 commented 8 months ago

Tested this out.

  1. This doesn't work with Docker. Not your fault since Docker is not filling in container.Image which is how you are traversing the content. I have a patch prepared (for docker) to remedy this, but it would only work when docker is using containerd-backed storage.
  2. container.Image is not the best data to use... we should add a new field to the containerd protos (in containerd/containerd) for an immutable reference to the image manifest being used... for now we'd need to work with what we've got
  3. Removing the pre-compiled content seems to make the container refuse to ever start until we wipe out the image and start over.

On the Docker usage, I've noticed some peculiar things with runwasi. The first container start is quick (considering it is compiling the content). Then subsequent starts of the same container are actually slower. Removing the container in docker and recreating gives a faster startup (even faster since it has the pre-compiled content) and then the 2nd start again slows down (by an order of magnitude here).

jsturtevant commented 8 months ago
  1. This doesn't work with Docker. Not your fault since Docker is not filling in container.Image which is how you are traversing the content. I have a patch prepared (for docker) to remedy this, but it would only work when docker is using containerd-backed storage.

Thanks!

container.Image is not the best data to use... we should add a new field to the containerd protos (in containerd/containerd) for an immutable reference to the image manifest being used... for now we'd need to work with what we've got

Created an issue (#460) to look into this and possibly use something from the conatiners snapshot as you suggested offline. I think we can improve the overall design in way Containerd passes WASM info but waiting on the discussions in the CNCF wg-wasm in regards to an offiical OCI Artifact for wasi.

Removing the pre-compiled content seems to make the container refuse to ever start until we wipe out the image and start over.

This is a bug! I will fix this and look at adding tests to cover this scenario more.

jsturtevant commented 8 months ago

On the Docker usage, I've noticed some peculiar things with runwasi. The first container start is quick (considering it is compiling the content). Then subsequent starts of the same container are actually slower. Removing the container in docker and recreating gives a faster startup (even faster since it has the pre-compiled content) and then the 2nd start again slows down (by an order of magnitude here).

This is interesting since I am seeing an improvement in speed with the changes when using ctr (https://github.com/containerd/runwasi/pull/405#discussion_r1465444917). Does the slow behavior for the second start replicate with the non-oci image?

lann commented 8 months ago

You might want to include Engine::precompile_compatibility_hash in the cache key; I added this method for precisely this purpose :slightly_smiling_face:.

Precompiled binaries are sensitive not just to runtime version but also CPU and engine config features. It doesn't seem entirely implausible that - for example - the content store could be migrated to a new host with different CPU features and then you'd have a very gnarly performance regression to debug...

jsturtevant commented 8 months ago

quick update, When I've added the tests it seems to have exposed a possible bug which I am debugging now. Essentially the OCI test run fine but when run with the other tests they are hanging and failing if run after an OCI test. We create a new engine each time so I am not really clear on what is causing the issue yet.

jsturtevant commented 8 months ago

quick update, When I've added the tests it seems to have exposed a possible bug which I am debugging now. Essentially the OCI test run fine but when run with the other tests they are hanging and failing if run after an OCI test. We create a new engine each time so I am not really clear on what is causing the issue yet.

I tracked this down to the call to parallelize the compile step here. The engine should be usable across threads and each test should be creating a new "engine" so I don't know why this happens.
https://github.com/bytecodealliance/wasmtime/blob/b6a8abc6ce31cf6db7ad5edb9b2567620190ce35/crates/wasmtime/src/engine.rs#L142

Mossaka commented 8 months ago

quick update, When I've added the tests it seems to have exposed a possible bug which I am debugging now. Essentially the OCI test run fine but when run with the other tests they are hanging and failing if run after an OCI test. We create a new engine each time so I am not really clear on what is causing the issue yet.

I tracked this down to the call to parallelize the compile step here. The engine should be usable across threads and each test should be creating a new "engine" so I don't know why this happens. https://github.com/bytecodealliance/wasmtime/blob/b6a8abc6ce31cf6db7ad5edb9b2567620190ce35/crates/wasmtime/src/engine.rs#L142

Did we turn on the "parallel-compilation" feature of wasmtime?

jsturtevant commented 8 months ago

Did we turn on the "parallel-compilation" feature of wasmtime?

yes we turn it that feature https://github.com/containerd/runwasi/blob/ea8c04f5d8073119d6b38f360311a8ec52dc9911/crates/containerd-shim-wasmtime/Cargo.toml#L23 and it looks to be avaliable on my machine as well as on CI

My next step is to make a small/simpler reproducible example

jsturtevant commented 8 months ago

I was able to create a simple repo: https://github.com/jsturtevant/rayon-clone-issue/blob/master/src/main.rs. After adding the unit tests I am effectively running into https://github.com/containerd/runwasi/issues/357 which is also called out in https://github.com/rayon-rs/rayon/issues/708#issuecomment-555699090

jsturtevant commented 8 months ago

You might want to include Engine::precompile_compatibility_hash in the cache key; I added this method for precisely this purpose 🙂.

Precompiled binaries are sensitive not just to runtime version but also CPU and engine config features. It doesn't seem entirely implausible that - for example - the content store could be migrated to a new host with different CPU features and then you'd have a very gnarly performance regression to debug...

I've updated wasmtime to use this hash for the label

jsturtevant commented 8 months ago

I was able to create a simple repo: https://github.com/jsturtevant/rayon-clone-issue/blob/master/src/main.rs. After adding the unit tests I am effectively running into #357 which is also called out in rayon-rs/rayon#708 (comment)

I've added the ability to disable the parallel compilation for the tests. We won't run into this issue as is in the shim since each shim is it's own instance but we need to address it longer term with #357

jsturtevant commented 8 months ago

test failure was #423

jsturtevant commented 7 months ago

@cpuguy83 I've addressed the feedback, could you take another look? Thanks!

Mossaka commented 7 months ago

🥳 Awesome! I am going to do a new release