containerd / runwasi

Facilitates running Wasm / WASI workloads managed by containerd
Apache License 2.0

Where do we define the Wasm "container" boundary #619

Open jsturtevant opened 2 weeks ago

jsturtevant commented 2 weeks ago

We are doing a bunch of setup work for the Wasm runtime inside the container and using the Rust log crate. We are effectively still operating as the "shim" until we invoke the Wasm binary, for example https://github.com/containerd/runwasi/blob/7cc5e5a8025f1508c264dee1faf1c8c57cf645a1/crates/containerd-shim-wasmtime/src/instance.rs#L163

This leads to some confusion from an end user's perspective in some scenarios. For instance, if a user is running a Rust Wasm application and sets RUST_LOG on their podspec, it will switch logging for everything in https://github.com/containerd/runwasi/blob/7cc5e5a8025f1508c264dee1faf1c8c57cf645a1/crates/containerd-shim-wasm/src/container/engine.rs#L16.
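
To make the failure mode concrete, here is a minimal sketch (not runwasi's actual code) of the ordering that produces it, assuming a logger such as env_logger that reads RUST_LOG when it is initialized:

```rust
// Minimal sketch of the ordering problem, assuming env_logger semantics
// (the filter is read from RUST_LOG at init time).
fn main() {
    // 1. The container spec (podspec) env vars get applied to the shim
    //    process itself; a pod-level RUST_LOG lands here.
    //    (Stand-in value for illustration.)
    std::env::set_var("RUST_LOG", "debug");

    // 2. Any logger initialized after that point picks up the pod's filter,
    //    so shim-internal logging is now controlled by the user's podspec.
    env_logger::init();

    log::debug!("shim-internal detail, now visible because of the podspec value");
}
```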

This could also happen when we start to enable tracing in https://github.com/containerd/runwasi/pull/582

Since there is some additional setup of the Wasm runtime, this raises the question of where the boundary should live between the container and Wasm.

jsturtevant commented 2 weeks ago

Some further conversation and context is in https://cloud-native.slack.com/archives/C06PC7JA1EE/p1717097248450109

Mossaka commented 2 weeks ago

Let me clarify why we are seeing "if a user is running a Rust Wasm application and sets RUST_LOG on their podspec, it will switch logging for everything in run_wasi"

There are two shim processes when the shim is started:

When the child shim process invokes youki's ContainerBuilder::build(), it reads the env vars from the Container Spec, and hence from the PodSpec, and changes the process's env vars (e.g. this line).

This happens prior to run_wasi. Hence everything from run_wasi onward uses the env vars from the PodSpec.
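
A minimal sketch of the mutation described above (simplified; the real code is at the linked youki line): the spec's process env is applied to the current child shim process before run_wasi ever runs.

```rust
// Sketch only: OCI spec process env entries are "KEY=VALUE" strings, and
// applying them mutates the child shim process itself, not just the guest.
fn apply_spec_env(spec_env: &[String]) {
    for kv in spec_env {
        if let Some((key, value)) = kv.split_once('=') {
            std::env::set_var(key, value); // the shim now sees PodSpec envs
        }
    }
}
```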


Back to your question: "where the boundary should live for the shim vs wasm?". I think, from the user's perspective, they would assume that their application environment variables (i.e. from the PodSpec) should only be seen by their application (i.e. the WASI environment consumed by Wasm modules).

The next question is: how do we achieve that?

I am thinking about either:

  1. moving the Wasm runtime (Spin, Wasmtime, WasmEdge, etc.) out of youki's LibcontainerExecutor so that the runtimes won't be able to see env vars from the PodSpec.
  2. telling youki not to mutate the main process's env vars (i.e. disable the code from here) and instead passing the env vars to the Wasm runtime's WASI context. This routes PodSpec env vars directly into the WASI environment that only Wasm modules can see; a rough sketch of this option follows below.
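
A rough sketch of option 2 (API paths and signatures vary across wasmtime-wasi versions, so treat this as an illustration rather than runwasi's implementation): the PodSpec env vars are handed only to the WASI context, so the guest sees them while the host process does not.

```rust
use wasmtime_wasi::{WasiCtx, WasiCtxBuilder};

// Option 2 sketch: route spec env vars into the WASI context only, instead
// of letting them be exported into the shim process environment.
fn build_wasi_ctx(pod_env: &[(String, String)]) -> WasiCtx {
    let mut builder = WasiCtxBuilder::new();
    for (key, value) in pod_env {
        builder.env(key, value); // visible to the Wasm guest via WASI only
    }
    builder.build()
}
```
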
kate-goldenring commented 2 weeks ago

I am concerned about these options. If we do not allow configuring the Wasm runtime/Spin engine via container env vars, how will we do app-level execution configuration? For example, how do we configure which port a Spin app listens on?

I'd prefer to continue to set these env vars on the Wasm runtime and tell users a story wherein, for Wasm, the container env vars configure the Wasm runtime, especially given that Spin apps often contain multiple Wasm instances. Would setting container env vars mean setting those env vars in each instance's environment? This is part of the reason Spin encourages using configuration variables over environment variables.

I do understand that it is a different user story than that of containers. If there is a way we can configure the Spin runtime and Spin app env vars at the container level, that would be a double win.

jsturtevant commented 1 week ago

Would setting container env vars mean setting those env vars in each instance's environment?

Right now, Spin doesn't use the envs in the same way as the wasmtime shim, so each instance only gets the values in spin.toml as defined in https://developer.fermyon.com/spin/v2/variables


I think one of the challenges is that there is an additional component in scope compared to the traditional container environment. One of the goals of runwasi is to wrap the Wasm execution in a "container" to provide additional security and a smoother transition for those coming from the expectations of using containers.

The additional component comes from the fact that Wasm has two parts: a host (Spin or Wasmtime) and a guest (the Wasm binary). They both have unique configuration values beyond the container. With our current setup we run both in the "container", which means from an OCI perspective we don't really have the ability to distinguish between the two.

Should the host run inside the context of the container? I believe it should. A concrete example is given by @kate-goldenring: the host needs to know the IP and port to bind to.

How do we distinguish between host configuration and Wasm configuration? After reading up on Spin's approach to configuration, I am under the impression that we are not doing the right thing with the wasmtime shim: https://github.com/containerd/runwasi/blob/7cc5e5a8025f1508c264dee1faf1c8c57cf645a1/crates/containerd-shim-wasmtime/src/instance.rs#L76. Spin has a nice way of solving this via application variables; I am not sure if wasmtime does.

I am still unsure of the correct solution across all shims... Maybe it is a unique choice for each shim?

kate-goldenring commented 1 week ago

Right now, Spin doesn't use the envs in the same way as the wasmtime shim, so each instance only gets the values in spin.toml as defined in https://developer.fermyon.com/spin/v2/variables

Spin does also support environment variables. Similar to wasmtime run --env Foo=bar, spin up --env Foo=bar. But, as you mention, the values are static rather than dynamic; however, the static nature of environment variables in containers is normal.

We could also support the --env container arg to get environment variables injected into Wasm instances. Or, we could define a prefixing syntax for determining which envs are for the runtime and should not be exposed to the guest. Or, we could move away from env vars in the guest and use something like application variables. Alternatively, we could support adding a .env file to the OCI artifacts and import those. I am throwing out ideas, but I am also reluctant to drop support for envs for components executed by the wasmtime shim. So maybe I am coming to the same summary that "Maybe it is a unique choice for each shim?"
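
For the prefixing idea specifically, a hypothetical sketch (the SPIN_ prefix and the split itself are illustrative, not an existing convention) of partitioning spec envs into runtime-only configuration and guest-visible variables:

```rust
// Hypothetical marker; any agreed-upon prefix would work.
const RUNTIME_PREFIX: &str = "SPIN_";

// Envs with the runtime prefix configure the host and are withheld from the
// guest; everything else is forwarded to the guest's environment.
fn split_env(
    spec_env: &[(String, String)],
) -> (Vec<(String, String)>, Vec<(String, String)>) {
    spec_env
        .iter()
        .cloned()
        .partition(|(key, _)| key.starts_with(RUNTIME_PREFIX))
}
```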

jsturtevant commented 1 week ago

We could also support the --env container arg to get environment variables injected into Wasm instances. Or, we could define a prefixing syntax for determining which envs are for the runtime and should not be exposed to the guest. Or, we could move away from env vars in the guest and use something like application variables. Alternatively, we could support adding a .env file to the OCI artifacts and import those. I am throwing out ideas, but I am also reluctant to drop support for envs for components executed by the wasmtime shim. So maybe I am coming to the same summary that "Maybe it is a unique choice for each shim?"

The ideas are great. I agree it does come down to the shim (but maybe there is an opportunity for Wasm and/or OCI to provide a way to standardize this longer term), and I also don't want to fully drop support for envs from the wasmtime shim.

Having it be a unique choice for each shim seems like a good approach for now.

kate-goldenring commented 1 week ago

@jsturtevant @Mossaka @devigned @cpuguy83 in our runwasi meeting this morning, we discussed this more synchronously, settling on a potential approach of setting all container env vars as application variables in the Spin shim. After some thinking, I found a caveat to this approach: it only works if you use the environment variable provider for application variables, which can also be set through other providers such as Azure Key Vault or Vault. This is because Spin currently only supports one application variables provider per app.

In our discussion on the sandbox API, @devigned pointed out that, in general, we should take the approach that is the most container-like for users. On that note, how do we handle injection of cluster-information environment variables such as KUBERNETES_SERVICE_HOST, KUBERNETES_SERVICE_PORT, OTHER_SERVICE_HOST, etc.? To ensure that components can access these, they would need to be injected as environment variables rather than application variables, because if a user were to choose the Vault application variables provider, there would be no way to set these values, as they are passed as container envs. We could have an allowlist of envs we will inject, or we could inject everything into every component (unless it has a SPIN prefix to indicate it should only be used for the runtime). Or we could find a way to support multiple application variable providers so that some variables (such as the cluster-information ones) could be grabbed from the environment variable provider and others from another provider (say Vault).
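
A hypothetical allowlist sketch (the suffix matching is illustrative, not a settled policy): inject only the Kubernetes service-discovery variables as real env vars in every component, and route everything else through application variables.

```rust
// Illustrative allowlist: Kubernetes service-discovery variables follow the
// *_SERVICE_HOST / *_SERVICE_PORT pattern, so KUBERNETES_SERVICE_HOST and
// OTHER_SERVICE_HOST both match the suffix check.
fn is_cluster_info(key: &str) -> bool {
    key.ends_with("_SERVICE_HOST") || key.ends_with("_SERVICE_PORT")
}

// Only allowlisted envs would be injected into each component's environment.
fn injectable(spec_env: &[(String, String)]) -> Vec<&(String, String)> {
    spec_env.iter().filter(|(key, _)| is_cluster_info(key)).collect()
}
```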

~Short term, if it is a defined list of environment variable suffixes and prefixes we can expect from Kubernetes, I think we may be best suited to set them as environment variables in all components~

EDIT: Spin already supports multiple variables providers. [[variables_provider]] is a list; providers get queried in order until one returns a value, so our approach of setting all container envs as application variables and leaving it up to components to select which ones to allow holds!
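
A small sketch of that lookup order (the trait here is illustrative, not Spin's actual provider API): providers form an ordered list, and the first one that returns a value for a key wins.

```rust
// Illustrative provider abstraction; Spin's real API differs.
trait VariablesProvider {
    fn get(&self, key: &str) -> Option<String>;
}

// Mirrors the described semantics of [[variables_provider]]: query each
// provider in order until one returns a value for the key, e.g. an env-var
// provider for cluster info ahead of a Vault provider for secrets.
fn resolve(providers: &[Box<dyn VariablesProvider>], key: &str) -> Option<String> {
    providers.iter().find_map(|provider| provider.get(key))
}
```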

devigned commented 1 week ago

\o/ for multiple variable providers!