WebAssembly / component-model

Repository for design and specification of the Component Model

Question: Why is it not possible to refer to exported types in imports? #272

Open dicej opened 7 months ago

dicej commented 7 months ago

The Component Model does not currently allow you to refer to exported types in imports, which feels asymmetric given that one can easily refer to imported types in exports. This becomes a practical concern when attempting to virtualize interfaces such as wasi:http/incoming-handler, which contains a function named handle whose parameters are handles to resources that the guest imports. The most natural way to virtualize that interface is unfortunately impossible, since it would require importing a function whose parameters are handles to resources that the virtualizing component exports.

Is there a fundamental reason why this is not possible, e.g. by allowing a component to declare a type, refer to it in an import, and later export it?

dicej commented 7 months ago

To be clear: I realize this can be worked around, so this feature is not strictly necessary, but it does feel like a big paper cut, so I'd like to understand the rationale.

lukewagner commented 7 months ago

Great question! This restriction in the design is I think rather fundamental, unfortunately, but it does provide us with concrete benefits that I'll explain.

At the heart of it, you can say that this restriction stems from two things:

First, the acyclicity of component instantiation: when instantiating a component, there is a well-defined acyclic order of executing its contained top-level definitions (in the order they show up in the binary) that only allows a definition to depend on preceding definitions.

Second, although resource types are "types", which makes them feel global and as though they should exist before runtime, the full semantics of parametric linking say otherwise: a single component can be instantiated multiple times, and the same imported/exported resource type name-string can refer to different resource implementations in different contexts. You therefore have to think of a resource type as the type of a dynamic tag (or "brand") that identifies a particular resource implementation (defined to be a set of functions closed over the core wasm state of a particular component instance), and this tag is created when "executing" the definition of the resource type. Thus, before I instantiate a component C that defines and exports a resource type R, the tag for R doesn't exist yet (C hasn't created it), so it's not possible to define functions before C is instantiated that use R in their params/results, and it doesn't really make sense to include R in the types of C's imports.
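To make the two points above concrete, here is a minimal Python model, offered as a sketch rather than anything the Component Model tooling provides. `ResourceTag`, `instantiate_c`, and the `"R"` and `"make"` keys are hypothetical names chosen for illustration. The resource type is modeled as a fresh tag minted only while the component is being instantiated, so the imports passed in (which must already exist) cannot mention it.

```python
import itertools

_next_id = itertools.count()

class ResourceTag:
    """Hypothetical stand-in for the dynamic tag ("brand") of a resource type."""
    def __init__(self, name):
        self.name = name
        self.id = next(_next_id)  # fresh identity each time a definition is "executed"

def instantiate_c(imports):
    """Model of instantiating component C, which defines and exports resource type R.

    `imports` has to be fully built before this call, so nothing in it can
    refer to the tag for R: that tag is only created inside this function.
    """
    r = ResourceTag("R")          # "executing" the resource type definition

    def make():                   # an exported function whose result type uses R
        return (r, object())      # a handle, branded with the tag of this instance

    return {"R": r, "make": make}

# Two instantiations of the same component yield two distinct tags named "R".
c1 = instantiate_c(imports={})
c2 = instantiate_c(imports={})
tag1, _ = c1["make"]()
tag2, _ = c2["make"]()
```

In this model the name "R" is shared by both instances while the tags differ, which is the sense in which the same name-string can refer to different resource implementations in different contexts.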

Technically, I think it is possible to define a module system with abstract types that can be cyclically used between modules but:

Instead, I'm hoping we can spend more time talking through how virtualization can be implemented and also what sorts of new tooling (e.g., radically expanded wasi-virt) we can build to make this easier for devs in practice.

guybedford commented 7 months ago

These points all make a lot of sense. As far as I can tell, the only way to tackle the use case would be to treat WASI-Virt as a sandwich virtualization - one component for import virtualizations, and one component for export virtualizations that also imports from the import virtualizations component, where the component being virtualized is instantiated against the import virtualizations, then the export virtualization component is instantiated with that.

The problem with this sandwich approach though is that we end up with an extra memory when ideally there should be memory sharing here.
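The sandwich arrangement and its memory cost can be sketched with a small Python model. This is only an illustration under simplifying assumptions: `Instance` is a hypothetical class, each instance is given its own small `bytearray` as a stand-in for linear memory, and the instance names are made up.

```python
class Instance:
    """Hypothetical model of a component instance that owns its own linear memory."""
    def __init__(self, name, imports):
        self.name = name
        self.imports = imports        # the earlier instances this one is linked against
        self.memory = bytearray(16)   # tiny stand-in for a separate linear memory

# Instantiation order is acyclic: each instance only sees instances created before it.
import_virt = Instance("import-virt", imports=[])
guest = Instance("guest", imports=[import_virt])
export_virt = Instance("export-virt", imports=[guest, import_virt])

layers = [import_virt, guest, export_virt]
# Count the distinct memories held by the three layers.
memories = {id(inst.memory) for inst in layers}
```

In this model the two virtualization layers each carry a memory of their own in addition to the guest's, which is the overhead being described.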

Are there techniques to avoid this in the model via a component which doesn't have any memory? I.e., re-exporting the realloc from the base-level import virtualization component?

When thinking about how to design this it seemed like supporting instantiated core module imports might be necessary for the memory sharing, which doesn't seem to be supported in the model currently?

Would definitely be good to discuss the design space here further.

lukewagner commented 7 months ago

You're right that one way to resolve this is to sandwich the component being virtualized between two other components, and I agree that this adds some extra overhead we'd like to avoid.

Fortunately, the C-M does allow a more-advanced alternative approach that takes advantage of the fact that a parent component can both supply its own lifted core functions to be the imports of a child component and alias the exports of that child component, lowering them into core functions that can, crucially, share the same linear memory as the aforementioned parent lifted functions.

As a sketch:

(component $Parent
  (core module $PCore ...)
  (core instance $pcore (instantiate $PCore))
  (func $parent_import (canon lift (core func $pcore "parent_import") ...))
  (import "child" (component $Child ...))
  (instance $child (instantiate $Child (with "parent_import" (func $parent_import))))
  (core func $child_export (canon lower (func $child "child_export") ...))
  ;; ... use a tiny helper core module to set $child_export into the core func table of $pcore
)

Thus, the core wasm code in $pcore can both be called by and call into the child component (being virtualized).
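The control flow of that sketch can be modeled in a few lines of Python. This is a rough analogy only: it ignores `canon lift`/`canon lower` and the Canonical ABI entirely, and `ParentCore`, `instantiate_child`, and the `run` method are hypothetical names. It shows the ordering (parent core first, then child, then patching the table) and that both directions of call touch the same parent memory.

```python
class ParentCore:
    """Hypothetical model of the $pcore core instance: one memory plus a function table."""
    def __init__(self):
        self.memory = bytearray(8)
        self.table = [None]           # slot 0 is filled in once the child exists

    def parent_import(self, value):
        # Called by the child; writes into the parent's memory.
        self.memory[0] = value
        return value + 1

    def run(self):
        # Calls into the child through the table slot, then reads the same memory.
        result = self.table[0](41)
        return result, self.memory[0]

def instantiate_child(parent_import):
    """Model of instantiating $Child with the parent's function as its import."""
    def child_export(x):
        return parent_import(x)       # the child calls back into the parent
    return {"child_export": child_export}

pcore = ParentCore()                              # 1. instantiate the parent's core module
child = instantiate_child(pcore.parent_import)    # 2. instantiate the child against it
pcore.table[0] = child["child_export"]            # 3. helper step: patch the function table
result, stored = pcore.run()
```

The late table patch in step 3 is what lets the parent call a child export that did not exist when the parent's core module was instantiated, without introducing a cycle in the instantiation order.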

(Note that this does allow the parent to create resource handle cycles with the child: the parent can own a handle for a resource exported by the child, but also define a resource type imported by the child so the child resource can own a handle pointing back into the parent. But in this case, we can say that it's the parent's "fault", and not fundamentally different than if a single core module internally contained a reference-counting cycle -- the C-M can't avoid all leaks, just the "no-fault" ones that are highly problematic.)

The big question is how to expose this raw WAT-level functionality to source-language toolchains. In a Rust context, I expect we could figure out some proc-macro magic that can be attached to a static variable that represents the $child instance (where the proc-macro lets you list the with and alias exports and codegens accordingly). In a JS context, I think that the (import "child" (component $Child ...)) could be generated by a source-phase import and then we need to figure out an instantiation syntax that can be statically mapped (or perhaps wizer'd) to C-M instantiate. There's lots more to figure out here, but I think it's fundamentally an (advanced) source-language bindings question.