WebAssembly / component-model

Repository for design and specification of the Component Model

Defining client callback type in WIT #223

Open bobogei81123 opened 1 year ago

bobogei81123 commented 1 year ago

I'm trying to define a component interface in WIT format so that components (clients) can pass callbacks (closures) to the host. The host can then store the callback somewhere and invoke it when a certain event happens.

I'm a bit confused about how to describe this in WIT. I'm thinking of something like this:

package foo:bar

world client {
  resource callback {
    // Invokes the callback
    call: func()
  }

  // Passes the callback from client to host
  import register-callback: func(callback: callback)

  // The main function for the client
  export main: func()
}

But because I didn't export the resource (apparently I can't write export callback), I guess this is treated as if callback is implemented by the host, which is not what I want. I can change it to:

package foo:bar

interface host {
  use callback-types.{callback}

  register-callback: func(callback: callback)
}

interface callback-types {
  resource callback {
    call: func()
  }
}

interface client {
  use callback-types.{callback}
  run: func()
}

world client-world {
  import host
  export client
}

But then, will host/callback and client/callback be treated as the same type?

lukewagner commented 1 year ago

Hi, good question, thanks for asking. As a general design choice, the component model doesn't support passing first-class functions/callbacks to imports since this usually leads to cyclic leaks (due to entrained scope chain) which aren't possible to collect without cross-component GC, which we don't have (also as a design choice). This design choice was based on significant experience with this issue in browsers (which all ended up being forced into some form of complex cross-language garbage-/cycle-collection).

Instead, to address the main concurrency use cases for callbacks, the plan is to express concurrency in terms of Wit-level futures and streams, which we currently emulate using resource types (e.g., see wasi-io). The nice thing about futures and streams is that they maintain acyclic ownership and can also be mapped directly to many languages' native concurrency support by wit-bindgen (instead of requiring this to be done by hand). Thus, for your use case, I'd try to see if it can be re-expressed in terms of futures/streams, emulated by resource types. (Another example of this is wasi-http, which hasn't yet been updated to use the bleeding-edge resource type support, so it's emulating future/stream in terms of emulated resource types, but that will soon change :P)

There are also other use cases for callbacks, so happy to discuss them more here; there are other approaches for these alternative use cases.

bobogei81123 commented 1 year ago

This design choice was based on significant experience with this issue in browsers (which all ended up being forced into some form of complex cross-language garbage-/cycle-collection).

I might be wrong, but wouldn't reference counting work for most cases? For example, the client can define the callback as a shared (reference-counted) pointer to a closure. The client then converts the shared pointer into a resource that has a call method. The host stores the resource handle somewhere when the client passes it, and invokes the call method to execute the callback. Once the callback is no longer used by the host, the host can call the destructor of the resource.

What I think is missing here is a way for the client to import a function from the host, like register-callback: func(f: callback), where the resource type callback is implemented and exported by the client itself. In other words, a way for the host to define a resource type that clients should implement.

lukewagner commented 1 year ago

The challenge arises when the reference-counted callback holds alive the callback's closure (aka environment or scope chain), which holds (via local or global variable) a handle to the host resource that owns the callback. (E.g.: in the Web, an example is when a JS callback is stored on a DOM node where the callback function's scope chain holds a reference to the DOM node.) When this happens, you get a reference count cycle which leaks unless you additionally have a way to detect and free cycles. Browsers tried to work around this problem for years with various partial fixes for special cases, but these kept breaking down and leaking in subtle ways (this is the type of problem that only shows up at scale too), ultimately making their way to general cross-language cycle collection of some sort, which we don't want.
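As a rough single-process illustration of the cycle described above, here is a minimal Rust sketch using plain `Rc`. It does not use the component model at all: `Node` is a hypothetical stand-in for the host resource, and the closure's captured environment plays the role of the scope chain.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical stand-in for a host resource (e.g. a DOM node) that stores a callback.
struct Node {
    callback: RefCell<Option<Box<dyn Fn()>>>,
}

/// Builds a node whose callback captures a strong reference to the node itself,
/// then drops the caller's own reference. The returned weak reference lets us
/// observe whether the node was freed.
fn make_cycle() -> Weak<Node> {
    let node = Rc::new(Node { callback: RefCell::new(None) });
    let captured = Rc::clone(&node);
    // The closure's environment keeps `captured` (and thus the node) alive.
    *node.callback.borrow_mut() = Some(Box::new(move || {
        let _ = Rc::strong_count(&captured);
    }));
    Rc::downgrade(&node)
}

fn main() {
    let weak = make_cycle();
    // Nothing in the program can reach the node any more, yet it is still allocated.
    println!("strong count after drop: {}", weak.strong_count()); // prints 1
}
```

The count never reaches zero because the node owns the closure and the closure owns the node. Reference counting alone cannot reclaim this pair.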

For what it's worth, there is a sort of brute force way to achieve something like what you're talking about:

world guest {
  import register-callback: func(callback-name: string)
  import unregister-callback: func(callback-name: string)
  export call-callback: func(callback-name: string, args: list<string>) -> string
}

The idea being that the guest registers the name of the callback and is responsible for keeping this callback alive as long as call-callback may be called with that name. Heterogeneous callback signatures could be supported by having multiple variations of call-callback with different signatures or, at some point in the future, with Wit templates. But again, if possible, it would be preferable to express the control flow patterns in terms of higher-level acyclic concurrent resources like futures/streams.
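A guest-side table for this name-based scheme might look roughly like the Rust sketch below. `CallbackTable` and its method names are hypothetical, the calls to the imported functions are only described in comments, and returning `Option` for an unknown name is an assumption (the WIT signature above returns a plain string).

```rust
use std::collections::HashMap;

// Hypothetical guest-side table; names and types are illustrative only.
type Callback = Box<dyn FnMut(Vec<String>) -> String>;

#[derive(Default)]
struct CallbackTable {
    callbacks: HashMap<String, Callback>,
}

impl CallbackTable {
    /// Stores the closure locally. A real guest would then pass `name` to the
    /// imported `register-callback`.
    fn register(&mut self, name: &str, cb: Callback) {
        self.callbacks.insert(name.to_string(), cb);
    }

    /// Drops the closure. A real guest would also call the imported
    /// `unregister-callback`. Returns whether the name was registered.
    fn unregister(&mut self, name: &str) -> bool {
        self.callbacks.remove(name).is_some()
    }

    /// Body of the exported `call-callback`: looks the closure up by name.
    fn call_callback(&mut self, name: &str, args: Vec<String>) -> Option<String> {
        let cb = self.callbacks.get_mut(name)?;
        Some((*cb)(args))
    }
}

fn main() {
    let mut table = CallbackTable::default();
    table.register(
        "greet",
        Box::new(|args: Vec<String>| format!("hello {}", args.join(" "))),
    );
    println!("{:?}", table.call_callback("greet", vec!["world".to_string()]));
    table.unregister("greet");
    println!("{:?}", table.call_callback("greet", vec![]));
}
```

The closure never crosses the component boundary. Only its name does, so the guest alone decides when the closure is freed.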

bobogei81123 commented 1 year ago

The challenge arises when the reference-counted callback holds alive the callback's closure (aka environment or scope chain), which holds (via local or global variable) a handle to the host resource that owns the callback. (E.g.: in the Web, an example is when a JS callback is stored on a DOM node where the callback function's scope chain holds a reference to the DOM node.) When this happens, you get a reference count cycle which leaks unless you additionally have a way to detect and free cycles.

Thank you for the detailed explanation. But won't the reference cycle problem also come up when host and clients are allowed to exchange types containing resources? (I guess one can create a reference cycle if resource types point to each other.)

Also, just curious, is making JS bindings able to be specified by the component model in WIT format one of the future goals? If yes, how are we going to support this? Perhaps by allowing GC types to be specified in the component model / WIT?

For what it's worth, there is a sort of brute force way to achieve something like what you're talking about:

world guest {
  import register-callback: func(callback-name: string)
  import unregister-callback: func(callback-name: string)
  export call-callback: func(callback-name: string, args: list<string>) -> string
}

The idea being that the guest registers the name of the callback and is responsible for keeping this callback alive as long as call-callback may be called with that name. Heterogeneous callback signatures could be supported by having multiple variations of call-callback with different signatures or, at some point in the future, with Wit templates. But again, if possible, it would be preferable to express the control flow patterns in terms of higher-level acyclic concurrent resources like futures/streams.

I think this is what I'm looking for. I'm wondering if we can replace string with some resource type so that the functions (register-callback, unregister-callback) can also be specified in WIT.

There are particular reasons I don't want to go the async route. I'm trying to add a WASM runtime to Neovim that exposes all the public APIs. With the component model I only need to create the WIT files for the API functions and auto-generate a little glue code on the host side, and I don't even need to provide any glue code on the client side. But if callbacks can't be expressed easily in the component model, so that I need to translate to async style or use the "brute force way" you mentioned, then I'll need to either wrap the host API functions manually, or provide glue code or a detailed binding specification on the client side.

lukewagner commented 1 year ago

Hi, great questions again, thanks.

But won't the reference cycle problem also come up when host and clients are allowed to exchange types containing resources? (I guess one can create a reference cycle if resource types point to each other.)

In the normal client/server relationship between two components A and B, where A imports an instance of B (through a Wit interface), there can't be a cycle because B can't import A's resource types (cyclic imports aren't possible) and thus B can't hold handles to A-implemented resources. When there is a parent/child relationship between components A and B (i.e., A instantiates B), it is possible for A to supply an A-implemented resource type to B then alias a B-implemented resource type that B exports, thereby allowing A to create a cycle through B. But in this case, it's A's "fault" and so not fundamentally different than A having an internal cycle within A's linear memory. Thus, we're technically only avoiding cyclic footguns, not making cycles impossible, and doing so mostly as a side effect of the generativity of resource types and the acyclicity of component instantiation (each of which is independently strongly motivated).

Also, just curious, is making JS bindings able to be specified by the component model in WIT format one of the future goals? If yes, how are we going to support this?

Because Wit and the Component Model are supposed to represent a sort of fuzzy cross-language intersection of types that can be mapped "pretty well" into "most" languages, it isn't a goal to be able to express every possible JS interface in full fidelity in Wit. That being said, for concurrency, JS APIs have generally been moving away from callbacks toward promises and streams (making code much nicer when using async/await), which Wit and the Component Model will have great binding for (and it should often be possible to adapt pre-existing callback-based APIs into Promise-returning APIs using JS shims).

That being said, there is another less-well-developed idea to add a form of "scoped" callback, scoped either to the call (similar to how borrow handles are currently scoped) or to a parent resource (that the callback is owned by and can't outlive). It would be structurally typed (like list, in that it's not a generative resource type that has to be imported) and would attempt to avoid the general cyclic leaks by making the lifetimes explicit. It sounds like perhaps this is closer to what you're asking for. Unfortunately, this won't be in the Preview2 short term, so you'd need a workaround such as we were discussing, but it could potentially be in the next preview release next year, depending on prioritization.

Extrapolating from the use case you described (which sounds really cool, btw!), maybe what I'd do in the short term is have a world that exports each possible type of callback using a u64 as a classic "closure" parameter (e.g., export on-key-down: func(closure: u64, ...) -> ...) and then the imports that register the callback take a u64 (e.g., import listen-for-key-down: func(closure: u64, ...) -> ...). With this, I could imagine you could do a bit of boilerplate codegen that avoids too much manual effort?
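The generated boilerplate for the u64 "closure" parameter could take roughly the following shape on the guest side. This is only a sketch: `Closures`, `KeyHandler`, and the `u32 -> bool` handler signature are assumptions, since the real parameters are elided ("...") above, and the actual import call is only described in a comment.

```rust
use std::collections::HashMap;

// Hypothetical handler signature; the real parameters are elided in the thread.
type KeyHandler = Box<dyn FnMut(u32) -> bool>;

#[derive(Default)]
struct Closures {
    next_id: u64,
    key_down: HashMap<u64, KeyHandler>,
}

impl Closures {
    /// Allocates a fresh id for the closure. A real guest would pass this id
    /// to the imported `listen-for-key-down`.
    fn listen_for_key_down(&mut self, handler: KeyHandler) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.key_down.insert(id, handler);
        id
    }

    /// Body of the exported `on-key-down`: dispatches on the id the host sends back.
    /// Returns `None` if the id is unknown.
    fn on_key_down(&mut self, closure: u64, key_code: u32) -> Option<bool> {
        let handler = self.key_down.get_mut(&closure)?;
        Some((*handler)(key_code))
    }
}

fn main() {
    let mut closures = Closures::default();
    let id = closures.listen_for_key_down(Box::new(|key_code: u32| key_code == 13));
    println!("{:?}", closures.on_key_down(id, 13));
    println!("{:?}", closures.on_key_down(id + 1, 13));
}
```

One such table per callback signature keeps the WIT monomorphic, at the cost of one export per event type.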

juntyr commented 1 year ago

In the normal client/server relationship between two components A and B, where A imports an instance of B (through a Wit interface), there can't be a cycle because B can't import A's resource types (cyclic imports aren't possible) and thus B can't hold handles to A-implemented resources. When there is a parent/child relationship between components A and B (i.e., A instantiates B), it is possible for A to supply an A-implemented resource type to B then alias a B-implemented resource type that B exports, thereby allowing A to create a cycle through B. But in this case, it's A's "fault" and so not fundamentally different than A having an internal cycle within A's linear memory. Thus, we're technically only avoiding cyclic footguns, not making cycles impossible, and doing so mostly as a side effect of the generativity of resource types and the acyclicity of component instantiation (each of which is independently strongly motivated).

Thank you so much for this detailed discussion! I'm trying to write an API where a resource cycle between a parent and a child component would be really handy. Unfortunately, I don't think that WIT future<T>s or stream<U, T>s would be enough for my use case, which would require something akin to a channel<M, R> type which accepts a message of type M and produces a response of type R in return:

trait Channel<M, R> {
    fn send(&mut self, msg: M) -> R;
}

Since this type does not exist, I'm trying to understand if the resource cycle workaround you've sketched out would be possible and useful in my case.

it is possible for A to supply an A-implemented resource type

Does this refer to a resource type from an interface that A exports?

supply an A-implemented resource type to B

How can the parent component's export be supplied to a child component's import? So far my understanding is that an export can only be supplied to a parent, grandparent, ... while an import can only be satisfied by a child, grandchild, ... .

then alias a B-implemented resource type that B exports

Does this mean exporting the same resource under the same name as B?

Thanks for your help!

lukewagner commented 1 year ago

Unfortunately, I don't think that WIT future<T>s or stream<U, T>s would be enough for my use case, which would require something akin to a channel<M, R> type which accepts a message of type M and produces a response of type R in return

Yes, that makes sense; in a single-threaded context, callbacks and channels are pretty similar things. So it does seem like some form of scoped callback is what you'd ideally want.
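To make the similarity concrete, here is a small Rust sketch in which any `FnMut(M) -> R` closure is treated as a `Channel<M, R>`, reusing the trait from the comment above. The blanket impl and the `drive` helper are illustrative assumptions and not part of any proposal.

```rust
// The trait from the comment above, repeated so the sketch is self-contained.
trait Channel<M, R> {
    fn send(&mut self, msg: M) -> R;
}

// Any `FnMut(M) -> R` closure can act as a channel: sending is just calling.
impl<M, R, F: FnMut(M) -> R> Channel<M, R> for F {
    fn send(&mut self, msg: M) -> R {
        (*self)(msg)
    }
}

/// Sends each message through the channel and collects the responses.
fn drive<C: Channel<u32, u32>>(chan: &mut C, msgs: &[u32]) -> Vec<u32> {
    msgs.iter().map(|&m| chan.send(m)).collect()
}

fn main() {
    let mut total = 0;
    // A stateful closure used as a channel: it responds with the running sum.
    let mut running_sum = |m: u32| {
        total += m;
        total
    };
    println!("{:?}", drive(&mut running_sum, &[1, 2, 3])); // prints [1, 3, 6]
}
```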

Does this refer to a resource type from an interface that A exports? How can the parent component's export be supplied to a child component's import?

Good question! In addition to being able to expose resource types to the outside world through exports, parent components can pass any local definition directly to a child they are instantiating via with, e.g.:

(component $Parent
  (type $A (resource (rep i32)))
  (component $Child
    (import "A" (type (sub resource)))
    ...
  )
  (instance $child (instantiate $Child) (with "A" (type $A)))
)

So this allows a single parent component to both supply imports to and project exports from its child, allowing certain kinds of resource cycles through the parent. Unfortunately, even in this context, there's not a way for a child component to define and export a resource type that the child also uses in the types of its imports (which I think is what you're getting at); this is due to the acyclic validation rules of instantiation and types. (Using a structural type for callbacks would avoid this circularity.)

juntyr commented 1 year ago

(component $Parent
  (type $A (resource (rep i32)))
  (component $Child
    (import "A" (type (sub resource)))
    ...
  )
  (instance $child (instantiate $Child) (with "A" (type $A)))
)

That's a really cool feature - thank you so much for bringing it to my attention! I think this might be enough to get my use case to work. Is there a way to instantiate this very local parent-child cycle using e.g. wasm-compose?

Unfortunately, even in this context, there's not a way for a child component to define and export a resource type that the child also uses in the types of its imports (which is I think what you're getting after)

I already came across this issue when I prototyped my WIT definition and have found a ... workaround for now. I essentially break the resource definition cycle by defining explicit handle records for resource imports that would create a cycle, and manually do the encoding and decoding between real resources and these pseudo-handles wherever it's needed.

While this works for my small prototype, I'm definitely very interested in a more canonical approach using structural callbacks or channels (I think a single-thread channel would be what stream is to future in that you can use it multiple times?).

Do you think that a structural callback (or channel) type callback<A, R>, which can be invoked using cb(args: A) -> R, could be added to the component model? Could this type be passed as an input to imported functions and produced as (part of) a return value in exported functions?

lukewagner commented 1 year ago

Is there a way to instantiate this very local parent-child cycle using e.g. wasm-compose?

Not at the moment; wasm-compose currently focuses on doing black-box composition of pre-existing components. I think to expose this sort of parent-wraps-child composition in an easily usable manner, we'll need to add support in source language toolchains for emitting instantiate (supplying with arguments in terms of source-language constructs). E.g., in Rust, this might take the form of a special macro that you attach to a global variable declaration that represents the instance being created and lets you supply preceding global names as with arguments. We haven't started building that yet, but we could once the current wave of bindings work settles down.

Do you think that a structural callback (or channel) type callback<A, R>, which can be invoked using cb(args: A) -> R, could be added to the component model? Could this type be passed as an input to imported functions and produced as (part of) a return value in exported functions?

Hypothetically yes (not right now, but after this Preview2 milestone we're currently focused on), it feels like a callback type might make sense as part of the overall concurrency story (filling in gaps left by future and stream), so it could fit in with the Preview3 focus on "async support". To be clear, though, I don't think a simple function type (incl. callback<A, R>) would work, since it would have the emergent leak problems mentioned above. For the use case of "attach a callback to a widget/node (represented by a resource)", that's tricky, but I've got a feeling that the "parent-scoped handles" sketched in the child-handles branch (also post-Preview2) are the key to fixing this problem: if we can write:

add-event-listener: func(w: widget, cb: child callback<event,action> of w)

and give it strong semantic guarantees, then a GC language's caller-side bindings for add-event-listener may have enough semantic information to not root cb (as they would be forced to do with a plain function type), but, rather, express cb as a GC out-edge of w, which perhaps fixes the problem (but I'm not sure; more consideration is necessary).
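As a loose analogy only (this is not the proposed `child ... of` semantics, which is still a sketch), the ownership shape can be imitated in plain Rust: the widget owns its callbacks, and each callback is handed the widget on every call instead of capturing a handle to it. `Widget`, `Listener`, and `dispatch` are hypothetical names.

```rust
use std::rc::Rc;

// Hypothetical listener type: receives the parent widget and an event name,
// and returns an action description.
type Listener = Box<dyn FnMut(&Widget, &str) -> String>;

struct Widget {
    name: String,
    listeners: Vec<Listener>,
}

impl Widget {
    fn new(name: &str) -> Self {
        Widget { name: name.to_string(), listeners: Vec::new() }
    }

    /// The widget takes ownership of the callback, so the callback cannot outlive it.
    fn add_event_listener(&mut self, cb: Listener) {
        self.listeners.push(cb);
    }

    /// Calls every listener, lending it the widget for the duration of the call.
    fn dispatch(&mut self, event: &str) -> Vec<String> {
        // Temporarily move the listeners out so the widget can be lent to them.
        let mut listeners = std::mem::take(&mut self.listeners);
        let actions: Vec<String> =
            listeners.iter_mut().map(|cb| (*cb)(self, event)).collect();
        self.listeners = listeners;
        actions
    }
}

fn main() {
    let token = Rc::new(()); // lets us observe when the closure's environment is freed
    let held = Rc::clone(&token);
    let mut widget = Widget::new("button");
    widget.add_event_listener(Box::new(move |w: &Widget, ev: &str| {
        let _keep = &held; // the closure owns `held`, but holds no handle to the widget
        format!("{} handled {}", w.name, ev)
    }));
    println!("{:?}", widget.dispatch("click"));
    drop(widget);
    println!("references left: {}", Rc::strong_count(&token));
}
```

Because the only edge runs from widget to callback, dropping the widget frees the callback and everything it captured, with no cycle to collect.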

FrankReh commented 1 month ago

@lukewagner This was such an interesting discussion, both about the OP's goal of getting Component Wasm into neovim and about making a parent Component with a child Component more tractable. Are there any updates now that Preview 2 is out and work is under way for a better async story?

Small/trivial use case for context: I just found this thread because I wondered about creating a wasm module to handle a new filetype that would let a raw JSON file be displayed and edited as something richer, maybe mind-map-like, with nodes and edges (but within the capabilities of the neovim UI).

Being able to write the wasm component once and then see it work in neovim, and perhaps a browser with appropriate additional JS glue, would be very nice.

lukewagner commented 1 month ago

Glad to hear it! Indeed, work is well underway to flesh out async for Preview 3. I also continue to really like this idea of adding scoped callbacks of the form sketched above (allowing traditional-style callbacks while avoiding the cyclic leaks). Unfortunately, given the already-large scope of Preview 3, it seems that scoped callbacks would need to be part of the next batch of functionality (perhaps "1.0-rc"), which I believe would include scoped resource handles and runtime instantiation (all of these features being subtly inter-related and thus a coherent unit of design+implementation).