WebAssembly / component-model

Repository for design and specification of the Component Model
Other
933 stars 79 forks source link

switch to a single handle type (for now) #203

Closed lukewagner closed 1 year ago

lukewagner commented 1 year ago

The current explainer has two handle types (own and borrow). Unfortunately, to describe realistic interfaces (such as WASI HTTP), additional handle types are needed (for own to actually mean (transitive) exclusive ownership). I was working on this in the child-handles branch before realizing the need for more varieties and thus a potentially better, more-general, more-orthogonal design (currently sketched in the generalize-handles) branch. In the meantime, we need to make progress on Preview 2, so for now (and perhaps until after Preview 2), I think it'd be a good idea to have a single handle type that represents non-exclusive, reference-counted ownership, since this is able to express any interface honestly (albeit sometimes imprecisely and suboptimally). In the future, when we add back more-precise handle types (capturing ownership, exclusivity and scope in the call contract), this non-exclusive, reference-counted handle would continue to exist as one of the options, so Wit and components written using it would continue to be valid.

There is the question of what to call this reference-counted handle type (in Wat and Wit). shared or ref would make sense, but these names are already in use by core wasm and so it's probably a good idea to avoid creating confusion. The best I could think of (that isn't super long) is rc (standing for "reference counted"), so that's what in the PR, but happy to hear better options. E.g., in Wit, this would look like rc<file> (where file is a resource type). This reserves use of the resource-type's name (without qualifier) for the unique/linear handle case, which seems like the right default option to suggest to interface authors.

A new issue that needs to be addressed for rc (and non-exclusive handles in general) is how to handle identity (e.g., when two rc handles to the same resource are given to a component). Guest programs may need to ask whether two handles point to the same resource and/or use a resource as a key in a hash-table. This can be solved in an ad hoc way on a per-interface basis (e.g., adding explicit eq and hash-code/id methods to resources), but (1) this is opaque to bindings generation, (2) hash-code/id raise various virtualization hazards, (3) it requires the interface author to anticipate usage patterns. Instead, this PR proposes that the Canonical ABI lowering automatically de-duplicates multiple rc handles pointing to the same resource, ultimately giving them the same i32 handle index, thus allowing the i32 index to serve as the (component-local) identity of the resource for use in bindings generators.

alexcrichton commented 1 year ago

Thanks for writing this up! Reading over this now though one idea that has stuck out to me is: what if instead of removing own/borrow and adding rc instead we "simply" removed borrow? This would slim down the set of handle types to exclusively (own $T). Some ways this might work are:

Would something like that end up working out? I'm not familiar with the underlying motivations for wasi-http and others to move away from own/borrow to rc, so this may also be an idea that's off-the-mark.

lukewagner commented 1 year ago

Starting with only own is an attractive alternative to consider since own has many nice performance and composability properties, so definitely happy to discuss that design space more.

The first issue is that, with only own, but not a borrow, all resource methods have to return an extra own result. I suppose, using the [method] annotation, bindings generators could hide this from developers. Also, technically, the return value could be an entirely different handle, and so method calls end up being in-place updates of the receiver. But maybe this is a fine Preview 2 limitation?

The second issue is how to handle cases like incoming-request-headers/incoming-response-headers where you're returning a child headers resource. Considering the options:

  1. Transfer ownership of the headers from the request to the caller (leaving the request headers field empty). With this option, you have to transfer ownership back to the request before sending the request out and I don't think bindings generation can do this automatically, so it would be rather verbose for client code.
  2. Alternatively, we could return a clone (which brings up copy-on-write O(1)-vs-O(n) implementation questions), but this still requires storing the headers back into the request after modification when you want to logically update the headers of a request.
  3. We could say that (own $Headers) really just owns a reference count of a shared internal headers resource (such that while two calls to incoming-request-headers will return two distinct owned handles, they indirectly point to the same headers, so that mutations on one show up in the other). This seems to have roughly the same properties as an rc handle (e.g., it would need the same caveats noted here), but more covertly and without the affordances for de-duping/identity.

Thinking about this more, one way to mitigate the verbosity of option 1 is to add [getter]/[setter] annotations (as alternatives to [method]), where the expectation is that a bindings generator would call the getter once the first time the headers "field" was accessed and then cache the resulting handle index on the wrapper object for the parent incoming-request. Then, the bindings code would reuse the same handle-index for future gets of the same field without making another host call. Of course the bindings generator would need to make sure that the cached headers was stored back into the parent incoming-request before it was transferred out, but I think the generator could reliably do this when generating the bindings for imports that use an incoming-request in the params or exports that use an incoming-request in the results. This does put a good bit more complexity on the bindings generator, which is why I didn't spend much time thinking about this option before, but considering it again, it does have the advantages of having a lot less magic in the host and likely better overall performance than with rc while still providing the basic expected deduping/identity of field access (via caching). This still leaves open harder identity questions (like what to do about DOM-like APIs where there are multiple non-field methods (like document.getElementById()) that can return handles to the same resource and where client code cares about identity, but perhaps those are just post-Preview-2 concerns.

WDYT?

alexcrichton commented 1 year ago

Ah that's a good point about methods, and the "natural" solution to that is to bring back borrow the way it is right now. Short of doing that though it's still perhaps possible to cover this up (sorta) with clone methods where with borrow you could do foo.bar() but with clone you'd have to do foo.clone().bar(). Methods though make me think that we shouldn't drop borrow...

For use cases which motivate rc I'm definitely thinking of your (3) above where the own is actually a reference-counted thing under the hood. I don't think it's worth trying to do copy-on-write and definitely not worth transferring ownership back and forth.

I suppose another way to phrase this is that I'm thinking about this from a Rust-like perspective. Rust only has ownership at the language level which is akin sort of to the component model only having ownership. At the runtime level of Rust, however, there is a Rc<T> pointer type which is akin to components implementation-wise having a reference count. That feels like it all lines up to me in that use cases which motivate reference counting can do something like this, and then perhaps even more nicely so in the future with rc<T> in the component model.

lukewagner commented 1 year ago

I got a bit excited about option (1) above, but thinking more about it, I remembered some other problems with the "caching getter" approach which had stopped me earlier (in particular, caching assumes the underlying value doesn't change), so that whole option probably isn't the right one.

Option (3) does increasingly seem like a good idea. Initially, I had assumed that the deduping behavior of rc was necessary to present a DOM-like wrapper object identity, which would need something like rc, but thinking about it more, it seems likely we could get away with not having that in the Preview 2 timeframe and there's even a new argument (talking through the JS bindings with @guybedford recently) for why perhaps fresh-wrapper-on-lowering is simply the right choice permanently (in particular, to avoid possible leaks in subtle cases involving expando properties and/or WeakMaps), and not just for JS, but for any language with wrapper objects with state/identity.

Incidentally, I think the post-Preview-2 solution for wasi-http is not an rc handle for headers but, rather, a handle scoped to the parent request/response that makes the corner case documented in #43 impossible (changing the "MUST NOT" to a trap).

So then, iiuc, if we go with Option (3) and keep borrow as you suggested, that means just keeping the proposal as is (closing the PR); is that right?

alexcrichton commented 1 year ago

Ok so, in summary, I think this boils down to "is DOM-like wrapper object identity important". To summarize both rc and own allow for shared resources that you can have multiple handles do: rc is naturally allowing that and with own it's possible by specifying a clone or similar method where the underlying data is reference counted. Thus the main tradeoff between these two approaches is that with rc guests are getting exactly the same integer handle allowing for wrapper object identity, but for own that would not be possible since new integer handles would be assigned.

I talked with @tschneidereit this morning and it sounds like y'all previously concluded that wrapper object identity is important. It sounds at least @guybedford may be leaning the other way now. To be clear though I don't personally mean to provide an opinion on this specific point since I don't fully understand it. I understand how it works and what it means for JS in a literal sense, but I don't have enough of a grasp on the downstream implications for developers and what pitfalls might be encountered and such to be able to weigh the importance of this point.

That all being said, there's two more things I've been thinking recently about this:

This sort of points me to rc instead of own and borrow, but I think there's still the question of whether the wrapper-object identity is necessary. If others agree though that wrapper-object identity is orthogonal to rc itself (e.g. can be added on), then one option is to go back to another option you brought up awhile back which is to consider wrapper-object identity as a canonical option (or something like that). That way we could leave the door open to implementing this in the future but perhaps take a more conservative approach in the preview2 time frame.

I realize my own thinking is sort of going back and forth on this, so sorry about that! I'm also happy to talk over video over this for more high-bandwidth discussion.

lukewagner commented 1 year ago

Heh, yeah, totally understand the flip-flopping on this; it's a nuanced topic. Also happy to jump into a higher-bandwidth chat any time.

I think Guy and I are probably thinking of the same concern. To explain it a bit more, the issue is: if I create an object wrapper of an rc, then add an "expando" property onto the wrapper or use the object wrapper as the key of the weak map, a very conservative binding would need to root the object wrapper (and thus keep the handle alive) because, even if it's not reachable from within the component, maybe in the future a same-identity rc would be returned from an import (or passed as the param of an export) and in this case a reasonable expectation is to see the same expando property or weak-map value (noting that, if this happens soon enough before a GC, you will see the expando, so collecting the wrapper would exhibit GC-dependent behavior, which is generally a Bad Thing). In browsers, this isn't an issue b/c the integrated JS+DOM GC can tell when the wrapped DOM object is also unreachable from the DOM side (and thus not possible to see again), but we intentionally don't have cross-component GC. Also, there's nothing JS-specific about this; I think any OOP-y binding would have the same issue. So after mulling this over for a bit, it seems like this is a pretty strong argument for not implicitly giving resources identity (at least, not without pretty strong direct feedback that it's needed).

If we don't end up ever adding identity, it's possible we'll never want rc and only allow duplication via explicit method in the interface (like you suggested above) which has the benefit of allowing an interface to opt-in to sharing and to define what sharing means in a domain-specific manner. In contrast, I think there's really strong reasons to have own at some point in time (for the perf and composability benefits). Thus, if we wanted to add one thing now that we're not going to take away in the future, I think that would be own.

But also, it's a good point that own does have extra bindings complexity when nested inside value types of import parameters. This may not be used in any Preview 2 WASI imported interfaces, though, so perhaps it could be left NYI in a first draft.

alexcrichton commented 1 year ago

Ok had a good talk with Luke this morning and our conclusion was that we'll probably want to stick with own and borrow for now. Given the recent thinking about how dedpuing isn't necessary and/or desired at this time that reduces the motivation for rc quite significantly. My point about complications in bindings generators with own seems surmountable and from a runtime/understanding perspective sticking with own and borrow provides a lot more guarantees around APIs than rc does (which is sort of an escape hatch). In that sense it seemed best to stay the course with own and borrow, but continuing to evaluate over time to see if tweaks are necessary or if it's not worth it and time to switch to rc (where one possibility is to "just" change the runtime semantics of own to be the rc, so reinterpreting own as rc or something like that)

lukewagner commented 1 year ago

Sounds good to me! Thanks for all the discussion and consideration on this, it definitely helped refine my understanding of the short- and long-term goal state here. I'll close now, but if anyone wants to continue discussing this approach, feel free to comment and reopen.