Closed lukewagner closed 1 year ago
Thanks for writing this up! Reading over this now though one idea that has stuck out to me is: what if instead of removing own/borrow and adding rc instead we "simply" removed borrow? This would slim down the set of handle types to exclusively (own $T)
. Some ways this might work are:
own
is easier to manage/implement than rc
(I was a bit surprised that rc
actually implies two levels of reference counting, and the de-duping behavior isn't necessarily trivial here).clone
method which returns another owned resource. Similar to dup
on file descriptors on Linux, but per-type instead of for all resources.(rc $T)
type (as specified here, just postponing it for later)Would something like that end up working out? I'm not familiar with the underlying motivations for wasi-http
and others to move away from own
/borrow
to rc
, so this may also be an idea that's off-the-mark.
Starting with only own
is an attractive alternative to consider since own
has many nice performance and composability properties, so definitely happy to discuss that design space more.
The first issue is that, with only own
, but not a borrow
, all resource methods have to return an extra own
result. I suppose, using the [method]
annotation, bindings generators could hide this from developers. Also, technically, the return value could be an entirely different handle, and so method calls end up being in-place updates of the receiver. But maybe this is a fine Preview 2 limitation?
The second issue is how to handle cases like incoming-request-headers
/incoming-response-headers
where you're returning a child headers
resource. Considering the options:
(own $Headers)
really just owns a reference count of a shared internal headers resource (such that while two calls to incoming-request-headers
will return two distinct owned handles, they indirectly point to the same headers, so that mutations on one show up in the other). This seems to have roughly the same properties as an rc
handle (e.g., it would need the same caveats noted here), but more covertly and without the affordances for de-duping/identity.Thinking about this more, one way to mitigate the verbosity of option 1 is to add [getter]
/[setter]
annotations (as alternatives to [method]
), where the expectation is that a bindings generator would call the getter once the first time the headers
"field" was accessed and then cache the resulting handle index on the wrapper object for the parent incoming-request
. Then, the bindings code would reuse the same handle-index for future gets of the same field without making another host call. Of course the bindings generator would need to make sure that the cached headers
was stored back into the parent incoming-request
before it was transferred out, but I think the generator could reliably do this when generating the bindings for imports that use an incoming-request
in the params or exports that use an incoming-request
in the results. This does put a good bit more complexity on the bindings generator, which is why I didn't spend much time thinking about this option before, but considering it again, it does have the advantages of having a lot less magic in the host and likely better overall performance than with rc
while still providing the basic expected deduping/identity of field access (via caching). This still leaves open harder identity questions (like what to do about DOM-like APIs where there are multiple non-field methods (like document.getElementById()
) that can return handles to the same resource and where client code cares about identity, but perhaps those are just post-Preview-2 concerns.
WDYT?
Ah that's a good point about methods, and the "natural" solution to that is to bring back borrow
the way it is right now. Short of doing that though it's still perhaps possible to cover this up (sorta) with clone
methods where with borrow
you could do foo.bar()
but with clone
you'd have to do foo.clone().bar()
. Methods though make me think that we shouldn't drop borrow
...
For use cases which motivate rc
I'm definitely thinking of your (3) above where the own
is actually a reference-counted thing under the hood. I don't think it's worth trying to do copy-on-write and definitely not worth transferring ownership back and forth.
I suppose another way to phrase this is that I'm thinking about this from a Rust-like perspective. Rust only has ownership at the language level which is akin sort of to the component model only having ownership. At the runtime level of Rust, however, there is a Rc<T>
pointer type which is akin to components implementation-wise having a reference count. That feels like it all lines up to me in that use cases which motivate reference counting can do something like this, and then perhaps even more nicely so in the future with rc<T>
in the component model.
I got a bit excited about option (1) above, but thinking more about it, I remembered some other problems with the "caching getter" approach which had stopped me earlier (in particular, caching assumes the underlying value doesn't change), so that whole option probably isn't the right one.
Option (3) does increasingly seem like a good idea. Initially, I had assumed that the deduping behavior of rc
was necessary to present a DOM-like wrapper object identity, which would need something like rc
, but thinking about it more, it seems likely we could get away with not having that in the Preview 2 timeframe and there's even a new argument (talking through the JS bindings with @guybedford recently) for why perhaps fresh-wrapper-on-lowering is simply the right choice permanently (in particular, to avoid possible leaks in subtle cases involving expando properties and/or WeakMap
s), and not just for JS, but for any language with wrapper objects with state/identity.
Incidentally, I think the post-Preview-2 solution for wasi-http is not an rc
handle for headers
but, rather, a handle scoped to the parent request/response that makes the corner case documented in #43 impossible (changing the "MUST NOT" to a trap).
So then, iiuc, if we go with Option (3) and keep borrow
as you suggested, that means just keeping the proposal as is (closing the PR); is that right?
Ok so, in summary, I think this boils down to "is DOM-like wrapper object identity important". To summarize both rc
and own
allow for shared resources that you can have multiple handles do: rc
is naturally allowing that and with own
it's possible by specifying a clone
or similar method where the underlying data is reference counted. Thus the main tradeoff between these two approaches is that with rc
guests are getting exactly the same integer handle allowing for wrapper object identity, but for own
that would not be possible since new integer handles would be assigned.
I talked with @tschneidereit this morning and it sounds like y'all previously concluded that wrapper object identity is important. It sounds at least @guybedford may be leaning the other way now. To be clear though I don't personally mean to provide an opinion on this specific point since I don't fully understand it. I understand how it works and what it means for JS in a literal sense, but I don't have enough of a grasp on the downstream implications for developers and what pitfalls might be encountered and such to be able to weigh the importance of this point.
That all being said, there's two more things I've been thinking recently about this:
A point against having own
as a type is that it's adding a style of ownership transfer which is not present otherwise in the component model currently, primarily that the arguments to an imported function into a guest are consuming ownership. For example today if you pass a string to an imported function the guest still holds onto the memory and has access to it after the function call finishes. For own
, however, you'd no longer have access in the guest. This poses interesting questions for bindings generators because for example if you had something like:
resource foo
record bar {
a: own<foo>,
b: string,
}
baz: func(x: bar)
then what do the bindings for baz
look like? The bar
record needs to simultaneously reflect that it's consuming ownership of a resource while also not consuming ownership of a string. In Rust for example that would looke like:
struct Bar<'a> {
a: Foo,
b: &'a str,
}
but this is all highly specific to just baz
itself. If baz
were an exported function then its b
field would instead be String
since it's receiving ownership of a string (an allocated chunk of memory).
Basically what I'm getting at is that bindings generation for own
as an argument to an imported function is not going to be trivial and doesn't easily fit into the bindings generation story because you're losing access to something which is currently never the case in arguments to imports. Not to say this is insurmountable of course, but is a point in favor of rc<T>
where you never lose access to your own handle until you destroy it.
The question of wrapper-object identity I think is actually somewhat orthogonal to own
-vs-rc
. For example rc
could be implemented in a more "file descriptor" like fashion where there's only one level of reference counting on the underlying resource. Basically imagine what I was talking about own
above except every single resource implicitly had a clone
method due to the nature of rc
.
This sort of points me to rc
instead of own
and borrow
, but I think there's still the question of whether the wrapper-object identity is necessary. If others agree though that wrapper-object identity is orthogonal to rc
itself (e.g. can be added on), then one option is to go back to another option you brought up awhile back which is to consider wrapper-object identity as a canonical option (or something like that). That way we could leave the door open to implementing this in the future but perhaps take a more conservative approach in the preview2 time frame.
I realize my own thinking is sort of going back and forth on this, so sorry about that! I'm also happy to talk over video over this for more high-bandwidth discussion.
Heh, yeah, totally understand the flip-flopping on this; it's a nuanced topic. Also happy to jump into a higher-bandwidth chat any time.
I think Guy and I are probably thinking of the same concern. To explain it a bit more, the issue is: if I create an object wrapper of an rc
, then add an "expando" property onto the wrapper or use the object wrapper as the key of the weak map, a very conservative binding would need to root the object wrapper (and thus keep the handle alive) because, even if it's not reachable from within the component, maybe in the future a same-identity rc
would be returned from an import (or passed as the param of an export) and in this case a reasonable expectation is to see the same expando property or weak-map value (noting that, if this happens soon enough before a GC, you will see the expando, so collecting the wrapper would exhibit GC-dependent behavior, which is generally a Bad Thing). In browsers, this isn't an issue b/c the integrated JS+DOM GC can tell when the wrapped DOM object is also unreachable from the DOM side (and thus not possible to see again), but we intentionally don't have cross-component GC. Also, there's nothing JS-specific about this; I think any OOP-y binding would have the same issue. So after mulling this over for a bit, it seems like this is a pretty strong argument for not implicitly giving resources identity (at least, not without pretty strong direct feedback that it's needed).
If we don't end up ever adding identity, it's possible we'll never want rc
and only allow duplication via explicit method in the interface (like you suggested above) which has the benefit of allowing an interface to opt-in to sharing and to define what sharing means in a domain-specific manner. In contrast, I think there's really strong reasons to have own
at some point in time (for the perf and composability benefits). Thus, if we wanted to add one thing now that we're not going to take away in the future, I think that would be own
.
But also, it's a good point that own
does have extra bindings complexity when nested inside value types of import parameters. This may not be used in any Preview 2 WASI imported interfaces, though, so perhaps it could be left NYI in a first draft.
Ok had a good talk with Luke this morning and our conclusion was that we'll probably want to stick with own
and borrow
for now. Given the recent thinking about how dedpuing isn't necessary and/or desired at this time that reduces the motivation for rc
quite significantly. My point about complications in bindings generators with own
seems surmountable and from a runtime/understanding perspective sticking with own
and borrow
provides a lot more guarantees around APIs than rc
does (which is sort of an escape hatch). In that sense it seemed best to stay the course with own
and borrow
, but continuing to evaluate over time to see if tweaks are necessary or if it's not worth it and time to switch to rc
(where one possibility is to "just" change the runtime semantics of own
to be the rc
, so reinterpreting own
as rc
or something like that)
Sounds good to me! Thanks for all the discussion and consideration on this, it definitely helped refine my understanding of the short- and long-term goal state here. I'll close now, but if anyone wants to continue discussing this approach, feel free to comment and reopen.
The current explainer has two handle types (
own
andborrow
). Unfortunately, to describe realistic interfaces (such as WASI HTTP), additional handle types are needed (forown
to actually mean (transitive) exclusive ownership). I was working on this in the child-handles branch before realizing the need for more varieties and thus a potentially better, more-general, more-orthogonal design (currently sketched in the generalize-handles) branch. In the meantime, we need to make progress on Preview 2, so for now (and perhaps until after Preview 2), I think it'd be a good idea to have a single handle type that represents non-exclusive, reference-counted ownership, since this is able to express any interface honestly (albeit sometimes imprecisely and suboptimally). In the future, when we add back more-precise handle types (capturing ownership, exclusivity and scope in the call contract), this non-exclusive, reference-counted handle would continue to exist as one of the options, so Wit and components written using it would continue to be valid.There is the question of what to call this reference-counted handle type (in Wat and Wit).
shared
orref
would make sense, but these names are already in use by core wasm and so it's probably a good idea to avoid creating confusion. The best I could think of (that isn't super long) isrc
(standing for "reference counted"), so that's what in the PR, but happy to hear better options. E.g., in Wit, this would look likerc<file>
(wherefile
is a resource type). This reserves use of the resource-type's name (without qualifier) for the unique/linear handle case, which seems like the right default option to suggest to interface authors.A new issue that needs to be addressed for
rc
(and non-exclusive handles in general) is how to handle identity (e.g., when tworc
handles to the same resource are given to a component). Guest programs may need to ask whether two handles point to the same resource and/or use a resource as a key in a hash-table. This can be solved in an ad hoc way on a per-interface basis (e.g., adding expliciteq
andhash-code
/id
methods to resources), but (1) this is opaque to bindings generation, (2)hash-code
/id
raise various virtualization hazards, (3) it requires the interface author to anticipate usage patterns. Instead, this PR proposes that the Canonical ABI lowering automatically de-duplicates multiplerc
handles pointing to the same resource, ultimately giving them the samei32
handle index, thus allowing thei32
index to serve as the (component-local) identity of the resource for use in bindings generators.