WebAssembly / component-model

Repository for design and specification of the Component Model

Async vs. borrows #171

Open badeend opened 1 year ago

badeend commented 1 year ago

I'd like your thoughts on an issue I ran into while designing wasi-sockets:

Some operations require exclusive access to a resource while the operation is in progress. The component-model already prohibits multithreaded access by default, so there is no problem so long as the function completes synchronously.

However, if the function needs to perform I/O during its operation, the component model's protections are not enough: while parallel access is disallowed, concurrency in general isn't.

In the wasi-sockets proposal specifically, I run into this issue with mutating & async methods like connect. Simplified example:

set-ttl: func(this: borrow<tcp-socket>, ttl: u32) -> my-result
connect: func(this: borrow<tcp-socket>) -> future<my-result>

Here, connect is async and both set-ttl and connect change the socket state. Calling set-ttl while connect is in progress should not be possible.

At the moment, I've introduced a distinct error code for this scenario and plastered it all over the APIs, as can be seen here: https://github.com/badeend/wasi-sockets/blob/error-codes/wit/tcp.wit (search for concurrency-conflict). This works fine as it is; I was just wondering whether the component model could be of any help here.
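Editorial sketch (not part of the original comment): the error-code workaround described above can be modeled roughly as follows. The `TcpSocket` class, the `_busy` flag, and the `ErrorCode` enum are hypothetical stand-ins rather than the wasi-sockets API, and `asyncio.sleep(0)` merely takes the place of real network I/O.

```python
import asyncio
from enum import Enum

class ErrorCode(Enum):
    OK = "ok"
    CONCURRENCY_CONFLICT = "concurrency-conflict"

class TcpSocket:
    """Hypothetical stand-in for a tcp-socket resource."""
    def __init__(self):
        self.ttl = 64
        self._busy = False  # True while an async operation is in flight

    def set_ttl(self, ttl: int) -> ErrorCode:
        if self._busy:
            return ErrorCode.CONCURRENCY_CONFLICT
        self.ttl = ttl
        return ErrorCode.OK

    async def connect(self) -> ErrorCode:
        if self._busy:
            return ErrorCode.CONCURRENCY_CONFLICT
        self._busy = True
        try:
            await asyncio.sleep(0)  # placeholder for real network I/O
            return ErrorCode.OK
        finally:
            self._busy = False

async def demo():
    sock = TcpSocket()
    pending = asyncio.create_task(sock.connect())
    await asyncio.sleep(0)       # let connect start and suspend
    during = sock.set_ttl(32)    # connect is still in flight
    result = await pending
    after = sock.set_ttl(32)     # connect has finished
    return during, result, after

results = asyncio.run(demo())
```

Every mutating method has to repeat the `_busy` check by hand, which is the boilerplate the question is trying to avoid.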

IIUC, the component model can already solve this issue (albeit in a very hacky way) by changing the connect signature to swallow a unique reference and spit it back out at the end of the async operation:

connect: func(this: own<tcp-socket>) -> future<tuple<own<tcp-socket>, my-result>>

It's just not very ergonomic. And the "same" resource would probably get a new index after the future resolves.
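Editorial sketch (not part of the original comment): why the "same" resource may come back under a different index. The `HandleTable` below is a toy with a free list; whether a real implementation reuses slots this way is an assumption made for illustration.

```python
class HandleTable:
    """Toy per-component handle table with a free list (illustrative only)."""
    def __init__(self):
        self.slots = []
        self.free = []

    def add(self, resource) -> int:
        if self.free:
            index = self.free.pop()
            self.slots[index] = resource
        else:
            index = len(self.slots)
            self.slots.append(resource)
        return index

    def remove(self, index: int):
        resource = self.slots[index]
        if resource is None:
            raise RuntimeError("trap: invalid handle index")
        self.slots[index] = None
        self.free.append(index)
        return resource

table = HandleTable()
sock_index = table.add("tcp-socket")

# Passing an owning handle removes it from the caller's table for the whole call.
in_flight = table.remove(sock_index)

# Anything the caller allocates in the meantime may take over the freed slot.
other_index = table.add("some other resource")

# When the future resolves, the socket is handed back and lands in a different slot.
new_sock_index = table.add(in_flight)
```

Here `sock_index` and `new_sock_index` differ, so any index the caller stored before the call is stale afterwards.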

lukewagner commented 1 year ago

I'm so glad you filed this issue; indeed I've been thinking about a possible new handle type that is motivated by exactly the situation you're describing here, but I wasn't quite sure whether it would be valuable in practice.

So with borrow handles, the caller keeps their original handle fully functional during the call, while with own handles, the caller permanently loses their handle when starting the call. The idea would be to add a new handle type, I'll call it exclusive-borrow (more-succinct names welcome), that is sort of a mix of the two: like own, the caller loses the handle when the call starts, but the caller regains access to the handle when the async call ends. So it's rather like &mut in Rust (where the property of interest here is uniqueness, not mutability, which wasm has no good way of expressing).
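Editorial sketch (not part of the original comment): one way to picture the proposed exclusive-borrow from the caller's side. The `lent` set and the `begin_exclusive_borrow` / `end_exclusive_borrow` names are hypothetical; nothing like them is specified in the thread.

```python
class Trap(Exception):
    """Stands in for a wasm trap."""

class HandleTable:
    """Toy caller-side handle table; `lent` is a hypothetical extension."""
    def __init__(self):
        self.resources = {}
        self.lent = set()      # indices currently handed out as exclusive-borrow
        self.next_index = 0

    def add(self, resource) -> int:
        index = self.next_index
        self.next_index += 1
        self.resources[index] = resource
        return index

    def use(self, index: int):
        if index not in self.resources:
            raise Trap("invalid handle index")
        if index in self.lent:
            raise Trap("handle is exclusively borrowed")
        return self.resources[index]

    def begin_exclusive_borrow(self, index: int):
        resource = self.use(index)
        self.lent.add(index)
        return resource

    def end_exclusive_borrow(self, index: int):
        self.lent.discard(index)

table = HandleTable()
sock = table.add("tcp-socket")

table.begin_exclusive_borrow(sock)   # the async call starts
try:
    table.use(sock)
    usable_during_call = True
except Trap:
    usable_during_call = False

table.end_exclusive_borrow(sock)     # the async call ends
usable_after_call = table.use(sock) == "tcp-socket"
```

Unlike passing an owning handle in and out, the index stays the same; it is only unusable while the call is in flight.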

Note that defining an exclusive-borrow type requires a clear definition of what the lifetime of an "async call" is, which in turn depends on the "structured concurrency" property that we're going after with the async proposal. Thus, the time to consider adding exclusive-borrow would be in the Preview3 timeframe in conjunction with future/stream.

badeend commented 1 year ago

> The time to consider adding exclusive-borrow would be in the Preview3 timeframe in conjunction with future/stream.

Makes sense!👍

> The idea would be to add a new handle type, I'll call it exclusive-borrow (more-succinct names welcome), that is sort of a mix of the two

Question: Don't borrows always promise exclusive access? In my mind, the differentiating feature of this new borrow type is its duration, not its exclusiveness. Maybe async-borrow?

> defining an exclusive-borrow type requires a clear definition of what the lifetime of an "async call" is

If this state machine diagram is still up to date, I would say that, from the caller's point of view, the async call is "done" when the task transitions from returning to finishing. Any clean-up/finalization code that the task performs in the background after returning its final value is not protected by its initial borrow.

lukewagner commented 1 year ago

> Question: Don't borrows always promise exclusive access?

No, as currently proposed, a borrow handle just copies the original handle, leaving it fully intact and usable while the borrow exists.

badeend commented 1 year ago

I might be missing something fundamental here, but:

You mention the original handle is left intact and usable over the duration of the call. I can see how the handle would remain "intact" in the caller's resource table, but how can it still be "usable"? Wouldn't any attempt to actually use the handle run into the parallelism/reentrancy limitations?

lukewagner commented 1 year ago

That's a good question. Imagine a resource with an async method that performs a streaming read (e.g., of a file). After calling this async method, I should be able to start other async read operations, such as other streaming reads, while the first read is in progress. In Rust, such an async method would take a non-mutable borrow, allowing the caller to continue to make other concurrent read-only calls. Unfortunately, we don't have any good way to reflect mutability in a component-level signature (practically every call is going to mutate linear memory, even if it's logically read-only), so we can't state or enforce immutability, but the reason for having handles stay usable during an async call is basically the same as for non-mutable borrows. Moreover, some resources (such as files) may even need to support concurrent async reads and writes to the same resource (and say what happens when there are concurrent overlapping reads/writes).
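Editorial sketch (not part of the original comment): the streaming-read scenario, with two reads in flight on the same resource at once. `FileResource` and its counters are hypothetical, and `asyncio.sleep(0)` stands in for real I/O.

```python
import asyncio

class FileResource:
    """Hypothetical file-like resource that allows overlapping reads."""
    def __init__(self, data: bytes):
        self.data = data
        self.active_reads = 0
        self.max_concurrent_reads = 0

    async def read(self, offset: int, length: int) -> bytes:
        self.active_reads += 1
        self.max_concurrent_reads = max(self.max_concurrent_reads, self.active_reads)
        try:
            await asyncio.sleep(0)  # placeholder for real I/O
            return self.data[offset:offset + length]
        finally:
            self.active_reads -= 1

async def demo():
    shared_file = FileResource(b"hello world")
    # Both reads use the same resource while the other is still pending.
    chunks = await asyncio.gather(shared_file.read(0, 5), shared_file.read(6, 5))
    return shared_file, chunks

shared_file, chunks = asyncio.run(demo())
```

If the handle became unusable for the duration of the first read, the second `read` could not even be started, which is the argument for keeping plain borrow handles non-exclusive.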

badeend commented 1 year ago

> Unfortunately, we don't have any good way to reflect mutability in a component-level signature

I'm fine with focusing on "exclusivity" instead.

How I understand it now:

  1. There are two orthogonal aspects:
    • Duration:
      • "Sync" borrows must be dropped before the synchronous return, even if the returned future/stream hasn't completed yet.
      • "Async" borrows stay alive after the immediate function return, but must be dropped before the returned future/stream completes (or whatever the exact semantics end up being).
    • Exclusivity:
      • "Shared" handles can be used concurrently.
      • "Exclusive" handles cannot be used concurrently. Additionally, exclusive handles cannot be used while there are any "shared" borrows in use.
  2. In an earlier comment I mentioned that borrow is effectively "Exclusive". However, this is only a side effect of the current parallelism/reentrancy limitations and lack of async support. These limitations might be lifted in the future.
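Editorial sketch (not part of the original comment): the exclusivity rules from the outline above, written as a small per-resource state machine. `BorrowState` is hypothetical; the duration axis is not modeled, because it would only decide when `release` gets called (at the synchronous return or when the future/stream completes).

```python
class Trap(Exception):
    """Stands in for a wasm trap."""

class BorrowState:
    """Tracks outstanding borrows of a single resource (illustrative only)."""
    def __init__(self):
        self.shared = 0
        self.exclusive = False

    def acquire(self, exclusive: bool):
        if self.exclusive:
            raise Trap("resource is exclusively borrowed")
        if exclusive:
            if self.shared:
                raise Trap("resource has shared borrows in use")
            self.exclusive = True
        else:
            self.shared += 1

    def release(self, exclusive: bool):
        if exclusive:
            self.exclusive = False
        else:
            self.shared -= 1

def try_acquire(state: BorrowState, exclusive: bool) -> bool:
    try:
        state.acquire(exclusive)
        return True
    except Trap:
        return False

state = BorrowState()
first_shared = try_acquire(state, exclusive=False)
second_shared = try_acquire(state, exclusive=False)
exclusive_while_shared = try_acquire(state, exclusive=True)
state.release(exclusive=False)
state.release(exclusive=False)
exclusive_alone = try_acquire(state, exclusive=True)
shared_while_exclusive = try_acquire(state, exclusive=False)
```

This is essentially a readers-writer discipline: any number of shared borrows, or exactly one exclusive borrow, but never both.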

Currently, borrow is "Sync". When futures and streams land, do you think borrows will be "Async" by default?

lukewagner commented 1 year ago

Thanks for writing that outline; that's a very helpful framing of the question. My understanding is that the Sync-vs-Async aspect is a property of the function (type): functions returning a future/stream anywhere in the result type are Async (and thus hold onto borrows until the async call is complete), whereas functions without future/stream (i.e., all functions today) are Sync (and thus release the borrow upon return). In contrast, the Exclusive-vs-Shared aspect is a property of the individual handle type, where the borrow as currently proposed is non-exclusive (in both Sync and Async functions).
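Editorial sketch (not part of the original comment): the "future/stream anywhere in the result type" rule as a recursive check. The `Type` dataclass is a hypothetical, heavily simplified model of component-level types.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Type:
    """Simplified type: a constructor name plus its type parameters."""
    name: str            # e.g. "u32", "tuple", "future", "stream"
    params: tuple = ()

def contains_async(t: Type) -> bool:
    # A type is async if it is a future/stream or nests one at any depth.
    return t.name in ("future", "stream") or any(contains_async(p) for p in t.params)

def borrow_duration(result: Type) -> str:
    return "async" if contains_async(result) else "sync"

my_result = Type("my-result")
set_ttl_result = my_result                      # set-ttl returns my-result
connect_result = Type("future", (my_result,))   # connect returns future<my-result>
nested_result = Type("tuple", (Type("u32"), Type("stream", (Type("u8"),))))
```

Under this reading, `set-ttl` releases its borrow on return, while `connect` holds it until the returned future completes.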

I expect the source of confusion here is that it initially seems like borrow types are Exclusive in today's Sync-only setting. This is almost true, but there are two exceptions where the non-exclusivity of borrow is observable with the current proposed semantics:

  1. If I have a single handle in my handle table and I pass the same handle index for two separate borrow parameter values, this is currently proposed to succeed. If borrow were exclusive, the first lifting would mark the handle-table entry as invalid, so that the second lifting would trap.
  2. When a parent component instantiates a child component, the parent component can supply its own core functions as the child's imports. If the parent component calls into the child via a child export and passes in a borrow of a handle in the parent's handle table, and the child then calls back into the parent via an import, the parent can observe that the borrowed handle is still usable.
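Editorial sketch (not part of the original comment): the first exception above, where the same handle index is passed for two borrow parameters. `lift_borrows` is a hypothetical helper, not the Canonical ABI's actual lifting code; the `exclusive` flag switches between the currently proposed semantics and the hypothetical exclusive ones.

```python
class Trap(Exception):
    """Stands in for a wasm trap."""

def lift_borrows(table: dict, indices: list, exclusive: bool) -> list:
    """Lift each borrow parameter from its handle index."""
    lent = set()
    lifted = []
    for index in indices:
        if index not in table:
            raise Trap("invalid handle index")
        if exclusive and index in lent:
            # With exclusive semantics the first lift invalidates the entry.
            raise Trap("handle already lent")
        lent.add(index)
        lifted.append(table[index])
    return lifted

table = {0: "tcp-socket"}

# Proposed (shared) semantics: passing index 0 twice succeeds.
shared_result = lift_borrows(table, [0, 0], exclusive=False)

# Hypothetical exclusive semantics: the second lift traps.
try:
    lift_borrows(table, [0, 0], exclusive=True)
    exclusive_trapped = False
except Trap:
    exclusive_trapped = True
```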

While both seem like corner cases where we could probably get away with switching to exclusive semantics, the issue is that we would end up with tons of components that expect exclusive borrow handles even though they probably don't really require exclusivity. That becomes a problem once we start wanting to implement functions taking non-exclusive handles (since now they can't call most functions, and not for a good reason). In Rust terms, it would be as if &mut were the default unless you opted out.

dannypsnl commented 1 year ago

I think this is exactly the limit of linear logic? What I'm wondering is how the component model would work in a concurrent system, such as one described with session types.

lukewagner commented 1 year ago

Session types are cool and a complex topic, but roughly what I was thinking is that you could think of these types we're defining in the component model as elaborating down into session types in a formal core calculus.

dannypsnl commented 1 year ago

I've had some rough ideas in mind recently; I'll quickly elaborate on them:

  1. There are many slices (e.g., a thread or a process) in a concurrent system.
  2. None of them knows what the others will do, so in general there is no static type system in this sense unless we encode communication into it (that's session types).

But resources stand in an interesting place: an implementation of the component model actually has the chance to provide an event system that triggers runtime recycling of resources. For example, it could use a CRDT to ensure that, once all operations have been received, the state is the same, so the resource can clean itself up. Or it could keep a local linear type and fork out a slice to handle recycling.