Supporting caller-allocated, callee-written buffer without views

lukewagner commented 5 years ago

For an API like the C standard library read:

ssize_t read(int fildes, void *buf, size_t nbyte);

where the caller allocates a buffer which the callee (partially) fills in, we would ideally express the signature in wasm without buf being an i32 argument, since (1) this implies linear memory (preventing GC buffers) and (2) it isn't future-compatible with multi-memory.

One solution is to have read take a "slice"/"view" parameter (analogous to a Web IDL Uint8Array). However, views on linear memory somewhat break encapsulation by giving the callee a permanent window into the caller's linear memory, which can lead to use-after-free-style bugs. So I think it's worthwhile to ask whether we can support use cases like read with value/copy semantics, without loss of performance.

Setting aside the issue of how to signal failure (probably via variant/option return type), I think a natural interface-typed signature would be:

(func (export "read") (param $fildes (ref $FD)) (param $nbyte u32) (result (list u8)))

where the API's contract is that the returned list's length is <= $nbyte. The question is how we can have this interface, but achieve the same performance, in particular, allowing the caller to supply a fixed-size buffer.

One idea would be to introduce a variant of the *-to-memory instructions which, instead of calling an exported function to allocate linear memory, would take a (pointer, length) i32 pair as operands from the stack (trapping if the required space is greater than the length). E.g.:

(func (import "" "read_") (param (ref $FD) i32 i32) (result i32))
(@interface func $read (import "libc" "read") (param (ref $FD) i32) (result (list u8)))
(@interface implement (import "" "read_")
    (param $fd (ref $FD)) (param $nbytes i32) (param $ptr i32)
    (result i32)
  arg.get $fd
  arg.get $nbytes
  call-import $read
  arg.get $ptr
  arg.get $nbytes
  list-to-preallocated-memory  ;; [list i32 i32] -> [i32], returning actual list size
)
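As a rough illustration of the intended semantics, here is a minimal C model of list-to-preallocated-memory. The byte_list type and the function name are hypothetical stand-ins (nothing here is part of the proposal), and a wasm trap is modeled with abort():

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an interface-typed (list u8) value. */
typedef struct {
    const uint8_t *data;
    size_t len;
} byte_list;

/* Models [list ptr len] -> [actual size]: copies the list into the
 * caller's buffer and returns how many bytes were written. */
size_t list_to_preallocated_memory(byte_list list, uint8_t *ptr, size_t len) {
    if (list.len > len) {
        abort(); /* required space exceeds the buffer: models a trap */
    }
    if (list.len > 0) {
        memcpy(ptr, list.data, list.len);
    }
    return list.len;
}
```

The caller keeps ownership of the buffer throughout; the callee's result is copied in and only the actual length comes back, which mirrors how read reports the number of bytes it filled.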

However, there will be many *-to-memory instructions, and it feels wrong to have to add a *-to-preallocated-memory variant for each.

An alternative is to generalize the *-to-memory instructions so that, instead of only calling exports, they can call any helper function (as described in #65), and to allow helper functions to be defined inline as unnamed (lambda) functions so that they can reference the enclosing scope. This would allow the above example to be equivalently expressed as:

(func (import "" "read_") (param (ref $FD) i32 i32) (result i32))
(@interface func $read (import "libc" "read") (param (ref $FD) i32) (result (list u8)))
(@interface implement (import "" "read_")
    (param $fd (ref $FD)) (param $nbytes i32) (param $ptr i32)
    (result i32)
  arg.get $fd
  arg.get $nbytes
  call-import $read
  list-to-memory (func (param $needed i32) (result i32)
    ;; maybe assert $needed <= $nbytes
    arg.get $ptr
  )  ;; [list] -> [i32], returning actual list size
)

This generalization would also allow interesting hybrid schemes, e.g., wherein a caller-supplied buffer was used opportunistically, falling back to malloc in too-big cases, which is another common C++ optimization pattern.
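A minimal C sketch of such a hybrid scheme, under stated assumptions: list_to_memory is modeled as a function taking a callback that maps the needed byte length to a destination pointer, and an explicit context pointer stands in for the lambda's enclosing scope (C has no closures). All names below (byte_list, alloc_fn, hybrid_ctx, hybrid_alloc, copy_result) are hypothetical and not part of the proposal:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an interface-typed (list u8) value. */
typedef struct { const uint8_t *data; size_t len; } byte_list;

/* Maps a needed byte length to a destination pointer. */
typedef uint8_t *(*alloc_fn)(size_t needed, void *ctx);

typedef struct { uint8_t *dest; size_t len; } copy_result;

/* Models list-to-memory: calls f exactly once, then copies the bytes. */
copy_result list_to_memory(byte_list list, alloc_fn f, void *ctx) {
    copy_result r;
    r.dest = f(list.len, ctx);
    r.len = list.len;
    if (r.dest == NULL) {
        abort(); /* allocation failed: models a trap */
    }
    if (list.len > 0) {
        memcpy(r.dest, list.data, list.len);
    }
    return r;
}

/* State the "lambda" captures: the caller-supplied buffer. */
typedef struct {
    uint8_t *buf;
    size_t cap;
    int used_heap; /* set to 1 when the malloc fallback ran */
} hybrid_ctx;

/* Uses the caller's buffer when it is big enough, else falls back to malloc. */
uint8_t *hybrid_alloc(size_t needed, void *ctx) {
    hybrid_ctx *h = ctx;
    if (needed <= h->cap) {
        h->used_heap = 0;
        return h->buf;
    }
    h->used_heap = 1;
    return malloc(needed);
}
```

In this model the caller checks used_heap after the call to decide whether the returned pointer must be freed; in the fixed-buffer case no allocation happens at all.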

Given that helper functions are always designed to be inlined at the callsite (which is always statically determinable), these lambda functions should compile into direct stack access, with no worse performance than list-to-preallocated-memory.

Thoughts?

fitzgen commented 5 years ago

> giving the callee a permanent window into the caller's linear memory,

Views and ArrayBuffers are already detachable -- perhaps we can reuse this mechanism to avoid making the views permanent? (They will already be detached on memory growth, which is something we probably also need to think about, but which I haven't done yet).

Regarding list-to-memory and its proposed lambda: I want to make sure I understand this right. Is the lambda mapping a list length to a pointer? To a byte length? Is the generic list-to-memory version perhaps missing the arg.get $ptr instruction that exists in the non-generic list-to-preallocated-memory version?

lukewagner commented 5 years ago

ArrayBuffers and typed array views are, but not necessarily a future sliceref that gets added as a first-class wasm reference type from which bytes can be loaded and stored. The cost of detachability is constantly needing to probe "is the underlying buffer detached?".

Oh whoops, sorry, I wrote the wrong arg, it should've been arg.get $ptr as you said.

fitzgen commented 5 years ago

Ok so just to be 100% clear: in list-to-memory f, f is called just once and maps a byte length to a pointer (possibly pre-allocated, possibly malloced) where the bytes can be copied into?

lukewagner commented 5 years ago

Yup! That's the idea at least. Currently in the explainer, string-to-memory takes an f that maps byte length to a pointer, but the f is restricted to be a wasm exported function name, so you could consider this to be a generalization of that.

lukewagner commented 4 years ago

I realized a problem with the above approach: if what we're trying to do is optimize calls into a native runtime (e.g., a native impl of the read() call mentioned above), it relies on some rather magic compiler/runtime optimizations to effectively hoist the list-to-memory so that it happens during the native call (the point at which the native code needs to write its outgoing bytes into something). This seems like a complicated, fragile and incomplete optimization (b/c any intervening effect can break it).

Thinking once again about the "slices" approach: one option is to say that the slice interface type is never meant to flow out to core wasm as a first-class sliceref value; but, rather, slices are meant to stay encapsulated within adapter code (just like string) which forces them to only be written into during the adapted call, no different than other interface types.

Now there is still the problem that if the other side of a slice-taking call is JS, that JS gets a persistent typed array view, but maybe that's just "ok"; as long as the long-term shared-nothing wasm-to-wasm precedent is established...

fgmccabe commented 4 years ago

There is a use case that does not seem to be addressed by this: using buffers to communicate between modules outside the call to set it up. This seems to be important for graphics buffers where one module is writing to the shared buffer and another reading from it.

lukewagner commented 4 years ago

True, but I think that use case is best addressed by either multi-memory + memory imports or, to support dynamic acquisition of buffers, a first-class core-wasm "slice reference" that can be used as a dynamic operand to the load/store ops (which will likely be added to wasm at some point in time). I think the only thing such a use case asks from interface types is that interface types not require a shared-nothing boundary (thereby disallowing the memory import or slice reference), which is currently the plan.

WebAssembly / interface-types

Supporting caller-allocated, callee-written buffer without views #68