WebAssembly / component-model

Repository for design and specification of the Component Model

Caller provided buffers question #369

Open badeend opened 2 weeks ago

badeend commented 2 weeks ago

Last week, an idea was presented by which callers could prevent unnecessary intermediate copies using the syntax:

read: func(length: u64) -> result<list<u8; ..length>,error-code>;

Is there any provision to make this work when the list is not directly present in the function result signature? For example, when it is nested within a record type (along with other properties)?

And even more challenging: what if the number of output lists is variable? Looking specifically at wasi:sockets.udp.incoming-datagram-stream::receive :innocent:

badeend commented 2 weeks ago

Is it possible to generalize the feature for any dynamically sized data structure? As opposed to only allowing it in places explicitly annotated as such (using list<T; ..size>).

If I understand the ABI correctly, a list is represented as a pointer+length tuple. So:

record IncomingDatagram {
    data: list<u8>,
    remote_address: u32,
}

record ReceiveResult {
    datagrams: list<IncomingDatagram>,
}

receive: func() -> ReceiveResult;

.. is roughly lowered into:

struct IncomingDatagram {
    data_ptr: void*,
    data_len: u32,
    remote_address: u32,
}

struct ReceiveResult {
    datagrams_ptr: void*,
    datagrams_len: u32,
}

void receive(ReceiveResult* out);

At the moment, the memory referenced by these _ptrs is always allocated using cabi_realloc.


My suggestion (if possible) is to allow callers to optionally prefill any of the _ptr & _len fields before making the call. The function implementation may take advantage of this by writing directly into those buffers and/or using the provided buffer size as an optimization hint. Even if the implementation wasn't written to be aware of this, it's no-harm-no-foul: it will simply keep using dynamic allocation.

At all times, the caller must be prepared to find the _ptr field overwritten to point at a dynamic allocation. Only if the _ptr was prefilled and hasn't been changed is the _len guaranteed to be less than or equal to the input value.

Example:

void receive(ReceiveResult* out) {

    let max_recv = (out->datagrams_ptr != NULL) ? out->datagrams_len : 16;

    let native_dgrams = receive_the_native_datagrams(max_recv);
    let native_dgrams_len = native_dgrams.len;

    // Only use dynamic allocation if the caller did not provide a buffer or the provided buffer was too small.
    if (out->datagrams_ptr == NULL || out->datagrams_len < native_dgrams_len) {
        out->datagrams_ptr = cabi_realloc(native_dgrams_len * sizeof(IncomingDatagram));
        // Zero the fresh entries so the nested buffer checks below see NULL.
        memset(out->datagrams_ptr, 0, native_dgrams_len * sizeof(IncomingDatagram));
    }
    out->datagrams_len = native_dgrams_len;
    IncomingDatagram* dgrams = (IncomingDatagram*)out->datagrams_ptr;

    for (let i = 0; i < native_dgrams_len; i++) {
        let native_dgram = &native_dgrams.items[i];
        IncomingDatagram* out_dgram = &dgrams[i];

        let data_len = native_dgram->data.len;

        // Only use dynamic allocation if the caller did not provide a buffer or the provided buffer was too small.
        if (out_dgram->data_ptr == NULL || out_dgram->data_len < data_len) {
            out_dgram->data_ptr = cabi_realloc(data_len);
        }
        out_dgram->data_len = data_len;

        memcpy(out_dgram->data_ptr, native_dgram->data.ptr, data_len);
        out_dgram->remote_address = native_dgram->remote_address;
    }
}

Expected benefits:

lukewagner commented 2 weeks ago

Is there any provision to make this work when the list is not directly present in the function result signature?

This is a good question. If we represent the n in func(n: u32) -> list<u8; ..n> with the parameter index of n (which would be analogous to how we handle abstract resource types), then we'd need the list typedef to be nested in a scope created by a new compound form of the function type constructor (similar to the module, component and instance type constructors). But this would be rather awkward to represent syntactically in WIT. So instead I was thinking of n just being a string-literal "free variable" that is bound once it is used within the context of a function type (with a validation rule checking that there are no remaining free variables in the constructed function type). In that case, you could write:

record r {
  bytes: list<u8; ..n>;
  other: string;
};
f: func(n: u32) -> r;

but if you renamed the parameter n to something else, it would be a validation error (complaining that n was not bound to a parameter name).

And even more challenging: what if the number of output lists is variable?

For a function type such as func(n: u32) -> list<list<u8; ..n>>;, I was thinking we could either (1) fall back to the normal ABI for list<u8>, (2) reject in validation. The fun case is func(m: u32, n: u32) -> list<list<u8; ..m>; ..n> (and in general any nesting where all containing lists have fixed or bounded size), where caller-supplied buffers also make sense (and basically capture the iovec pattern). To incrementalize the implementation effort, though, I was thinking we might start by temporarily rejecting all such nested cases.

Is it possible to generalize the feature for any dynamically sized data structure? [...]

This is a great question and was the initial direction I was exploring too. The challenge is what happens when there is a caller-supplied buffer but it isn't big enough (esp. in partial+nested situations), and how this plays out in the bindings generators: reliably achieving the caller-supplied-buffer optimization while robustly handling the arbitrary-list-size case. By having a distinction in the type that rules out this awkward case, we can reliably generate simpler, more predictable bindings.

badeend commented 2 weeks ago

reliably achieving the caller-supplied-buffer optimization

Your proposal definitely has a leg up on my suggestion in that regard. At the same time, I wonder how much this matters in practice. Any self-respecting wasm engine would support caller-supplied buffers at the host side, so really it is only up to whether the guest supports it or not. If they do, they can be pretty confident the optimization will be used.

The challenge here is with what happens in the case where there is a caller-supplied buffer, but it isn't big-enough (esp. in partial+nested situations)

In the example above, I chose to fall back on dynamic allocation. The existing max-length-esque parameters could continue to exist to make sure this doesn't happen in practice.


I was thinking of n just being a string literal "free variable", [...] if you renamed the parameter n to something else, it would be a validation error

I personally would expect n to be defined in some kind of lexical scope. In this case, most naturally as a parameter of the type:

record r<n: u32> {
  bytes: list<u8; ..n>;
  other: string;
};
f: func(n: u32) -> r<n>;

(Yay, even more syntax... :smiling_face_with_tear:)

By having a distinction in the type that rules out this awkward case, we can more reliably generate more simple, predictable bindings.

One concern I have is that having distinct types could bifurcate/"color" the APIs. One variant that allocates, and another that doesn't. Or: one variant for "input" lists, another for "output" lists.

resource http-fields {
    get: func(name: string) -> list<u8>;
    // vs.
    get: func(name: string, maxlen: u32) -> list<u8; ..maxlen>;
}

The component-model is already on track to eradicate sync/async coloring (IMO, a much, much tougher problem). I hope we can do the same for user- vs. host-allocated buffers, and forgo the additional WIT complications entirely.

lukewagner commented 2 weeks ago

In the example above, I chose to fall back on dynamic allocation. The existing max-length-esque parameters could continue to exist to make sure this doesn't happen in practice.

The challenge is effectively representing this in the guest bindings. E.g., for func(n: u32) -> list<u8; ..n>, JS bindings could easily take a Uint8Array as an outparam, but if we have the "maybe reuse, maybe allocate" semantics, now we would need something much more complicated, or else we'd give up entirely and just return a new array by value. And I don't think this is JS-specific; I think this same tension arises in other languages too.

One concern I have is that having distinct types could bifurcate/"color" the APIs.

That's a good general instinct and goal, but I think in this case, we don't have the anti-composability effects caused by, e.g., function coloring. I think that's because list<u8; ..n> is just a specialization of list<T> (just like list<u8; 4>, as proposed in #304), in that it's just a list with extra guarantees. It's true that, in the short term, we'd probably want to introduce list<u8; ..n>-returning versions of existing list<u8>-returning functions, but that's just to maintain semver compatibility; in a breaking future version, we could keep only the list<u8; ..n> version.

Another key ingredient to the overall story that speaks to some of the non-parameter-bounded-length use cases I expect you're thinking about is the forthcoming streams. If we do it right, the ABI for a function returning a stream would allow you to pass a caller-supplied buffer in the initial call that returns a stream, saying "use as much of this as you can, but if it's not enough (or the data isn't ready), that's fine; tell me how much you were able to write and then return a handle to a stream that lets me consume the rest later." With that, in the cases where you couldn't return a parameter-bounded list because the list was fully dynamically sized, you'd probably want to return a stream instead (with the nice property that streams handle the very-big cases gracefully w/o OOM).

badeend commented 2 weeks ago

Hmm, ok. Fair points.

Different question, then: is there a reason why the list length needs to be encoded in the WIT list type? From what you've described so far, it seems that, from a technical point of view, all we actually need to know from the WIT signature is whether the output list will be user-allocated or not. The actual allocated length can be passed at runtime.

I.e., if we were to define just one type, let's say list<T; user-allocated> (syntax TBD), where user-allocated is a keyword, not a reference to a parameter. user-allocated lists continue being "just" a specialization of normal lists.

This avoids the dependent-type parameter hassle.

lukewagner commented 2 weeks ago

Also a good question! At the Canonical ABI level, we ideally want to guarantee that, if the caller passes a buffer correctly sized for n elements, the callee cannot overflow that buffer by returning more than n elements; instead, this case should reliably trap, blaming the callee. That means we need to know the runtime value of n so that, when the callee's returned list is lifted (from an i32 ptr+length pair), there is a trap if the length is greater than n. This also allows the caller to statically assume that the length of the returned list is in fact at most n.

badeend commented 2 weeks ago

The exact same validation can still be done if the length is passed at runtime, right? The only change is that, in that case, there is effectively an implicit length parameter per provided buffer instead of one global one per function.

lukewagner commented 2 weeks ago

Yeah, that's worth considering too. If we bring it back to what we write in WIT, if the WIT signature is just:

foo: func() -> list<u8; user-allocated>

then it seems a bit magical/surprising that the source-level signature contains an extra length parameter not declared here. Also, that length parameter isn't just "the length of the returned list", it's also just a straight-up parameter that you'd want to name and document just like any other ordinary parameter.

Also, over the longer term, explicitly-referencing parameters provides some extra expressivity, e.g.:

foo1: func(n: u32) -> tuple<list<u8; ..n>,list<u8; ..n>>;
foo2: func(m: u32, n: u32) -> tuple<list<u8; ..m>, list<u8; ..n>>;
foo3: func(m: u32, n: u32) -> list<list<u8; ..m>; ..n>;
badeend commented 1 week ago

I've given it some time, but I'm still not quite happy with what has been proposed so far (including my own proposals).

Overall, I can't help feeling uneasy about the dependent typing stuff:

poll: func(in: list<borrow<pollable>>) -> list<u32; ..(in.length)>;

but obviously this won't work.


I've looked at how existing programming languages handle this.

// C:
ssize_t read(uint8_t* dest_ptr, size_t dest_cap);

// Rust:
fn read(dest: &mut [u8]) -> Result<usize>;

// JavaScript:
function read(dest: Uint8Array): number;

// Java:
int read(byte[] dest);

// C#:
int Read(Span<byte> dest);

// Go:
func Read(dest []byte) (n int, err error)

Some observations:

it seems a bit magical/surprising that the source-level signature contains an extra length parameter not declared here. Also, that length parameter isn't just "the length of the returned list", it's also just a straight-up parameter that you'd want to name and document just like any other ordinary parameter.

Fair point. I'd argue the same applies equally to the buffer's ptr (?). The function implementation needs both in order to write into the destination buffer.


After seeing how the other languages deal with it and what the WIT will eventually desugar into, I came up with another idea. To be clear, this is mostly the same as what has been discussed before, but interpreted from another angle:

What if we allow value types to be borrowed? And first and foremost: allow lists to be borrowed.

Conceptually, a caller-provided buffer is a special kind of temporary resource. The memory backing the buffer is only valid and accessible to the callee during the function invocation, just like a regular resource borrow.

E.g.:

read: func(dest: borrow<list<u8>>) -> result<_, error-code>;

Regular lists are encoded as a (ptr+len) pair. Borrowed mutable lists would then be a (ptr+len+capacity) triple. Before calling, the caller must set the ptr & capacity fields to the proper values and len to 0.

This has the same benefits as before:

as well as: all inputs now being explicitly captured in the parameter list.

Binding generators for high-level languages that don't care about allocation can replace any borrow<list<...>> in the signature with just a u32 that represents the capacity, and heap-allocate the buffers transparently.


Whoops. If you're with me so far, we can step it up a notch by keeping the borrowed outparam semantics _and also_ syntactically flipping the just-introduced `dest: borrow<list<u8>>` parameter back to the other side of the arrow: by realizing that _if_ we can borrow value types, _every_ return type can be seen as just a specialized form of a borrowed outparam (it already is at the ABI level).

```wit
// I.e. this:
local-address: func() -> result;
// is conceptually the same as this:
local-address: func(result: borrow<result>);
```

Moving stuff the other way around should work equally well, so:

```
read: func(dest: borrow<list<u8>>);
```

can be swapped back into the signature we want:

```
read: func() -> borrow<list<u8>>;
```

In low-level languages, the bindings should generate an additional 'output' parameter for these borrows. In dynamic languages that don't care about allocation, the signature can stay as it is.

I realize this is a bit of a mental 360 only to end up _almost_ back to where I started. Hope it makes sense.

Edit: scratch that for now

lukewagner commented 1 week ago

Thanks for all the consideration exploring the design space here. I think you're right that the common idiom for expressing caller-supplied buffers is some sort of outparam that pairs the requested length with the buffer, so it's definitely worth exploring that area of the design space.

This is only partially a nit, but I think borrow and resources aren't quite the right starting point to build on since, with a borrowed handle, I'm passing in a valid resource that the callee receives and can use during the call, whereas in our case here, we're not passing in a list<u8> that the callee can see (before writing into it). So let's say that instead we use a different word altogether, like out (e.g., so I could write read: func(dest: out<list<u8>>) -> result).

The next issue I see is that it's unclear why the combination of out<...> with list<u8> means that I pass in a length and get back a list of exactly that length. E.g., let's say I wrote read: func(dest: out<tuple<u32, u32>>), there's no length there. Thus, the incoming length is due to some special combination of out<...> when applied to list<...>, which suggests that you don't want out<...> as a general type constructor but, rather, something fused with lists, e.g. buffer (so I could write read: func(dest: buffer<u8>) -> result).

Now that starts to look pretty attractive, so let's compare:

read: func(n: u32) -> result<list<u8; ..n>>;

with

read: func(dest: buffer<u8>) -> result;

The stumbling point for me when I was considering this sort of option earlier concerns the various modes of failure. In these baby examples it's obvious what "should" happen, but that depends on special-casing the meaning of the single top-level result return type's error case. But what about:

When I considered various solutions to this, I found that they all ended up very special-case-y compared to putting the bounded-size list in the return type, where the list works just like a normal value and the fact that I'm passing a caller-supplied buffer is merely an ABI optimization to avoid a realloc call.

Also, here's a more hand-wavy reason to prefer the former. If you think of 2 components as 2 separate processes and diagram the actual data flow between them, it looks like this (it's Mermaid time!):

sequenceDiagram
    Caller ->>+ Callee: n
    Callee -->>- Caller: bytes

In particular, in a component (as opposed to a raw .dll), Caller never actually passes the buffer to Callee, only the number n. Thus, the latter function signature paints a sort of inaccurate picture of what's happening from a cross-component perspective. It's even clearer if you imagine Caller and Callee are on different machines (which isn't a thing we should ever do automatically (like CORBA did), but certainly there are some cases where it's appropriate): in this case, the caller-supplied buffer never leaves Caller's machine; it's simply an ABI detail for "where to put the bytes when they come back".

Thus, for both the practical error-handling issues and the high-level "what's happening at the component level" reasons, I still prefer the former.

As for the concerns you mentioned:

lukewagner commented 4 days ago

Thinking some more about the problems I mentioned above with "buffer" and exploring the design space a bit with @sunfishcode, there may actually be some good solutions. What I like about this whole approach is, as you said, it tends to map directly to what you want to see in the source bindings, and it's also somewhat more expressive (particularly in lists-of-buffers cases like writev/readv). More thought is necessary to explore the design, but I just wanted to point out that it's seeming more feasible now than I initially thought in my last comment.