Add owned version of outgoing "copy" and "utf8-*" bindings

alexcrichton commented 5 years ago

Currently, if you're passing an array of data from WebAssembly to JS (such as a string or a list of bytes), you have the option of using outgoing bindings like copy, utf8-*, or view. In some cases, though, the WebAssembly module computes a value (e.g. renders the input as markdown) and then wants to return it. WebIDL bindings currently don't provide a great way to manage the lifetime of that returned value.

The WebAssembly module needs to return the pointer/length to JS, and after JS has copied the data to its own heap (e.g. via TextDecoder or by copying a typed array out), the original allocation in the WebAssembly module needs to be deallocated.

Currently tools like wasm-bindgen handle this by having JS perform the deallocation, but it means that Rust-defined functions which return a string can't use vanilla WebIDL bindings and still require JS shims.

Would it be possible to add a new outgoing binding which copies the data out, but also has a free function listed to deallocate the data after JS has read it?
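To make the request concrete, here is a minimal sketch in C, with ordinary heap memory standing in for wasm linear memory. The names `owned_copy_out`, `free_fn`, `module_render` and `module_free` are hypothetical and are not part of any bindings proposal; the sketch only illustrates the "copy out, then call the listed free function" order of operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature of the module's deallocation export. */
typedef void (*free_fn)(void *ptr, size_t len);

static int module_live = 0;   /* outstanding module-side allocations */

/* Module side: computes a value and returns an owned pointer/length pair. */
char *module_render(size_t *out_len) {
    static const char text[] = "<p>hello</p>";
    *out_len = sizeof text - 1;
    char *buf = malloc(*out_len);
    if (buf != NULL) {
        memcpy(buf, text, *out_len);
        module_live++;
    }
    return buf;
}

/* Module side: the free function the binding would list. */
void module_free(void *ptr, size_t len) {
    (void)len;
    free(ptr);
    module_live--;
}

/*
 * Host side: what an "owned copy" binding might do. It copies `len` bytes
 * into host-owned storage, then always hands the original allocation back
 * to the module. Returns NULL if the host copy could not be allocated.
 */
char *owned_copy_out(char *src, size_t len, free_fn release) {
    assert(src != NULL && release != NULL);
    char *host_copy = malloc(len + 1);
    if (host_copy != NULL) {
        memcpy(host_copy, src, len);
        host_copy[len] = '\0';   /* terminator is a convenience for C callers */
    }
    release(src, len);           /* module buffer is released after the copy */
    return host_copy;
}
```

With something like this in the binding layer, the caller only ever sees the host-owned copy and never has to know that a deallocation happened.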

fgmccabe commented 5 years ago

What is the difference between the host invoking a 'free' callback and you calling it after the JS call?

alexcrichton commented 5 years ago

If you're calling a JS imported function then you definitely have an opportunity to free once that returns. The case that worries me more is when JS calls wasm and wasm returns a string: all wasm could do then is maybe append it to a global list of "things to free" and periodically sweep the list once control comes back to wasm.
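The "things to free" workaround could look roughly like the following C sketch. All names here (`to_free`, `sweep_to_free`, `exported_greeting`, `pending_frees`) are made up for illustration, and the fixed-capacity list is an assumption to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TO_FREE_CAP 16

static void  *to_free[TO_FREE_CAP];   /* buffers already returned to the caller */
static size_t to_free_len = 0;

/* Release every buffer handed out by an earlier call. */
void sweep_to_free(void) {
    for (size_t i = 0; i < to_free_len; i++) {
        free(to_free[i]);
    }
    to_free_len = 0;
}

size_t pending_frees(void) {
    return to_free_len;
}

/* Exported function: returns a string the caller is expected to copy. */
const char *exported_greeting(const char *name) {
    sweep_to_free();               /* control is back in the module: reclaim old results */
    size_t n = strlen(name);
    char *s = malloc(7 + n + 1);   /* "hello, " + name + NUL */
    if (s == NULL) {
        return NULL;
    }
    memcpy(s, "hello, ", 7);
    memcpy(s + 7, name, n + 1);
    assert(to_free_len < TO_FREE_CAP);
    to_free[to_free_len++] = s;    /* cannot free yet: the caller has not copied it */
    return s;
}
```

Note that the last returned buffer stays allocated until the module is entered again, which is exactly the awkwardness described above.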

fgmccabe commented 5 years ago

There is a 'permanent' issue when it comes to JS calling WASM and it returning a string. The WASM string lives in untrusted memory ... On the other hand, there is ultimately room for a rich collection of coercion operators in the bindings layer.

rossberg commented 5 years ago

I would argue that the necessity to add a rich collection of binding operators would be a failure mode.

fgmccabe commented 5 years ago

That's a question of taste. On the other hand, there is also a dilemma, especially with the coercion expressions being declarative: a simpler compositional framework with fewer operators will make the work of the embedder harder. A richer set of operators (with some implied redundancy) will look uglier but make the work of the embedder easier. We should definitely start with a simple, compositional, minimal set of operators (reorder adjectives to taste). In my opinion, a declarative approach is not negotiable.

jgravelle-google commented 5 years ago

"Not negotiable" is an unhelpfully strong way to put that. In my opinion anything is negotiable. So long as we can achieve the set of desirable properties, the approach is only relevant inasmuch as it helps attain that. At the moment I believe that a declarative approach is the most reasonable way to proceed, because I believe declarative style gives room for embedders to optimize host-provided APIs, while still leaving flexibility for language implementations to translate to the binding layer. That's doable with several designs, but a declarative design has an advantage in composability, where pushing conversions into a declarative layer allows embeddings the possibility of eliding them between conveniently-matching languages (e.g., it is more-possible to optimize C-to-C calls if most of the translation has been deferred to the binding layer itself).

As a meta point, I feel we will have more productive disagreements if we can state our opinions with enough specificity to be challenged.

> I would argue that the necessity to add a rich collection of binding operators would be a failure mode.

Please do. I believe allowing many operators is a useful extension point for future bindings that more closely map to languages that model the world differently, while minimizing the work done in userland. If that's fatally flawed, I'd rather be convinced of it now than in five years.

fgmccabe commented 5 years ago

The 'not negotiable' phrase was preceded by the IMO qualifier. I guess it's like saying that I think I am certain ;)

pchickey commented 5 years ago

> I would argue that the necessity to add a rich collection of binding operators would be a failure mode.

I'm sympathetic to the idea that a growing collection of binding operators will create an implementation and maintenance burden. By moving the problem of accommodating "future bindings that more closely map to languages that model the world differently" out of userland, we're taking on a significant amount of complexity in the engines. I'm not sure where the right trade-off is.

I'm currently working on an implementation of a (non-webidl-bindings) binding generator that pushes complexity into userland by having the binding tool create library code used by clang, rustc, and eventually the AssemblyScript compiler, which all target a common ABI. I would love to eventually abandon as much of my tool as possible to use webidl-bindings.

I much prefer the idea of using binding expressions to allow multiple ABIs to bind to the same interface, because, as @jgravelle-google says, languages model the world differently. I don't see any way to come up with a single ABI that will satisfy every language's model of the world, especially given the gradual evolution of wasm towards the GC proposal, which not every language will take advantage of. So, it seems like the complexity of mapping multiple ABIs to the same interface will need to live in the engine.

As for what the correct set of operators is, or some other approach to perform that mapping, I am a lot less clear.

Is it helpful to split the debate into two pieces - does this belong in the engine or userland, and if the former, how do we design the operators? I am happy to be convinced that there is a reasonable way to solve this problem in userland, rather than in the engine, but I haven't been able to figure out how myself.

fgmccabe commented 5 years ago

@jgravelle-google helped me to understand something on the 'set of binding operators' question that I think is relevant here. In particular, while we might end up with a rich collection of operators, the basis of the 'richness' stems not from different languages but from the architectural features of wasm itself. For example, we may have a string->idl operator that reads from linear memory. That operator would not be specifically oriented to C/C++, but to the fact that linear memory is used to represent string values.

This may help to allay fears that we will end up with hundreds of operators; I don't think anyone is foreseeing that level of richness.

rossberg commented 5 years ago

Agreed. In a first approximation that implies that there are going to be only two binding expressions per IDL type: one mapping to memory and one to GC types. And since we don't have GC yet it's even just one per type for now.

I would indeed consider that a healthy property and a sort of litmus test that the mechanism does not deteriorate into a zoo of language specifica. But that will be a very hard line to hold, temptation is gonna be strong. Case in point: the design already features a C-specific binder for zero-terminated strings.

lukewagner commented 5 years ago

Stepping back from the meta discussion, I think @alexcrichton's use case is valid and not remotely language-impl-specific (any linear memory language will have this need). Although this gets into the bikeshedding minutiae, I don't think supporting this use case requires a separate binding operator, just an optional "free function" immediate argument to the existing binding operators.

fitzgen commented 5 years ago

I'd like to discuss the specific technical merits of the proposal in the OP for a moment, rather than general philosophical points.

Consider the case of an exported binding that returns a string whose ownership is transferred to the caller. That is, something like this:

char* my_exported_function() {
    char* s = malloc(n);
    // ...
    return s;
}

To avoid leaking s indefinitely, I currently see two practical approaches:

1. The caller somehow knows that it is receiving an owned string and cooperates by calling back into the module again after its copy of the string is constructed to give the callee module a chance to free the string. This is what wasm-bindgen does today, since it has full control over what the generated JS wrapper around the wasm export looks like.
2. my_exported_function inserts s into a to-free list before returning it to the caller, and every time control enters the callee module, it sweeps the to-free list.

Both of these approaches have problems.

For (1), there must be some sort of side channel to communicate the ownership rules, which means that snowman bindings alone are not enough to call exported wasm functions with string types. This would lead to additional idioms which in the best case become a de facto standard and in the worst case split the ecosystem, so that you have to choose which part of the ecosystem you can interoperate with.

The (2) approach implies additional code size and complexity in the module to maintain the to-free list, and also raises questions like "what if someone passes a string in and the allocator OOMs because the to-free list hasn't been swept yet?" Fixing that requires allocation to inspect the to-free list, which isn't that large of an ask, but it does mean that off-the-shelf allocators like dlmalloc and whatever your libc happens to ship aren't going to work here.
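The allocator concern in (2) can be sketched as follows. This is a toy model under stated assumptions: `HEAP_BUDGET` stands in for an exhausted linear memory, the to-free list has a single slot, and every name (`raw_alloc`, `defer_free`, `sweep`, `module_alloc`) is hypothetical. The point is only that allocation has to know about the to-free list, which an off-the-shelf malloc does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define HEAP_BUDGET 64   /* toy limit standing in for a full linear memory */

static size_t heap_used = 0;
static void  *pending_ptr = NULL;   /* one-slot to-free list */
static size_t pending_size = 0;

/* Toy allocator that fails once the budget is exhausted. */
void *raw_alloc(size_t size) {
    if (size > HEAP_BUDGET - heap_used) {
        return NULL;
    }
    void *p = malloc(size);
    if (p != NULL) {
        heap_used += size;
    }
    return p;
}

void raw_free(void *p, size_t size) {
    free(p);
    heap_used -= size;
}

/* Record a buffer that was returned to the caller and cannot be freed yet. */
void defer_free(void *p, size_t size) {
    assert(pending_ptr == NULL);
    pending_ptr = p;
    pending_size = size;
}

void sweep(void) {
    if (pending_ptr != NULL) {
        raw_free(pending_ptr, pending_size);
        pending_ptr = NULL;
        pending_size = 0;
    }
}

/* Allocation entry point that inspects the to-free list before giving up. */
void *module_alloc(size_t size) {
    void *p = raw_alloc(size);
    if (p == NULL) {
        sweep();               /* reclaim deferred buffers, then retry once */
        p = raw_alloc(size);
    }
    return p;
}
```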

I think the best way to solve this problem is with some sort of binding operator variant that calls a specified deallocation function after the string copy is constructed. This is essentially standardizing option (1) into snowman bindings in a way that hides the ownership worries from callers.

jgravelle-google commented 5 years ago

As a general mechanism for this sort of thing, I've been thinking it would be useful to have explicit function callbacks in the binding section, such as (bind string (allocator $malloc) ..., and so this use case (1) could be handled as an annotation in that context, such as (after-bind (call $free)) or so.

Incidentally this would also be used to remove the utf8-cstr binding, with something like (utf8-ptrlen (get 0) (call $strlen (get 0)))

Regardless of whether we have a mechanism like that I think we should remove the utf8-cstr binding. I thought it would be an "example of how we might extend the set of bindings", but it looks like it's usually viewed as "overfitting to C", so dropping it is probably cleaner. Separate issue, that.
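The composition of utf8-ptrlen with a $strlen call mentioned above can be illustrated in plain C. This is only an analogy: `utf8_ptrlen` and `utf8_cstr` are hypothetical host-side helpers named after the bindings, and UTF-8 validation is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in for a utf8-ptrlen outgoing binding: copy `len` bytes starting at
 * `ptr` into a fresh, NUL-terminated host string. No UTF-8 validation here.
 */
char *utf8_ptrlen(const char *ptr, size_t len) {
    assert(ptr != NULL);
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, ptr, len);
    out[len] = '\0';
    return out;
}

/*
 * The cstr variant needs no operator of its own: it is the ptrlen binding
 * with the length supplied by a call to strlen on the same pointer.
 */
char *utf8_cstr(const char *ptr) {
    return utf8_ptrlen(ptr, strlen(ptr));
}
```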

PoignardAzur commented 5 years ago

> The caller somehow knows that it is receiving an owned string and cooperates by calling back into the module again after its copy of the string is constructed to give the callee module a chance to free the string. This is what wasm-bindgen does today, since it has full control over what the generated JS wrapper around the wasm export looks like.

I think this is the only viable approach as far as returning dynamically-sized data goes.

That said, if we want to implement a "return string from wasm then call free()" scheme, there are two possible design paths:

1. The wasm callee returns its data; the host immediately copies the result in a safe place, then calls the free() function on the original. The caller then has access to the copy.
2. The wasm callee returns a persistent reference to the data. The caller can hold on to that reference as long as it wants, which means it needs to share some sort of lifetime/RC/GC scheme with the callee so they can agree on when the reference is no longer valid.

I think the second is the better abstraction for what happens semantically, but it adds a lot of complexity.
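For comparison, the persistent-reference design could be approximated with a reference count shared by caller and callee. The sketch below is an assumption about what such a scheme might minimally need; `shared_str` and its functions are invented names, and a real design would also have to cross the module boundary safely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A string that both the callee and the caller may hold on to. */
typedef struct {
    char  *data;
    size_t len;
    int    refs;   /* number of holders that still consider the data valid */
} shared_str;

shared_str *shared_str_new(const char *text) {
    shared_str *s = malloc(sizeof *s);
    if (s == NULL) {
        return NULL;
    }
    s->len = strlen(text);
    s->data = malloc(s->len + 1);
    if (s->data == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->data, text, s->len + 1);
    s->refs = 1;   /* the creator holds the first reference */
    return s;
}

void shared_str_retain(shared_str *s) {
    assert(s->refs > 0);
    s->refs++;
}

/* Returns 1 if this call released the last reference and freed the data. */
int shared_str_release(shared_str *s) {
    assert(s->refs > 0);
    if (--s->refs > 0) {
        return 0;
    }
    free(s->data);
    free(s);
    return 1;
}
```

Even this small version shows the extra protocol both sides must agree on, which is the added complexity noted above.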

PoignardAzur commented 5 years ago

To make sure everyone's on the same page, here's my current understanding of the planned implementation of outgoing slice returns.

First off, we want a scheme that is as simple as possible. Snowman bindings aren't trying to solve general interoperability problems yet, so we don't need a complex lifetime scheme.

The use cases are mostly module-to-host interop (DOM access, WASI, etc.), where we mostly need to return flat data structures, e.g. arrays and strings, not inter-module interop that may require complex object graphs.

Example

// someProducer.wasm

std::string getString(Data someData) {
  return ...;
}

// someConsumer.wasm

#import someProducer

void foobar() {
  std::string myString = wasm_wrapper.getString(...);     // How to compile that?
}

In the above example, wasm_wrapper.getString is a trusted, host-controlled function: either JS glue code or an auto-generated host function. In any case, allocating and copying the string should occur entirely inside of wasm_wrapper.getString, so that the host can control the inter-module memory access.

Allocation sequence

In the default case, wasm_wrapper.getString does the following:

1. call someProducer.getString,
2. getString returns some form of string reference/slice to someProducer's linear memory,
3. get the size of the slice, then call someConsumer.malloc,
4. memcpy bytes from the slice to the address in someConsumer's linear memory returned by the malloc call,
5. call someProducer.free on the slice
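The sequence above can be simulated in C with two toy "modules", each owning a private byte array as its linear memory. Everything here is an assumption made for illustration: the `module` struct, the bump allocator, the `slice` type and the use of `UINT32_MAX` as a failure value are not taken from any proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 256

/* A toy module: private linear memory plus a bump allocator. */
typedef struct {
    uint8_t  memory[MEM_SIZE];
    uint32_t next;   /* next free offset */
    int      live;   /* outstanding allocations */
} module;

typedef struct { uint32_t ptr, len; } slice;   /* offset and length */

/* Returns an offset into m->memory, or UINT32_MAX when out of space. */
uint32_t module_malloc(module *m, uint32_t size) {
    if (size > MEM_SIZE - m->next) {
        return UINT32_MAX;
    }
    uint32_t p = m->next;
    m->next += size;
    m->live++;
    return p;
}

void module_free(module *m, uint32_t ptr) {
    (void)ptr;   /* a bump allocator cannot reuse space; only track the count */
    assert(m->live > 0);
    m->live--;
}

/* someProducer.getString: builds a string in the producer's own memory. */
slice producer_get_string(module *producer) {
    static const char text[] = "hello from producer";
    slice s = { module_malloc(producer, sizeof text - 1), sizeof text - 1 };
    if (s.ptr != UINT32_MAX) {
        memcpy(producer->memory + s.ptr, text, s.len);
    }
    return s;
}

/* wasm_wrapper.getString: the only code that touches both memories. */
slice wrapper_get_string(module *producer, module *consumer) {
    slice dst = { UINT32_MAX, 0 };
    slice src = producer_get_string(producer);      /* call the producer, get a slice */
    if (src.ptr == UINT32_MAX) {
        return dst;
    }
    dst.ptr = module_malloc(consumer, src.len);     /* allocate in the consumer */
    if (dst.ptr != UINT32_MAX) {
        memcpy(consumer->memory + dst.ptr,          /* copy across memories */
               producer->memory + src.ptr, src.len);
        dst.len = src.len;
    }
    module_free(producer, src.ptr);                 /* free the producer's slice */
    return dst;
}
```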

The security implications here are non-trivial: while wasm_wrapper.getString is trusted and is the only part directly accessing linear memories, it calls two untrusted functions, someConsumer.malloc and someProducer.free. The malloc call in particular is dangerous, because while it runs, someConsumer might break invariants expected by someProducer. It could:

- call wasm_wrapper.getString again, which might be a problem if someProducer.getString isn't reentrant;
- call someProducer.free early, if the function is a public export;
- throw an exception or otherwise exit in a way that leaves someProducer in an invalid state, producing undefined behavior.

Overall, as long as someProducer makes sure that it is in a valid reentrant state between the calls to getString and free, and the host can guarantee that someConsumer can't import someProducer.free, then someConsumer can't break memory-safety and access secret data, even if its code is malicious.

Note that, while the above scheme requires automatic deallocation in the generated bindings, it doesn't require any first class RAII types. Because the generated code takes care of storing, memcpying and freeing the slice, there's no possible way malicious code can store it past its lifetime.

Optimization

The example code seems hard for a compiler to optimize.

Given a function webAPI.consumeString, what we'd want to do is replace the sequence

1. someConsumer.malloc,
2. memcpy,
3. someProducer.free,
4. webAPI.consumeString,
5. someConsumer.free

with

1. webAPI.consumeString,
2. someProducer.free

However, the host may not always have enough information to deduce when one can be substituted for the other. From the host's point of view, malloc and free are opaque functions that produce side effects by rearranging linear memory in ways that will affect later computations.

To offset this, the compiler could output special annotations giving malloc and free a special status when compiled to wasm; or the host could just assume that any functions named malloc and free have this special status. The host could then recognize a sequence that takes the pointer/length pair returned by wasm_wrapper.getString, passes it to webAPI.consumeString and then frees it, and elide the intermediate allocation and copy.

Note that these optimizations would be fairly brittle; for instance, in the following case:

void foobar() {
  std::string myString = wasm_wrapper.getString(...);
  webAPI.consumeString(myString != "" ? myString : std::string("default"));
}

the host would almost certainly fail to optimize the allocations away, even though it in theory has enough information to know that no allocation is required.

I don't think this is a point against having optimizations or using a "malloc then memcpy" scheme; but I think it should be clear when analyzing performance that we shouldn't rely on the optimizations too much.

In the long term, the best way to have robust optimizations in cases like the above is probably to have C/C++/Rust/etc compilers recognize these cases, where data is trampolined between external modules, and directly use GC references to represent that data, which would be easier for the hosts to track. However, this would require LLVM to be able to understand opaque references, which I'm told is really not the case right now.

Either way, the host must make sure that any contraction maintains invariants:

- someProducer.free is called exactly once,
- the value passed to someProducer.free is the value returned by someProducer.getString,
- neither someConsumer nor webAPI.consumeString retain any access to someProducer's memory once someProducer.free is called.
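A sketch of the contracted sequence, with the first two invariants checked by assertions inside a toy producer. The third invariant cannot be enforced in plain C; here it holds only because the stand-in for webAPI.consumeString reads the bytes and keeps no pointer. All names and the fixed offset are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 128
#define NO_SLICE UINT32_MAX

typedef struct {
    uint8_t  memory[MEM_SIZE];
    uint32_t outstanding;   /* offset of the slice handed out, or NO_SLICE */
    int      free_calls;
} producer;

void producer_init(producer *p) {
    memset(p, 0, sizeof *p);
    p->outstanding = NO_SLICE;
}

/* someProducer.getString: places "abc" at a fixed offset and returns it. */
uint32_t producer_get_string(producer *p, uint32_t *len) {
    assert(p->outstanding == NO_SLICE);   /* not reentrant in this sketch */
    memcpy(p->memory + 8, "abc", 3);
    p->outstanding = 8;
    *len = 3;
    return 8;
}

void producer_free(producer *p, uint32_t ptr) {
    assert(p->outstanding != NO_SLICE);   /* freed at most once per slice */
    assert(ptr == p->outstanding);        /* only the value getString returned */
    p->outstanding = NO_SLICE;
    p->free_calls++;
}

/* Stand-in for webAPI.consumeString: sums the bytes, retains nothing. */
uint32_t web_api_consume_string(const uint8_t *bytes, uint32_t len) {
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++) {
        sum += bytes[i];
    }
    return sum;
}

/* Contracted sequence: consume directly from producer memory, then free. */
uint32_t fused_get_and_consume(producer *p) {
    uint32_t len = 0;
    uint32_t ptr = producer_get_string(p, &len);
    uint32_t result = web_api_consume_string(p->memory + ptr, len);
    producer_free(p, ptr);
    return result;
}
```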

Other optimization cases, like the case where webAPI.consumeString calls webAPI2.consumeString, face similar challenges: eliding the allocation and copy, and having a way to recognize when side effects to linear memory are skippable.

Other limitations

- None of this is very helpful for returning arrays of references.
- It doesn't really cover arrays of strings or arrays of arrays.
- @fgmccabe has expressed concerns about making modules export an allocator that can be called by the host at any point in a way the module can't really control. I don't want to put words in his mouth, and I don't know if he's posted a complete rationale against it, but it's probably something to keep in mind.

WebAssembly / interface-types

Add owned version of outgoing "copy" and "utf8-*" bindings #42