ghost commented 6 years ago

Hey,

From reading about the WebAssembly.Memory there really is no way to pass an ArrayBuffer zero-copy to C++ from JS.

Example:

Let's say a WebSocket in the browser triggers the onmessage with an ArrayBuffer of the message data. Now I want to read this data in C++. From all solutions I've seen you really need to:

allocate a new WebAssembly.Memory (or whatever allocation strategy you use)
copy the contents of ArrayBuffer with the message to the ArrayBuffer of the WebAssembly.Memory
call into C++ and use the copied memory as char *
free the WebAssembly.Memory

Imagine having a constructor like new WebAssembly.Memory(ArrayBuffer) so that you could skip all steps except for step 3.

Did I get something wrong or is this not supported?

lukewagner commented 6 years ago

You're right that many Web APIs now incur extra copying overhead that we should work to remove.

The problem with what you've proposed is that WebAssembly engines often want to have special representations (extra guard pages, page-aligned allocation) for Memory which means you can't take an arbitrary ArrayBuffer and, without making a copy, convert it into a Memory.

I think a better way to achieve zero-copy is if, instead of allocating new ArrayBuffers as return values, Web APIs provided functions that took typed array views (or, to avoid view garbage, an ArrayBuffer, offset, and length) as input parameters into which the outputs were written.

ghost commented 6 years ago

I think a better way to achieve zero-copy is if, instead of allocating new ArrayBuffers as return values, Web APIs provided functions that took typed array views (or, to avoid view garbage, an ArrayBuffer, offset, and length) as input parameters into which the outputs were written.

Hmm, I'm not even sure this could be done in all cases. You don't know the size of a message up front and this sounds a lot like solving this problem of zero-copying by enforcing a completely new way of dealing with (in this case) networking. It is like killing a fly with a bazooka.

I don't really see this as a solution. For once it would require quite the extensive overhaul of many Web APIs and secondly, like mentioned, I think this is not even possible in all cases.

It seems super strange that something so essential to data management in JS, ArrayBuffer, would be so foreign to WebAssembly.

ghost commented 6 years ago

In the case of WebSockets you could maybe just add a new binaryType called "memory":

Attribute	Type	Description
binaryType	DOMString	A string indicating the type of binary data being transmitted by the connection. This should be either "blob" if DOM Blob objects are being used or "arraybuffer" if ArrayBuffer objects are being used.

This way all data passed to onmessage would be a WebAssembly.Memory.

lukewagner commented 6 years ago

For streaming cases like reading data from a socket, I think the place to do this is part of the Streams API. As Streams become more prevalent, this means more APIs would automatically become wasm-friendly.

The other big interesting case which isn't covered by streams is canvas/audio/video. I think these would/could know the size ahead of time.

Note that these changes would also be an improvement for JS: by allowing the caller to reuse an ArrayBuffer, it would produce far less garbage.

binji commented 6 years ago

I agree with @alexhultman, I think this is something we should fix in WebAssembly rather than requesting various web APIs to provide a view. Another problem with that solution is that if that API is provided, it's likely that the API will have to copy out anyway, since it likely can't trust the lifetime of the data in the view. Providing a mechanism for WebAssembly to access data from an arbitrary ArrayBuffer is a much more powerful and (IMO) useful feature.

It seems as though this is very naturally represented with multiple memories, which we've been planning from the start. This would mean that this WebAssembly.Memory is special, but I think it just means that you have to generate bounds-checked memory accesses, same as if you weren't able to allocate virtual address space for trap-handled memory.

There are definitely are some complications: the memory may no longer be page-sized, you'll likely want to unbind and rebind different memories, you can't use grow_memory, etc. None of these seem like show-stoppers, though.

lukewagner commented 6 years ago

Another problem with that solution is that if that API is provided, it's likely that the API will have to copy out anyway, since it likely can't trust the lifetime of the data in the view.

If the API is currently producing an ArrayBuffer, then there's already an inherent copy going on, so changing the API to take a view to write into is just replacing that copy. E.g., if you look at the Streams issue, the copy is async (b/c the given view is a view of a SAB and thus allows racy writes) and can thus be the direct recipient of the socket read syscall.

It seems as though this is very naturally represented with multiple memories, which we've been planning from the start.

While there may be special cases where only a single ArrayBuffer is to be read, most of the interesting use cases I've seen would process a new ArrayBuffer every {packet, frame, picture, ...} which would require creating a new wasm instance each time. So I don't think being able to import an ArrayBuffer as a memory is a general solution here.

Other problems with using an ArrayBuffer of input data as a wasm Memory include:

multiple memories don't translate naturally into C++/Rust b/c every load/store must statically refer to the right memory; that makes it hard to, e.g., take your bulk image data and just pass a pointer to libjpeg (which just uses the default memory for all loads/stores)
if attempting to use the ArrayBuffer as the only Memory (to avoid multi-Memory issues), there's no simple way to allocate some extra space for malloc, stacks, and other one-off allocations you might need for computation (without copying into a resizable Memory).

However, there is another persistent idea I should've mentioned above that could actually fit in as part of Host Bindings (if go the anyref route) which is to have a first-class slice value type whose values are basically typed array views. With this, load/store ops could be extended (via the flags immediate) to accept dynamic slice values (instead of static memory indices). This still has the downside of multiple memories described above, so I think the enhanced API route is still best, but it's a powerful raw primitive that could exposed to C++ in various ways.

binji commented 6 years ago

If the API is currently producing an ArrayBuffer, then there's already an inherent copy going on

Yes, I suppose so. Though it's possible that the API is providing direct access to its underlying data, where that can't be true with views. But you're right, maybe it's just changing who does the copy.

which would require creating a new wasm instance each time

I was thinking you could rebind the memory in that case, where the compiled code wouldn't bake in the address or size. Rebinding could be a similar operation to grow_memory in that case.

take your bulk image data and just pass a pointer to libjpeg

True, but perhaps it wouldn't be too much work to modify libjpeg to use the memory region pointers here instead. I don't know much about this C++ extension though (can't remember the actual name of it either).

With this, load/store ops could be extended (via the flags immediate) to accept dynamic slice values

I like this idea, but it does seem like it is more complicated than using static memory indices.

lukewagner commented 6 years ago

Though it's possible that the API is providing direct access to its underlying data,

The one example I can think of currently is where we create an ArrayBuffer whose backing store is a COW-mapped file. That case is definitely worth thinking about more since memory mapped file i/o is great. But other than COW mappings, since ArrayBuffers are mutable by JS once returned, I think it's almost always necessary to return a copy.

I was thinking you could rebind the memory in that case

I think 'rebinding' would need to be a new fundamental semantic operation, then? It's certainly implementable, but I think a bit weird spec-wise given that imports/definitions aren't defined to be mutable locations (that's what we have globals for; you could imagine having storing a reference to a slice in a global...).

True, but perhaps it wouldn't be too much work to modify libjpeg to use the memory region pointers here instead.

It's possible, but I think it's a fairly non-trivial change to make to an existing codebase in general. You have to put a special __attribute__ on every pointer statically indicating that pointer points into a non-default addrspace. This is hard b/c these pointers would be mixed with uses of the default linear-memory (stack, malloc, alloca etc) and in some cases a single pointer could point to either dynamically.

I like this idea, but it does seem like it is more complicated than using static memory indices.

Yeah, both more expressive and more complicated. If we're adding anyref and other opaque values (say exception) that require GC stack scanning anyway, the additional implementation complexity for a first-class slice could be modest, though, which is why I bring it up in relation to Host Bindings.

ghost commented 6 years ago

If the solution is a significant overhaul of the entire (JS) Web API then why not cut to the chase and just define a standard C Web API and cut JS completely out of the picture. Then you could reach and control the browser without the need for intermediate JS wrappers just acting as inefficient delegates.

binji commented 6 years ago

since ArrayBuffers are mutable by JS once returned, I think it's almost always necessary to return a copy

Right, unless the ArrayBuffer was detached, to represent transfer. But no Web APIs do that aside from postMessage, I suppose.

I think 'rebinding' would need to be a new fundamental semantic operation, then?

Yeah, I'm not entirely certain how it would work. But if we only allow it for linear memory, and a Memory object is just a pointer to its data and its length, then I think it has similar behavior to growing memory -- the data pointer and length are different. Unbinding would be like detaching the buffer; set data pointer to null and length to 0.

Ah, addrspace, thanks! Yeah, you're probably right that it wouldn't be trivial to modify an existing codebase to support that. But I'm guessing the compiler would help you here, so who knows? And you can probably assume aligned pointers and stuff the addrspace in the low bits. Or to be more careful, assume < 4G of memory and stuff the addrspace in the high bits, so access will trap. Then it looks you can use addrspacecast to turn it back into the correct addrspace pointer.

Yeah, both more expressive and more complicated.

I'm not sure how it solves the C++/rust problem though. You'll still need special objects to access this memory.

just define a standard C Web API and cut JS completely out of the picture

I assume you mean C bindings to existing web APIs? I like this idea, and I believe we've already talked about something like this for APIs like WebGL where there is an underlying C API that is well known. I'm not certain this can work for all Web APIs, though, as they often rely on JS-specific features that would be difficult (or maybe impossible) to provide from wasm.

lukewagner commented 6 years ago

I'm not sure how it solves the C++/rust problem though. You'll still need special objects to access this memory.

Right, that's what I said in my initial comment on slice; that it has the same problem as multiple memories.

If the solution is a significant overhaul of the entire (JS) Web API

I don't think "overhaul" is quite right. We're talking about adding overloads to some existing methods, in a way that can be done incrementally, for the hottest methods first.

binji commented 6 years ago

Right, that's what I said in my initial comment on slice

Sorry, misread that comment.

We're talking about adding overloads to some existing methods, in a way that can be done incrementally, for the hottest methods first

We should definitely do this, and I agree that it shouldn't be too much burden for most APIs. Like you say, they're probably doing a copy anyway.

My concern is mostly for cases where we aren't just giving data back to a web API, for decompressors and decoders and so on. We can just use typed arrays over WebAssembly memory for this too, but it's pretty unsatisfying to have to manage the lifetime of that data all the way through to the wasm module.

lukewagner commented 6 years ago

We can just use typed arrays over WebAssembly memory for this too, but it's pretty unsatisfying to have to manage the lifetime of that data all the way through to the wasm module.

I may not understand your meaning here but, except for cases that detach [1][2], Web APIs that want to use a view's data after the call returns need to make a synchronous copy, so it doesn't seem like lifetime would be an issue here.

binji commented 6 years ago

Here's an example of what I was thinking. Imagine you are using a zlib wasm module library. You don't know what the user of the library is going to do with the decompressed data, so you want to hand back an ArrayBuffer. This requires a copy-out from linear memory. Instead, you could hand back a Uint8Array view into the wasm memory, but then you need a way to manage the lifetime of the memory in the wasm module.

lukewagner commented 6 years ago

Ah hah, I see the direction you're talking about now. Yeah, I totally agree we should try to avoid such opportunities for wasm-level "leaks" and "use after free". So for the case of compression, encryption, and any large data processing, it seems like the best interface would be for the compressor to take a stream in and a stream out. This has parallelism and composability wins. I think we should strive, at both the toolchain and host-bindings-feature level, to make it easy/efficient to work with streams.

briantbutton commented 6 years ago

My vote for this feature, if I understand it correctly.

For general interest: There is a '.set' function for typed arrays which seems perfect to fill this hole. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray/set

jakirkham commented 6 years ago

Sorry I'm a newbie to WebAssembly so am still learning and I may not fully understand the suggestions here. That said, am not sure streams are sufficient for solving the problem. They may make sense in the case of compression, encryption, etc. However things like FFTs generally require the full dataset to operate on or are at least difficult to implement for cases where only chunks of the data are supplied. So a stream won't cut it there. If the data is sufficiently large, copying the data will (if you are lucky) just be slow or (if not) crash the browser. What would you propose in this case?

lukewagner commented 6 years ago

The comments before the streaming comment discuss extending Web APIs to, instead of returning data in new ArrayBuffers, take caller-supplied views into existing ArrayBuffers where the data can be read into.

jakirkham commented 6 years ago

Views sound like a good idea. This is very similar to what people do to solve this problem in Python (e.g. the Python Buffer Protocol). Would it be possible to make read-only views? How would returning arrays work?

lukewagner commented 6 years ago

So far, JS/WebIDL doesn't have read-only typed array views. For "returning arrays", the general idea is to replace an array return value with a mutable view parameter that is written into. The hard part is picking the size of that mutable view argument: I think there's multiple options here, and probably different things would be appropriate for different APIs.

frank-dspeed commented 2 years ago

while I like the thoughts about the stream API I still do not fully understand how the stream API is different from any other. In general, the stream API is only a simple interface like any other object function-based one. There is zero difference.

But what I clearly see and that is even how I was finding the issue here. Is the fact that we need Memory Management with a user Permission API to enable Shared Components based on user agreement that the components are allowed to share data. Even Cross-Origin. Then We Abstract away all isolation issues by user-defined Isolated Space (context) I will try to Start a Proposal for that something like

Proposal User Defined Isolated Spaces (memory) using Permission API

The idea is to decouple the Memory Isolation that is mostly tab/worker/process/origin-based into a Shared Address Space that is able to assign consumers via a User Facing Permission Request even cross Origin.

Security Concepts

To make Cross-Origin Work it would be required that only the creator of the shared space knows a pks(pre-known secret) that needs to get shared with the other origin that wants to use that context.

There should be no way to list existing spaces or interface with them without the pks that only the creating context knows

There are tons of other stuff but I think that gives some input

Update

After Carefull thinking i mixed something up in my brain because of streams maybe my idea is not usefull for this scenario

Macil commented 2 years ago

@frank-dspeed By cross origin, do you mean between different WebAssembly modules running in different iframes or windows? I would expect that kind of integration to work by something like a WebAssembly.Memory/SharedArrayBuffer being postMessage'd between the frames so multiple frames and their WASM modules could get a reference to the same SharedArrayBuffer. I don't think there's a need to reach for needing any pre-shared keys.

frank-dspeed commented 2 years ago

@Macil the need for the pks is to identify the context and make all origins aware of its existence to use it something like the WebRTC Signal Mechanic combined with the Hardware IDS of the WebAudio API that allows the usage of a selected audio device if you know the hardware ID (pks key) as sinkId Value on MediaStream Objects.

For example to define shared memory for cross-origin wasm modules inside iframes it would be enough to message the pks to all frames and then they can work and share data in raw memory as an interface.

update

After Carefull thinking i mixed something up in my brain because of streams maybe my idea is not usefull for this scenario

somethingelseentirely commented 2 months ago

To provide a different viewpoint/use-case on this. We're building a RDF-like graph database that heavily relies on Succinct Data-structures for indexing that are zero-copy mmaped.

Given a Blob holding such an index (actually the entire database/dataset as an immutable read-only file, think Apache Arrow) we would like to pass its ArrayBuffer to our query engine written in Rust and compiled to WASM.

Since browsers are already free to cache/mmap/manage Blobs however they see fit, and since they don't contribute towards any memory quota, this would be the ideal mechanism to quickly crawl and query multiple such datasets.

Streaming them is definitely NOT an option, the added latency quickly makes any meaningful queries (which often transitively load more datasets referenced in previous datasets) infeasible. Providing this as any other structure than the raw bytes is also not feasible since the algorithms that read these structures make heavy use of SIMD and raw memory accesses.

It seems like a huge missed opportunity to get this right, given that mmap will eventually be required for WASI anyways, especially under the consideration that multiple memories with different read/write capabilities, would imply significantly improved security compared to MVP WASM.

The use of intrusive memory management techniques that preclude raw byte ranges for linear memory seems like a costly decision that precludes WASM from covering many real world use-cases.

kawogi commented 5 days ago

I don't see that explicitly mentioned so I'll add a common use case (at least for me) where this would be very helpful. To use multithreading I spawn several WebWorkers which shall communicate with each other. (e.g. work concurrently on the same possibly huge dataset). For that to work I create a SharedMemoryBuffer and pass it around. Each WASM-Module wraps a safe end-point implementation around that buffer (e.g. https://docs.rs/wasm-rs-shared-channel/0.1.0/wasm_rs_shared_channel/).

Iiuc sending a chunk of bytes through such a channel would involve two copies (one on each end) to transmit them. With that shared memory being mapped into the WASM address space (as separate memory entity), this would be zero copy.

Also I suspect that the coordination via atomic primitives involves a lot of overhead right now. But I have to admit that I didn't find out how wasm_bindgen solves this under the hood.

I know that I could in some cases combine all binaries into a single memory and let this run on a shared memory to get the same effect. But the last time I tried this it was a very fragile setup with several downsides (Rust-panics not working, memory wasn't resizable, nightly Rust-compiler required, unable to share memory with external wasm-modules etc.)

WebAssembly / design

Zero-copy pass ArrayBuffer from JS-land to WebAssembly-land #1162

Proposal User Defined Isolated Spaces (memory) using Permission API

Security Concepts

Update

update