WebAssembly / design

WebAssembly Design Documents
http://webassembly.org
Apache License 2.0

In memory representation of complex types/universal data format #1272

Open olanod opened 5 years ago

olanod commented 5 years ago

Maybe it is too much to ask, or this may belong in a different place, but wouldn't it be nice if wasm natively supported complex object structures via a universal data format?

The linear memory model plays well with zero-copy serialization formats like FlatBuffers or Cap'n Proto. Imagine a standardized FlatBuffers-like data format that is used all over the place, the way JSON is nowadays. It would come with tooling for every language, like the formats mentioned, but done in a web-standards way: to send data over the wire you just specify the offset and length of your data structure and off it goes, without any serialization step. So far this looks very achievable with the current tools out there, but it could be integrated more deeply into the platform and compilers. Say I have some struct defined in language X: the compiler for that language could generate a wasm file with a header that defines the types used and any static structure in the stack serialized in that standard way. Language B could then instantiate those structures, matching them to its own types and syntax, or generate code based on those type headers.
Does something like that make sense?
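To make the "offset + length" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: a `bytearray` stands in for wasm linear memory, and the 12-byte record layout (`POINT`) and the helper names are hypothetical, not FlatBuffers, Cap'n Proto, or any proposed standard.

```python
import struct

# Hypothetical fixed layout: little-endian u32 id, f32 x, f32 y (12 bytes).
POINT = struct.Struct("<Iff")

# Stand-in for a wasm linear memory: one flat, byte-addressable buffer.
memory = bytearray(64 * 1024)


def write_point(mem, offset, ident, x, y):
    """Store a point at `offset` and return its (offset, length) pair."""
    POINT.pack_into(mem, offset, ident, x, y)
    return offset, POINT.size


def view(mem, offset, length):
    """Return a window onto mem[offset:offset + length] without copying bytes."""
    return memoryview(mem)[offset:offset + length]


def read_point(buf):
    """Decode the fields straight out of the shared buffer."""
    return POINT.unpack_from(buf, 0)


offset, length = write_point(memory, 1024, 7, 1.5, -2.0)
payload = view(memory, offset, length)  # this is what would be "sent"
result = read_point(payload)
```

The only thing handed to the receiver is the `(offset, length)` window; the bytes themselves are never re-encoded. Whether the receiver can use them without further conversion depends on its own data model.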

binji commented 5 years ago

Yes, this does make sense. We've discussed something like this before; I think @aardappel and @jgravelle-google may have more thoughts here.

rossberg commented 5 years ago

Can you elaborate on what you mean by Wasm supporting this "natively"? AFAICS, you can define and implement such a thing without any built-in support from Wasm itself.

(As an aside, IME, "zero-copy" is often a red herring. For most languages, this format won't match their internal data layout. If they want to send/receive some of their data structures, they still have to perform an internal de/serialisation step to construct the required format in-memory.)

aardappel commented 5 years ago

@olanod Author of FlatBuffers here :) WASM is already set up to support formats like FlatBuffers at maximum efficiency, no need for special case support. If anything, what you want to ensure is that transfers between WASM linear memory and other languages / the browser can happen zero copy when possible, which is something people working on "host bindings" are looking at. Also relevant: wasm-bindgen.

@rossberg FlatBuffers was designed exactly for that purpose: to be both a forwards/backwards compatible wire format and internal program data, without any translation. No de-serialization is required; it can work directly on mmap-ed memory, a network stack buffer, etc.

rossberg commented 5 years ago

@aardappel, I know, but I'm not buying that it's universal ;). Zero-copy only holds under one of two assumptions: either I am using a low-level language, or I am using a high-level language but not its native data types and structures. Neither makes the format "universal" across languages; that property is mutually exclusive with zero-copy.
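The two cases can be sketched side by side. This is an illustrative example only: the `Point` dataclass, the `PointView` accessor, and the 12-byte `LAYOUT` are hypothetical names and not part of any format discussed in this thread.

```python
import struct
from dataclasses import dataclass

LAYOUT = struct.Struct("<Iff")  # hypothetical layout: u32 id, f32 x, f32 y


@dataclass
class Point:
    """Native representation: an ordinary Python object."""
    ident: int
    x: float
    y: float

    def to_bytes(self) -> bytes:
        # Explicit serialisation step: native fields are copied into the format.
        return LAYOUT.pack(self.ident, self.x, self.y)


class PointView:
    """Accessor over the buffer: no copy of the record, but not a native Point."""

    def __init__(self, buf, offset=0):
        self._buf = buf
        self._offset = offset

    @property
    def ident(self) -> int:
        return struct.unpack_from("<I", self._buf, self._offset)[0]

    @property
    def x(self) -> float:
        return struct.unpack_from("<f", self._buf, self._offset + 4)[0]

    @property
    def y(self) -> float:
        return struct.unpack_from("<f", self._buf, self._offset + 8)[0]


wire = bytearray(Point(7, 1.5, -2.0).to_bytes())  # native -> format: a copy
pv = PointView(wire)                              # format -> accessor: no copy
```

Code that works with `Point` pays for `to_bytes`; code that works with `PointView` avoids the copy but has to be written against the accessor rather than the language's own data types.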

jgravelle-google commented 5 years ago

@olanod this makes tons of sense yeah. That was one of the designs we considered for the Host Bindings proposal. We decided to focus on WebIDL bindings for a few reasons, including:

  1. We don't know enough about what a non-browser Host would look like yet, but we would like better support for browser APIs sooner rather than later.
  2. Explicitly narrowing the scope leaves a lot of design room on the table for later when we do have more information/users.
  3. Separating the use cases lets us tackle wasm<->wasm language interop independently from wasm<->js interop, so the format doesn't need to be JS-compatible.
  4. Building that sort of 0-copy ABI (and many other synchronization methods) can be done today, via polyfills. What is missing from the wasm spec that prevents the compilers for language X and language B from agreeing on a data format? If it's simply standardization and coordination, then the tool-conventions repo should be sufficient to coordinate. If there's special embedder/browser support needed to make it fast, as is the case with WebIDL bindings, then there's spec work to be done.

So, for anyone who wants to start working on this, I would work on an implementation, either in a language toolchain or the JS needed to marshal data between wasm modules, and/or submit a design to the tool-conventions repo.
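As a rough idea of what such marshalling glue does, here is a minimal sketch. It is written in Python rather than JS purely for illustration; the two `bytearray` objects stand in for the linear memories of two modules, and `marshal` is a hypothetical helper, not an existing API.

```python
def marshal(src, src_offset, length, dst, dst_offset):
    """Copy `length` bytes from one module's memory into another's.

    Returns the (offset, length) pair the receiving module would be given.
    """
    if src_offset < 0 or dst_offset < 0 or length < 0:
        raise ValueError("offsets and length must be non-negative")
    if src_offset + length > len(src) or dst_offset + length > len(dst):
        raise ValueError("range is out of bounds")
    # Same-size slice assignment: overwrites in place, never resizes dst.
    dst[dst_offset:dst_offset + length] = memoryview(src)[src_offset:src_offset + length]
    return dst_offset, length


# Stand-ins for the linear memories of two separately instantiated modules.
module_a = bytearray(256)
module_b = bytearray(256)

module_a[16:21] = b"hello"
dst_offset, dst_length = marshal(module_a, 16, 5, module_b, 32)
received = bytes(module_b[dst_offset:dst_offset + dst_length])
```

Because both sides only ever exchange bytes plus an `(offset, length)` pair, what the two toolchains need to agree on is the layout of those bytes, which is a matter of convention rather than something this glue has to know.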

PoignardAzur commented 5 years ago

See #1274, which is about defining a type system based on the Cap'n Proto IDL.