WebAssembly / component-model

Repository for design and specification of the Component Model

Flat data representation proposal: Enables zero copy shared memory, zero allocation return types, binary serialization #398

Open cpetig opened 2 days ago

cpetig commented 2 days ago

This all started with defining zero copy shared memory over a WIT interface (channel is WIT resource, inspired by iceoryx2):

    let channel = Channel_u32::new("topic");
    loop {
        let message = channel.allocate().await; // WASI 0.3
        message.set(42);
        message.send();
    }

and on the receiver side

    let subscription = Subscription::new("topic");
    loop {
        dbg!(subscription.read().await);
    }

with a WIT definition similar to

    resource object {
        set: func(value: u32);
        send: static func(obj: object);
    }
    resource channel {
        allocate: func() -> future<object>;
    }
    resource subscription {
        read: func() -> future<u32>;
    }

This is all fine unless you try to place a list<string> inside the shared memory. That put me on a journey which culminated in this discussion issue, once I had figured out a way to express this in WIT (inspired by FlatBuffers and Cap'n Proto).

Flat marker

Adding a flat<T[, P]> marker, e.g. flat<list<string>, u16>, to arguments or results changes the data representation to a flat binary encoding: all pointers in list and string take the second type P and are relative to the current position. The same type is used for length encoding. The default pointer type P could be s32.
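To make the idea concrete, here is a minimal Rust sketch of how flat<list<string>, u16> could be laid out. The exact layout is an assumption made for illustration and is not taken from the linked encoding examples: every list or string is an (offset, length) header of two little-endian u16 values, and the offset is relative to the position of the header itself. The function names are hypothetical.

```rust
use std::convert::TryFrom;

/// Size of one (offset: u16, length: u16) header in this assumed layout.
const HEADER: usize = 4;

/// Encodes a list of strings into one relocatable buffer.
/// Returns None if an offset or length does not fit into u16.
fn encode_list_of_strings(items: &[&str]) -> Option<Vec<u8>> {
    let elems_start = HEADER; // element headers follow the root header
    let data_start = elems_start + items.len() * HEADER;
    let total = data_start + items.iter().map(|s| s.len()).sum::<usize>();
    let mut buf = vec![0u8; total];

    put_header(&mut buf, 0, elems_start, items.len())?;
    let mut data_pos = data_start;
    for (i, s) in items.iter().enumerate() {
        let header_pos = elems_start + i * HEADER;
        put_header(&mut buf, header_pos, data_pos, s.len())?;
        buf[data_pos..data_pos + s.len()].copy_from_slice(s.as_bytes());
        data_pos += s.len();
    }
    Some(buf)
}

/// Writes a header at `header_pos` pointing at absolute position `target`.
fn put_header(buf: &mut [u8], header_pos: usize, target: usize, len: usize) -> Option<()> {
    let offset = u16::try_from(target - header_pos).ok()?; // relative pointer
    let len = u16::try_from(len).ok()?;
    buf[header_pos..header_pos + 2].copy_from_slice(&offset.to_le_bytes());
    buf[header_pos + 2..header_pos + 4].copy_from_slice(&len.to_le_bytes());
    Some(())
}

/// Reads a header and resolves its relative offset to an absolute position.
fn get_header(buf: &[u8], header_pos: usize) -> Option<(usize, usize)> {
    let raw = buf.get(header_pos..header_pos + HEADER)?;
    let offset = u16::from_le_bytes([raw[0], raw[1]]) as usize;
    let len = u16::from_le_bytes([raw[2], raw[3]]) as usize;
    Some((header_pos + offset, len))
}

/// Decodes without copying string data: the returned &str values borrow `buf`.
fn decode_list_of_strings(buf: &[u8]) -> Option<Vec<&str>> {
    let (elems, count) = get_header(buf, 0)?;
    (0..count)
        .map(|i| {
            let (start, len) = get_header(buf, elems + i * HEADER)?;
            std::str::from_utf8(buf.get(start..start.checked_add(len)?)?).ok()
        })
        .collect()
}
```

Under these assumptions, encoding ["hi", "wasm"] takes 18 bytes: a 4-byte root header, two 4-byte string headers, and 6 bytes of string data. Because all offsets are relative, the buffer can be copied or mapped at any address and still decodes.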

Passing an argument will follow the normal ownership rules, so imported functions only pass a view while exported functions pass ownership of the buffer. The flat type is represented by a classical (pointer, length) pair. See https://bytecodealliance.zulipchat.com/#narrow/stream/438936-SIG-Embedded/topic/Sept.2017th.202024.20Meeting/near/470965874 for data encoding examples.

Returning a flat data type would switch to a caller-provided (uninitialized) buffer passed as the last argument, also as a (pointer, length) pair. The call returns the used length (0 indicates an error or buffer overflow). This makes the call well defined with respect to (partial) ownership transfer.
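The return convention can be sketched in Rust as follows. This is only an illustration under assumptions: the function name is hypothetical, the string layout (u16 offset, u16 length, then the bytes) is assumed rather than specified, and a plain &mut [u8] stands in for the uninitialized buffer that a real binding would more likely model with MaybeUninit.

```rust
use std::convert::TryFrom;

/// Writes `value` as a flat string into the caller-provided `out` buffer.
/// Returns the number of bytes used, or 0 if the buffer is too small
/// or the string length does not fit into u16.
fn write_flat_string(value: &str, out: &mut [u8]) -> usize {
    const HEADER: usize = 4; // (offset: u16, length: u16)
    let len = match u16::try_from(value.len()) {
        Ok(len) => len,
        Err(_) => return 0,
    };
    let needed = HEADER + value.len();
    if out.len() < needed {
        return 0; // buffer overflow: nothing is written
    }
    // The string bytes directly follow the header, so the relative offset is 4.
    out[0..2].copy_from_slice(&(HEADER as u16).to_le_bytes());
    out[2..4].copy_from_slice(&len.to_le_bytes());
    out[HEADER..needed].copy_from_slice(value.as_bytes());
    needed
}
```

The caller owns the buffer before and after the call, so no allocation happens on the callee side; a return value of 0 tells the caller to retry with a larger buffer or to treat the call as failed.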

Similarly to async with WASI 0.3 and future<T>, this could become a general option that applies to all functions, making #385 unnecessary, because this is more flexible and more storage-efficient.

Buffer objects

Obtaining these buffers from the IPC component requires two new WIT return types: buffer-mut<T> and buffer-view<T> (read-only). Both would encode as (pointer, length) and require a drop method to indicate that the buffer/view is no longer in use.
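On the guest side, the drop requirement maps naturally onto Rust's Drop trait. The sketch below is purely illustrative: Pool, Lease, BufferMut, BufferView, into_view and in_use are hypothetical names, and a Vec<u8> stands in for memory that a real IPC component would hand out as a (pointer, length) pair into shared memory.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Marks one buffer as "in use" until it is dropped.
struct Lease(Rc<Cell<usize>>);

impl Lease {
    fn new(counter: &Rc<Cell<usize>>) -> Self {
        counter.set(counter.get() + 1);
        Lease(Rc::clone(counter))
    }
}

impl Drop for Lease {
    // Stand-in for the drop method that tells the IPC component
    // the buffer or view is no longer in use.
    fn drop(&mut self) {
        self.0.set(self.0.get() - 1);
    }
}

/// Stand-in for the IPC component; it only counts outstanding buffers.
#[derive(Default)]
struct Pool {
    outstanding: Rc<Cell<usize>>,
}

impl Pool {
    fn allocate(&self, len: usize) -> BufferMut {
        BufferMut { bytes: vec![0; len], _lease: Lease::new(&self.outstanding) }
    }

    fn in_use(&self) -> usize {
        self.outstanding.get()
    }
}

/// Writable buffer (buffer-mut<T> in the proposal).
struct BufferMut {
    bytes: Vec<u8>,
    _lease: Lease,
}

impl BufferMut {
    fn as_mut_slice(&mut self) -> &mut [u8] {
        &mut self.bytes
    }

    /// Turns the writable buffer into a read-only view without copying.
    fn into_view(self) -> BufferView {
        BufferView { bytes: self.bytes, _lease: self._lease }
    }
}

/// Read-only view (buffer-view<T> in the proposal).
struct BufferView {
    bytes: Vec<u8>,
    _lease: Lease,
}

impl BufferView {
    fn as_slice(&self) -> &[u8] {
        &self.bytes
    }
}
```

The lease travels with the buffer, so the pool sees the memory as in use until the last wrapper goes out of scope, whether that is the mutable buffer or a view derived from it.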

Side benefits

This data representation can also be used as a disk or network encoding of data expressed in WIT (make sure to version your WIT description).

API considerations

True zero-copy construction of these flat data types requires knowing the size of a list in advance and passing it to the constructor, so that objects can be placed linearly in the buffer. Relative pointers could be unsigned to simplify the encoding logic.
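A builder along these lines could look like the following Rust sketch. It is not the API from the linked examples: the type and method names are hypothetical, and the layout (u16 offset and u16 length per header, offsets relative to the header position and therefore always non-negative) is an assumption. The element count is passed to the constructor so that the header table can be reserved up front and string data can be appended linearly behind it.

```rust
use std::convert::TryFrom;

const HEADER: usize = 4; // (offset: u16, length: u16)

/// Builds a flat list of strings directly inside a caller-provided buffer.
struct FlatStringListBuilder<'a> {
    buf: &'a mut [u8],
    count: usize,     // number of elements announced up front
    filled: usize,    // number of elements written so far
    next_free: usize, // where the next string's bytes will go
}

impl<'a> FlatStringListBuilder<'a> {
    /// Reserves the root header and `count` element headers.
    fn new(buf: &'a mut [u8], count: usize) -> Option<Self> {
        let next_free = HEADER.checked_add(count.checked_mul(HEADER)?)?;
        if buf.len() < next_free {
            return None;
        }
        write_header(buf, 0, HEADER, count)?;
        Some(Self { buf, count, filled: 0, next_free })
    }

    /// Appends one string; fails if the list is full or the buffer too small.
    fn push(&mut self, s: &str) -> Option<()> {
        if self.filled == self.count {
            return None;
        }
        let end = self.next_free.checked_add(s.len())?;
        if end > self.buf.len() {
            return None;
        }
        let header_pos = HEADER + self.filled * HEADER;
        write_header(self.buf, header_pos, self.next_free, s.len())?;
        self.buf[self.next_free..end].copy_from_slice(s.as_bytes());
        self.filled += 1;
        self.next_free = end;
        Some(())
    }

    /// Returns the used length once every announced element was written.
    fn finish(self) -> Option<usize> {
        if self.filled == self.count {
            Some(self.next_free)
        } else {
            None
        }
    }
}

/// Targets always lie after their header, so an unsigned offset suffices.
fn write_header(buf: &mut [u8], header_pos: usize, target: usize, len: usize) -> Option<()> {
    let offset = u16::try_from(target - header_pos).ok()?;
    let len = u16::try_from(len).ok()?;
    buf[header_pos..header_pos + 2].copy_from_slice(&offset.to_le_bytes());
    buf[header_pos + 2..header_pos + 4].copy_from_slice(&len.to_le_bytes());
    Some(())
}
```

Each string is written exactly once, straight into its final position, and no intermediate Vec<String> is needed. The price is that the caller has to know the element count before the first push.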

See the links in https://bytecodealliance.zulipchat.com/#narrow/stream/438936-SIG-Embedded/topic/Sept.2017th.202024.20Meeting/near/470497166 for API examples in Rust and C++.

PS: I initially represented read-only flat types by address only (as the length can be calculated from the data), but this feels counterproductive from a verification and storage perspective.

cpetig commented 2 days ago

Of course the lowering of flat POD types would be identical to that of normal POD types; I consider (resource) handles as POD here. So the modifier only applies (recursively) to string and list representations.

Update: (Resource) handles don't serialize well across systems, so this needs more thought on when to forbid them.