WebAssembly / shared-everything-threads

A draft proposal for spawning threads in WebAssembly

Binary format for `shared` heap types #64

Closed: abrown closed this issue 4 weeks ago

abrown commented 5 months ago

The binary encoding for shared types is described here. It is relatively clear for most types (i.e., set a bit somewhere), but for heap types we specified that (shared absheaptype) encodes as 0x65 absheaptype. When heap types are used in reference types, we currently use a reftype rule like:

0x64 ht:heaptype => ref ht
0x63 ht:heaptype => ref null ht
ht:absheaptype  => ref null ht
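For reference, the three existing reftype rules can be sketched as a small decoder. This is a hypothetical illustration, not code from any engine: the function names are made up, and only the GC proposal's abstract heap type codes are listed (other proposals add more). Concrete heap types are read as a signed LEB128 (s33) value that must be non-negative.

```python
# Hypothetical sketch of the current reftype rules (names are illustrative).
# Only the GC proposal's abstract heap type codes are listed.
ABS_HEAP_TYPES = {
    0x73: "nofunc", 0x72: "noextern", 0x71: "none", 0x70: "func",
    0x6F: "extern", 0x6E: "any", 0x6D: "eq", 0x6C: "i31",
    0x6B: "struct", 0x6A: "array",
}


def read_s33(data, pos):
    """Decode a signed LEB128 integer at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # top bit clear: this was the last byte
            break
    if byte & 0x40:  # sign-extend from the final byte's sign bit
        result -= 1 << shift
    return result, pos


def read_heaptype(data, pos):
    """Abstract heap types are one byte; otherwise a non-negative type index."""
    if data[pos] in ABS_HEAP_TYPES:
        return ABS_HEAP_TYPES[data[pos]], pos + 1
    index, pos = read_s33(data, pos)
    if index < 0:
        raise ValueError("unknown heap type code")
    return index, pos


def read_reftype(data, pos=0):
    """Return ((kind, heap_type), next_pos)."""
    op = data[pos]
    if op == 0x64:  # 0x64 ht:heaptype => ref ht
        ht, pos = read_heaptype(data, pos + 1)
        return ("ref", ht), pos
    if op == 0x63:  # 0x63 ht:heaptype => ref null ht
        ht, pos = read_heaptype(data, pos + 1)
        return ("ref null", ht), pos
    if op in ABS_HEAP_TYPES:  # ht:absheaptype => ref null ht (shorthand)
        return ("ref null", ABS_HEAP_TYPES[op]), pos + 1
    raise ValueError(f"not a reftype: {op:#x}")


print(read_reftype(bytes([0x6E])))        # (('ref null', 'any'), 1)
print(read_reftype(bytes([0x64, 0x00])))  # (('ref', 0), 2)
```

The third rule is the one-byte shorthand: a bare abstract heap type byte in reftype position means the nullable reference.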

My assumption is that the 0x65 was meant to extend that rule. However, we would still need to distinguish nullable from non-nullable shared reftypes, so my sense is that we would actually need two new encodings:

0x65 ht:heaptype => ref shared ht
0x66 ht:heaptype => ref null shared ht
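Under this reading, nullability and sharedness are independent flags on the reference type, so every combination needs its own prefix byte. A minimal sketch of an encoder for that scheme, assuming the 0x65/0x66 meanings proposed in this comment; the helper name and the small heap type table are hypothetical, and concrete type indices are left out for brevity.

```python
# Hypothetical encoder for the reftype prefixes proposed above.
# Only a few abstract heap types are included to keep the sketch short.
ABS_HEAP_TYPES = {"any": 0x6E, "eq": 0x6D, "func": 0x70, "extern": 0x6F}

# (nullable, shared) -> reftype prefix byte
REFTYPE_PREFIX = {
    (False, False): 0x64,  # ref ht
    (True, False): 0x63,   # ref null ht
    (False, True): 0x65,   # ref shared ht (proposed)
    (True, True): 0x66,    # ref null shared ht (proposed)
}


def encode_reftype(heap, nullable, shared):
    """Encode a reftype over an abstract heap type under the proposed scheme."""
    return bytes([REFTYPE_PREFIX[(nullable, shared)], ABS_HEAP_TYPES[heap]])


print(encode_reftype("any", nullable=True, shared=True).hex())  # 666e
```

With a single 0x65 rule there would be no way to express one of the two shared variants, which is why this reading calls for a second opcode.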

(Please forgive any incorrect syntax; I'm never sure whether I should be using shared or the internal share.) I can add the extra 0x66 encoding rule in a PR if that's what was meant.

But there's another way to interpret this: perhaps we should insert an extra 0x65 byte everywhere we mark something as shared. In that case, we would write (ref null (shared any)) as 0x63 (ref null) 0x65 (shared) 0x6E (any). This "extra byte" approach seems to have the disadvantage that, in a Wasm module where all types are shared, it increases the binary size, perhaps significantly (?). It's also a bit inconsistent, since for all the other types we set a bit.
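The per-occurrence cost of the "extra byte" reading can be worked out directly. The sketch below follows the byte sequence given in the comment and assumes that the one-byte `ht:absheaptype` shorthand does not cover shared types; the helper name and the count of 1000 occurrences are hypothetical, chosen only to show how the overhead scales.

```python
# Sketch of the "extra byte" reading: a 0x65 marker precedes the abstract
# heap type wherever something is shared. Helper name is hypothetical.
REF_NULL = 0x63
SHARED = 0x65
ANY = 0x6E


def encode_nullable_abs_ref(abs_code, shared):
    if not shared:
        # ht:absheaptype => ref null ht (one-byte shorthand)
        return bytes([abs_code])
    # Assumes the shorthand does not cover shared types, matching the
    # byte sequence in the comment: 0x63 0x65 0x6E.
    return bytes([REF_NULL, SHARED, abs_code])


unshared_any = encode_nullable_abs_ref(ANY, shared=False)
shared_any = encode_nullable_abs_ref(ANY, shared=True)
print(unshared_any.hex(), shared_any.hex())  # 6e 63656e
# Extra bytes if a module contained 1000 such types (hypothetical count):
print((len(shared_any) - len(unshared_any)) * 1000)  # 2000
```

Under these assumptions each nullable shared abstract reference costs two bytes more than its unshared counterpart.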

How we clarify this affects how heap types are used elsewhere. Heap types are encoded as immediates of certain instructions (e.g., ref.null); if we add the extra byte, things are clear, but the binary is again larger. I was wondering whether it makes sense to set the top bit of the absheaptype to indicate shared, saving some space, or whether this would interfere with the sN representation of concrete types.
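One way to see the interference: heap types are read as signed LEB128 values, where abstract types are one-byte negative values and concrete types are non-negative indices. In LEB128 the top bit of a byte means "another byte follows", so setting it on an abstract heap type byte changes how many bytes the decoder consumes. A minimal sketch, assuming a plain signed LEB128 decoder (the function name is illustrative and it does not enforce the 33-bit limit):

```python
def read_s33(data, pos=0):
    """Signed LEB128 decode; returns (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # top bit clear: this was the last byte
            break
    if byte & 0x40:  # sign bit of the final 7-bit group
        result -= 1 << shift
    return result, pos


ANY = 0x6E
# Today: `any` is one byte and decodes to a negative value (an abstract type).
print(read_s33(bytes([ANY])))                # (-18, 1)
# Hypothetical: mark shared by setting the top bit. 0xEE is now read as a
# continuation byte, so the decoder also consumes the following byte; with
# a following 0x00 the result is 110, a non-negative value that would be
# taken as a concrete type index.
print(read_s33(bytes([ANY | 0x80, 0x00])))  # (110, 2)
```

So a top-bit flag would collide with the LEB128 continuation bit rather than fit alongside the concrete type index encoding.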

abrown commented 5 months ago

cc: @rossberg, who, as @tlively mentioned, originally made some comments about this.

rossberg commented 5 months ago

The shared property is part of heap types, not reference types. This shows up, for example, in the fact that we need to be able to say ref.null (shared any). Moreover, it only applies to abstract heap types, so, for example, ref.null (shared $t) does not make sense, since the definition of $t itself already specifies whether it is shared or unshared.

So instead of the above, we need a production

heaptype ::= ... | 0x65 ht:absheaptype ⇒ (shared ht)

A sketch of this already appears in the explainer, but without clarifying which syntactic class it belongs to.
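Placing the 0x65 prefix in the heaptype class means it can appear wherever a heap type does, both inside reftypes and as an instruction immediate, while only ever wrapping an abstract heap type. A minimal decoder sketch of that production, assuming the GC proposal's abstract heap type codes (subset) and returning `(name, is_shared)` for abstract types; all names are hypothetical.

```python
ABS_HEAP_TYPES = {
    0x73: "nofunc", 0x72: "noextern", 0x71: "none", 0x70: "func",
    0x6F: "extern", 0x6E: "any", 0x6D: "eq", 0x6C: "i31",
    0x6B: "struct", 0x6A: "array",
}
SHARED = 0x65


def read_s33(data, pos):
    """Signed LEB128 decode; returns (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    if byte & 0x40:
        result -= 1 << shift
    return result, pos


def read_heaptype(data, pos=0):
    """Return (heap_type, next_pos).

    Abstract types come back as (name, is_shared); concrete types as an
    int type index, whose sharedness comes from the type definition.
    """
    b = data[pos]
    if b == SHARED:
        nxt = data[pos + 1]
        if nxt not in ABS_HEAP_TYPES:
            # `shared` only applies to abstract heap types
            raise ValueError("0x65 must be followed by an abstract heap type")
        return (ABS_HEAP_TYPES[nxt], True), pos + 2
    if b in ABS_HEAP_TYPES:
        return (ABS_HEAP_TYPES[b], False), pos + 1
    index, pos = read_s33(data, pos)
    if index < 0:
        raise ValueError(f"unknown heap type code {b:#x}")
    return index, pos


print(read_heaptype(bytes([0x65, 0x6E])))  # (('any', True), 2)
print(read_heaptype(bytes([0x05])))        # (5, 1)
```

With this production, ref.null (shared any) would be the ref.null opcode (0xD0) followed by 0x65 0x6E, and a sequence like 0x65 0x00 (shared applied to a type index) is rejected.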

tlively commented 5 months ago

I clarified the current intended encoding in #69. That still leaves the question of whether there is a better encoding. The space of type encodings doesn't really have any global structure except that all the encodings have to be negative, so a bit-packing scheme doesn't seem ideal.
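The "must be negative" constraint comes from sharing one s33 space with concrete type indices, which are non-negative. For single-byte codes this leaves only the bytes whose sign bit (0x40) is set. A small sketch that enumerates them, assuming one-byte signed LEB128 values; the helper name is illustrative.

```python
def s33_single_byte(b):
    """Value of a one-byte signed LEB128 encoding (top bit must be clear)."""
    assert b < 0x80, "top bit set means a continuation byte follows"
    return b - 0x80 if b & 0x40 else b


# Bytes 0x40..0x7F are the only one-byte encodings of negative values,
# so they form the whole space available for single-byte type codes.
negative_codes = [b for b in range(0x80) if s33_single_byte(b) < 0]
print(len(negative_codes), hex(negative_codes[0]), hex(negative_codes[-1]))
# 64 0x40 0x7f
```

Since the sign bit is fixed, only six bits vary across these 64 codes; reserving one of them as a shared flag would halve the space for everything else, and the existing codes were not assigned with such a bit in mind.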

tlively commented 4 weeks ago

I left this open to allow for discussion of better encodings, but let's discuss that in a separate issue if anyone has ideas.