Closed rvolosatovs closed 4 months ago
This looks good to me, so I'll wait another couple of days to let folks comment and if nothing new comes up, merge.
In case anyone isn't also following #352, the recent change above prefixes each value definition with a byte length (analogous to function bodies), so that values can be skipped-over during decoding or decoded in parallel, while still eagerly populating the value index space.
One part of this that I would be concerned about this is that this non-trivially increases the complexity of the binary-to-text tooling. Today binary-to-text for core modules and components doesn't require any validation to happen. With this new section, however, it's not possible to translate binary-to-text without a validator because you'll need to keep track of the component type section to print the value section. A kernel of this concern can be seen in https://github.com/bytecodealliance/wasm-tools/pull/1541 where the decoding step has to take type information which isn't easily acquired right now.
Unfortunately I don't know of a great way to simplify this which doesn't involve lots of discriminant bytes everywhere which seems like this was trying to avoid, however.
@alexcrichton Yes, that's a real downside for sure. What's hard to weigh is whether avoiding that problem is worth adding all the discriminants to the encoding of values, which ultimately would mean that we'll need a separate binary encoding for situations where we don't want the size/decode overhead. It's hard to say for sure whether anyone will want to store very large values in a component, but, farther out in the future, maybe list<record { ... }>
-typed value exports end up being nicer than raw text files for representing metadata/config, so... maybe? But yeah, I feel the tension.
One possible alternative, albeit not a pretty one, would be for the text format to reflect the binary format where it'd be something like:
(value $foo (type $the_type_index) "\0\0\01\02")
where the text format just has the raw bytes and doesn't try to pretty-print the shape
@alexcrichton Oh right, I forgot that option! So then the byte-string (using the same string-literal format as data segments) could be considered the "base" AST whereas the (list 1 2 3)
syntax is optional sugar.
Ah yeah I like the idea of having sugar as well! I had forgotten that was an option, and that should make it much easier to hand-write as well.
Orthogonally, though, wouldn't (list 1 2 3)
be invalid and you'd need something like (list (i32 1) (i32 2) (i32 3))
or similar? (basically won't the text format need to know how big the integer is supposed to be?)
Good point! Because yeah, whether we have discriminants in the text format is independent of whether we have discriminants in the binary format. I was thinking earlier only about the decoding direction, where you need type info to render values anyhow; but in the encoding direction, having the discriminants lets you parse without type info, addressing the original concern in #352. It also lets you write invalid test cases more-easily.
So at the moment we only require discriminants where required by the more-fundamental WAT convention that all S-Expression lists start with an atom, hence the discriminant in (list ...)
and (record ...)
, but based on this reasoning, I think it makes sense to add them for all the scalars too, hence (list (i32 1) (i32 2) (i32 3))
, like you're saying. Does that make sense to you too @rvolosatovs?
Good point! Because yeah, whether we have discriminants in the text format is independent of whether we have discriminants in the binary format. I was thinking earlier only about the decoding direction, where you need type info to render values anyhow; but in the encoding direction, having the discriminants lets you parse without type info, addressing the original concern in #352. It also lets you write invalid test cases more-easily.
So at the moment we only require discriminants where required by the more-fundamental WAT convention that all S-Expression lists start with an atom, hence the discriminant in
(list ...)
and(record ...)
, but based on this reasoning, I think it makes sense to add them for all the scalars too, hence(list (i32 1) (i32 2) (i32 3))
, like you're saying. Does that make sense to you too @rvolosatovs?
Sounds good to me! Especially since this feature is considered unstable, perhaps we could revisit this later if we encounter any issues in the future
For reference, in its' current shape binary encoding proposal is implemented at https://github.com/rvolosatovs/wrpc/blob/84960bb739fd2a26c1d534c67954bbace0f2f005/crates/runtime-wasmtime/src/lib.rs#L482-L1090 in terms of Wasmtime types and values.
Updated the PR according to latest feedback gathered on and off-line. PTAL
I think this is ready for the next review round
I used https://github.com/WebAssembly/component-model/blob/145752a9795f326875c347527040d0e2ebe87c35/design/mvp/Binary.md?plain=1#L363-L367, https://github.com/WebAssembly/component-model/blob/145752a9795f326875c347527040d0e2ebe87c35/design/mvp/Binary.md?plain=1#L265 as main points of reference for the initial approach.
Tried to stay as close as possible to core Wasm spec and the canonical ABI spec.
uN
bitmask - this does favor cases where flags with lowest indexes are set, however I feel like this is a reasonable approach to take at least for consistency with all other numerical valuesu32
-s, same argument applies as above. I do feel like for most use cases the discriminant should fit in a single byte0x00
is used to identifyresult::ok
0x01
forresult:error
option::none
is0x00
option::some
is0x01
valtype
is specified once in the top-level(value t:<valtype> v:<val(t)>)
, because, although type inference should be possible in most use cases by looking at the usages of the value index, values may potentially be unused by the component directly (for example, exported). I feel like it's a very little price to pay for simpler implementation and type checking.val
encoding reference implementation is available at https://github.com/rvolosatovs/wrpc/blob/4fd7488caa4cff19079f851cd7d0efda8f613a64/crates/transport/src/lib.rs#L2557-L3223Refs #15