Closed rossberg closed 4 years ago
ref.is_null <reftype> : [<reftype>] -> [i32]

(with the understanding that the <reftype> would later be refined to a <heaptype> as per the typed (function) references proposal).

Is <heaptype> a value type? My understanding was that it is not. If that's the case, would the refinement be:

ref.is_null <heaptype> : [(ref *null* <heaptype>)] -> [i32]

A nice property of this would be that it would disallow saying ref.is_null (ref externref), which is trivially always true.
@eqrion, disallowing ref.is_null on non-null types is incompatible with (subsumptive) subtyping. That is, you generally want it to be the case that if an instruction accepts a certain type as input, then it accepts all subtypes of that type as input. Otherwise you get into weird situations where making output types of instructions more precise can break programs.
@eqrion, yes, that's the refinement, see the func-type proposal for details.
As @RossTate said, you cannot rule out non-nullable types for the null check, since that would not be consistent with subtyping.
@eqrion, disallowing ref.is_null on non-null types is incompatible with (subsumptive) subtyping. That is, you generally want it to be the case that if an instruction accepts a certain type as input, then it accepts all subtypes of that type as input. Otherwise you get into weird situations where making output types of instructions more precise can break programs.

Ah sorry, I should have been more clear. I agree that you can't disallow the instruction from operating on subtypes; I was trying to say that it's nice to remove a degree of freedom from the text/binary encoding that can't be meaningfully used correctly.
I would prefer to keep the annotation, because I don't agree it's redundant. One of the motivations for splitting funcref from externref is that they might have different representations. If they have different representations, they might have different null values as well, which means that the code generated for ref.is_null might be different.

Removing the annotation would mean the engine would have to use the input type of the operation to determine what code to generate, which is a form of overloading and goes against the general precedent that an engine can determine exactly what code to generate for every bytecode from only its opcode and immediates, and not the types of its operands.
@titzer, I wish you could have voiced that opinion a tiny bit earlier. ;)

Moving forward, especially with GC and other extensions to reference types, there will potentially be a couple of dozen instructions in the same category, with an infinite set of possible types, so fairly heavy annotations throughout. For example, should call_ref also have a type annotation? Maintaining such a fully annotated design could induce notable overhead in code size and validation time, for rather little benefit -- you'll need to look at the operand stack anyway, to check the type.

FWIW, I'd still hold the line on overloading. I see what you're saying when comparing to that, but at least technically, this is not overloading, because unlike e.g. addition, the observable behaviour does not depend on the type. It's polymorphism, but in a heap type as opposed to a value type. Whether an implementation chooses to type-specialise is a different question. But precedent for that already exists in some other cases, e.g., how to compile select may depend on what type of operand it has.
Just to capture some of the discussion from today, I think it's not such a big deal in this scenario.
I think we need to be careful about removing type annotations too aggressively in the future. In particular, I think we do need to consider it on a case-by-case basis because a missing type annotation can impact ease of code generation and/or introduce some problems in an interpreter. (and yes, I believe an interpreter tier is increasingly important for big modules).
Just off the top of my head for call_ref, an interpreter could benefit from a function type annotation to know how many arguments to pop, without having to inspect the value and get to the function's (dynamic) type. In general, the reliance on types only known from the operand stack can manifest in the interpreter as instead requiring dynamic type information from a value operand. (select was different because the interpreter can cheat with a universal value representation and doesn't need to inspect the true/false values.) So I would use this as a guide going forward: don't unnecessarily introduce a dependency on having dynamic type information attached to values when run on an interpreter.
When an interpreter parses the wasm module, it will represent the code via some internal data structure. Can't that data structure be updated with whatever type information it needs for execution as the interpreter validates the module?
In general there is a very tricky space/time tradeoff here. An interpreter not concerned about space overhead can rewrite the bytecode into a more easily interpretable form (e.g. like wasm3's register machine). The problem is that the space cost is usually more than 2x, which basically defeats the space savings of an interpreter. It also mostly defeats the savings in compile time that an interpreter offers over a JIT, because this is essentially a JIT to a target bytecode.

At the other end of the spectrum, an interpreter with no auxiliary data structures at all tries to minimize space overhead. Such an interpreter interprets the original bytecode without the help of any side data structures other than the instance. That is prohibitively slow once control flow gets complicated, because Wasm doesn't have jumps, but br-to-beginning-of-block and br-to-end-of-block, whose offsets are implicit.

What we did in the V8 interpreter, which admittedly is not a production tier but meant only for debugging, is to add a side table so that control-flow targets can be looked up by the branch pc, so branches are O(log(number of blocks)). That works well enough, is only a 10-20% space overhead, and doesn't require rewriting the original bytecode at all. It's possible to sort that data structure and maintain a second pc so that branches become O(1).
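The side-table lookup described above could be sketched as follows (an illustrative model, not V8's actual code; names are hypothetical): validation appends (branch pc, target pc) pairs in instruction order, and execution binary-searches by branch pc.

```python
import bisect

# Hypothetical side table: two parallel lists sorted by branch pc,
# built once during validation when block boundaries become known.
class BranchSideTable:
    def __init__(self):
        self.pcs = []      # branch pcs, in ascending order
        self.targets = []  # resolved target pc for each branch

    def add(self, branch_pc, target_pc):
        # Validation visits instructions in order, so pcs arrive sorted.
        self.pcs.append(branch_pc)
        self.targets.append(target_pc)

    def target_of(self, branch_pc):
        # O(log(number of entries)) lookup by the branch's pc.
        i = bisect.bisect_left(self.pcs, branch_pc)
        assert i < len(self.pcs) and self.pcs[i] == branch_pc
        return self.targets[i]
```

With a second "notes pc" advanced in lockstep with execution, the lookup could become O(1), at the cost of tracking that extra pointer.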
In general I think with only minor consideration going forward we should try to preserve the option for a low-overhead reasonably fast interpreter tier, because some applications just have huge modules (> 50mb) with lots of cold, and even dead, code.
Ah, thanks for the explanation. I have a better sense of your concern and use-case now.
Could the problem be solved by having the interpreter maintain not just the stack's dynamic value but also the stack's static type (or some sufficient abstraction thereof)?
@RossTate At least in the wabt interpreter that would be an unfortunate overhead, since the static type isn't needed anywhere else after you validate the wasm module.
Oh cool. Good to know. In that case, given what already exists in wasm (including unannotated ref.is_null), then would a useful way to execute on @titzer's concern be to include type annotations on instructions when they affect behavior in a non-size-polymorphic manner? ref.is_null effectively checks if the value is all zeroes for the size of the value determined by its type. dup duplicates the top portion of the stack with the size of that portion determined by its type. And so on, if that makes sense?
@RossTate can you explain what you mean by "non-size-polymorphic" a bit more? I didn't quite follow the examples you gave because I don't know exactly what you're assuming about a hypothetical dup instruction.
Oops, I got my conversations crossed. I should have said drop: drop discards the top portion of the stack with the size of that portion determined by its type.

By size-polymorphic, I mean that the only information you need from the type annotation in order to execute the instruction is the size of the type. So for dup and select, you don't need to know the specific type of the operands, you just need to know how big the operands are. Similarly, for ref.is_null, you don't need to know the specific type of the operand, you just need to know how big it is so you know how many zeros to check for.

Is that any clearer?
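The size-only contract described above can be sketched against a byte-addressed operand stack (an illustrative model; real interpreters often use slot-based or tagged stacks instead): drop and select receive only a size recorded at validation time, never a type.

```python
def exec_drop(stack: bytearray, size: int) -> None:
    # Discard the top `size` bytes; the specific type is never consulted.
    del stack[len(stack) - size:]

def exec_select(stack: bytearray, size: int) -> None:
    # Stack, top to bottom: i32 condition, val2 (size bytes), val1 (size bytes).
    cond = int.from_bytes(stack[-4:], "little")
    del stack[-4:]
    val2 = bytes(stack[len(stack) - size:])
    del stack[len(stack) - size:]
    val1 = bytes(stack[len(stack) - size:])
    del stack[len(stack) - size:]
    # select keeps val1 when the condition is nonzero, val2 otherwise.
    stack.extend(val1 if cond != 0 else val2)
```

The point is that both functions are parameterized by `size` alone; whether the operands are i64, f64, or references of the same width makes no difference to their behavior.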
Yes, thank you. Are there any non-size-polymorphic instructions now or in any proposal? Is it possible to have a non-size-polymorphic instruction without violating our design principle of not having overloaded instructions? I think by definition it is not, in which case this doesn't seem like a useful distinction to make.
Is it possible to have a non-size-polymorphic instruction without violating our design principle of not having overloaded instructions?
There isn't really a definition of "overloaded". You can always recast an "overloaded" instruction as being "polymorphic over some (small) range of types". So I was trying to get a more concrete/actionable meaning for the term, and currently that appears to be "not size-polymorphic".
With that in mind, it's still arguable whether the combination of i32.add and i64.add into one instruction, say iadd, is not overloaded according to that definition: you can determine the appropriate behavior for iadd solely by knowing the size of its arguments. On the other hand, combining i64.add and f64.add into one instruction, say add64, would be overloaded: the arguments are necessarily the same size and yet should be added differently based on other information about their types.
And since wasm already has select and drop, we need to exclude at least size-polymorphic instructions from the definition of "overloaded", and interpreters already need to consider at least size, which gets back to this point:

Could the problem be solved by having the interpreter maintain not just the stack's dynamic value but also the stack's static type (or some sufficient abstraction thereof)?

Due to select and drop, any interpreter is maintaining at least the size abstraction of the stack's static type (at least implicitly in however it's representing the stack). The question to consider is, how reasonable is it to extend that abstraction?
For example, consider call_ref, which @titzer brought up. If an interpreter is willing to track an abstraction of the type ref [i32 i32] -> [i64] like [4 4] -> [8], i.e. a function that takes two 4-byte stack slots and returns one 8-byte stack slot, then call_ref doesn't need any annotation to support the interpreter. If not, then call_ref needs a (4 byte?) annotation to support the interpreter, and every type-check of call_ref has twice as many subtyping checks.
Hopefully some of that manages to give some insight into how I'm trying to refine the problem statement and the design guidance it suggests 😅
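The abstraction suggested above could be pictured like this (purely illustrative; the slot sizes are one hypothetical interpreter's choice, since wasm itself assigns no sizes to reference types): erase a function type down to the stack-slot sizes of its parameters and results, which is all call_ref needs in order to pop arguments and push results.

```python
# Hypothetical per-interpreter size assignment for value types.
SLOT_SIZE = {"i32": 4, "f32": 4, "i64": 8, "f64": 8, "ref": 8}

def abstract(func_type):
    # Erase ([i32, i32], [i64]) to ([4, 4], [8]): the byte widths of each
    # parameter and result slot, forgetting the specific types entirely.
    params, results = func_type
    return ([SLOT_SIZE[t] for t in params], [SLOT_SIZE[t] for t in results])
```

Two function types that differ only in same-size types (say, [i32] vs. [f32] parameters) collapse to the same abstraction, which is exactly why it suffices for a size-polymorphic call sequence but not for type-dependent behavior.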
@titzer:
So I would use this as a guide going forward: don't unnecessarily introduce a dependency on having dynamic type information attached to values when run on an interpreter.
Interesting, that's a fair point. I don't think we have considered that criterion before, but it sounds useful. I'll have to think about how that would affect various instructions.
@RossTate:
By size polymorphic, I mean the only information you need from the type annotation in order to execute the instruction is the size of type. So for dup and select, you don't need to know the specific type of the operands, you just need to know how big the operands are. Similarly, for ref.is_null, you don't need to know the specific type of the operand, you just need to know how big it is so you know how many zeros to check for.
I think this is mixing up several levels of abstraction and makes a number of hidden assumptions. In particular, Wasm has no notion of size or bits for reference types. Nor should we assume that all actual implementations will always represent null refs by zero, or uniformly across all reference types (if we restrict anyref).
Likewise, whether the size of an operand to drop or select is needed is not determined by the language semantics, but by specific implementation details. In an interpreter, for example, which is the case that @titzer is concerned about, it will typically not matter.
You can always recast an "overloaded" instruction as being "polymorphic over some (small) range of types".
If that range is carved out from a larger set that would otherwise be applicable in that place then that's exactly the definition of overloading, a.k.a. ad-hoc polymorphism.
@rossberg is right that the "implementation size" of values generally doesn't matter for an interpreter because it will use a universal value representation for all values in locals and on the operand stack. That's why drop and select are no problem for such an interpreter. Note that this universal representation doesn't necessarily imply boxing some values onto the interpreter's implementation heap, and doesn't have to be the same universal representation used elsewhere in the engine. Values can take up as many machine words as necessary in the interpreter's internal data structures. E.g. with reference types in wasm, it's feasible to just represent a value as a pair of a 64-bit heap pointer (for refs) and a 64-bit raw word (for f32, i32, f64 and i64) in the interpreter, so that it neither has to box nor store tag bits in a side data structure for the GC. Since the interpreter state is inherently thread-local, there isn't any concurrency issue with having non-atomic multi-word updates of those data structures. That trick makes it easy for the GC to find pointers precisely too, because the interpreter stack is basically a statically-typed array of pairs. Again, in a JIT, we know machine representations statically, so we don't need that hackery.
In particular, Wasm has no notion of size or bits for reference types. ... Likewise, whether the size of an operand to drop or select is needed is not determined by the language semantics, but by specific implementation details
I didn't say that interpreters had to agree on their size abstraction. An interpreter's own size abstraction would be based on its own implementation details.
Nor should we assume that all actual implementations will always represent null refs by zero, or uniformly across all reference types (if we restrict anyref).
In this case, ref.is_null is pretty "overloaded" from an interpreter's perspective.
it's feasible to just represent that as a pair of a 64-bit heap pointer (for refs) and 64-bit raw word (for f32, i32, f64 and i64) in the interpreter
This is clever, but SIMD is in Phase 3 with a 128-bit value type. Will the efficiency of this scale, especially on a 32-bit machine?
I'm concerned that these approaches to interpreting won't scale and will bog down WebAssembly. Consider struct.get n. It's likely that this will happen a lot in programs using GC. We can add 4(?) bytes to each instruction to include a type annotation (and slow down validation of each instruction) to facilitate interpreting. But computing the offset and size of field n in $t is non-trivial, so likely the interpreter will be using some lookup into some data structure made during planning/validation to find those values. That is, the interpreter will need notes from validation in some form. @titzer already mentioned needing notes for control flow as well.
So would another approach to interpreting be to have the validation phase produce some compact "notes" structure? The interpreter then maintains two pointers: the pointer to the current real instruction, and the pointer into the "notes". After interpreting an instruction like i32.add, the real-instruction pointer would advance to the next instruction, but the notes pointer would be unchanged since there are no notes for i32.add. On the other hand, interpreting an instruction like struct.get would read-and-advance two bytes from the notes pointer, with 3 bits indicating the size of the field and 13 bits indicating the offset. The notes for this particular instruction would be more compact than (or at least no larger than) the type annotation would have been. So it's possible that this approach would in fact be more compact and more efficient than including the redundant type annotations.
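The note encoding suggested above can be sketched concretely (a hypothetical layout matching the 3-bit size / 13-bit offset split; real engines would choose their own packing): pack a field's size and offset into two bytes at validation time, and read-and-advance from the notes stream when the annotated instruction executes.

```python
def pack_note(size_code: int, offset: int) -> bytes:
    # 3 bits for the field-size code, 13 bits for the byte offset,
    # stored as one little-endian 16-bit word in the notes stream.
    assert 0 <= size_code < 8 and 0 <= offset < (1 << 13)
    return ((size_code << 13) | offset).to_bytes(2, "little")

def read_note(notes: bytes, pos: int):
    # Read-and-advance: decode one note and return the new notes position.
    word = int.from_bytes(notes[pos:pos + 2], "little")
    return word >> 13, word & 0x1FFF, pos + 2
```

Instructions with no notes (i32.add, etc.) simply never call read_note, so the notes pointer stays put for them, as the comment describes.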
Side note: Out of curiosity, I counted 56 single-byte instructions that would be unnecessary if wasm used "overloading".
I don't feel strongly about ref.is_null in particular, but for the GC proposal, I do think that most of the type immediates should be dropped (for a handy overview, see the table at https://docs.google.com/document/d/1DklC3qVuOdLHSXB5UXghM_syCh-4cMinQ50ICiXnK3Q/edit#heading=h.2p8ajda4s9l1). As @RossTate mentions above, instructions like struct.get will likely occur quite frequently, and take either 2 bytes (without immediate) or 3-6 bytes (depending on how many types there are). For ref.cast, we can even drop both immediates, thereby getting from 4-10 bytes down to 2. Since I agree with @titzer that huge modules are a concern, I find reducing the encoded size of common instructions by 2x or more a compelling argument.

From this perspective, the debate about ref.is_null boils down to consistency and setting precedents. I wouldn't want to end up with a situation of "now that ref.is_null has a type immediate, all GC instructions need them too". If our conclusion here ends up being to look at it case by case, with a forward-looking understanding that the GC instructions will be optimized for encoding size, then I have no objections to that. (For the record, I am certainly supportive of the goal of enabling efficient interpretation of Wasm modules; I'm finding it a bit hard to estimate how much of a burden it would really be for an interpreter to track the type information it needs.)
I'm concerned that these approaches to interpreting won't scale and will bog down WebAssembly.
I agree. Given that jump targets (and, with Wasm GC, also struct field offsets) require additional information that is not present in the wire format (and in the case of offsets is platform-dependent, so it couldn't possibly be added), it's pretty much a given that a fast interpreter needs a side table of some sort. And if the interpreter already has an additional data structure to store missing immediates, then there is not much additional overhead in having a few more immediates there. So while unnecessary immediates always add size overhead to the wire format, they probably won't really benefit an interpreter, which needs to store additional information for many instructions anyway.
This was done.
Before we removed anyref, the ref.is_null instruction had a canonical type:

ref.is_null : [anyref] -> [i32]

One piece of the fallout from removing anyref was that this no longer worked. In order to avoid a dependency on the outcome of the wider discussion opened in WebAssembly/function-references#27, I added a type annotation on the instruction, so that it became

ref.is_null <reftype> : [<reftype>] -> [i32]

(with the understanding that the <reftype> would later be refined to a <heaptype> as per the typed (function) references proposal).

However, given that the discussion on WebAssembly/function-references#27 seems to show a common sentiment to avoid redundant type annotations -- especially considering the many more affected instructions added in something like the GC proposal -- it would be unfortunate if ref.is_null became an outlier. And having adapted all the tests, I can say that it is quite annoying in practice, too (ref.null is tedious enough already).

So I propose removing the annotation and changing the instruction to

ref.is_null : [<reftype>] -> [i32]

such that a linear validator simply has to check that there is some <reftype> on the stack.

Thoughts?