WebAssembly / gc

Branch of the spec repo scoped to discussion of GC integration in WebAssembly
https://webassembly.github.io/gc/

Refactor cast opcodes? #274

Closed: rossberg closed this issue 2 years ago

rossberg commented 2 years ago

With the addition of arrayref and structref, it would make sense to refactor the cast and classification opcodes into something more modular: instead of a long list of individual opcodes, use an immediate, as in br_on <heaptype> (with some suitable restrictions), and similarly ref.is <ht> and ref.as <ht>. The interpreter has long been using something similar internally.

Edit: To clarify, this is solely suggesting a change to the encoding of this group of instructions, not affecting their semantics.
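
For illustration, a sketch of the intended re-encoding (the right-hand spellings follow the suggestion above and are not settled syntax):

ref.is_data      ;; -> ref.is data
ref.as_i31       ;; -> ref.as i31
br_on_func $l    ;; -> br_on $l func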

askeksa-google commented 2 years ago

I was actually going to suggest exactly this, as part of a heap type refactoring based on an idea that came up in yesterday's discussion of i31 as a type dimension - specifically to have unions of heap types also be heap types. I ruminated about this idea and took it a bit further, and I think it could lead to a nice regularization of the type hierarchy - and, as you suggest, improved orthogonality between the type hierarchy and the instruction set.

I'll write my thoughts here in this issue, then, instead of opening a new one.

The setup

The set of singular heap types includes a hierarchy of abstract (any, func, eq, data, array, struct) and concrete (defined functions, arrays and structs) pointer types, plus two isolated heap types - null and i31.

A heap type is a union of zero or more singular heap types. As a start, we can restrict such unions to include at most one hierarchy type. We can then generalize them later if needed.

We will have a syntax for expressing a union. There can be constructors for the combinations of null and i31, followed by a hierarchy type. When a type (hierarchy or isolated) is mentioned without a constructor, we'll need some conventions for which of the other components are included by default (like null is with the current shorthands).

Testing instructions

With such a setup, we can replace our plethora of testing instructions (ref.is_*, ref.as_*, br_on_*; currently at least 24 instructions in total, counting only static versions) by just 5: ref.is, ref.as, ref.as_not, br_on and br_on_not.

This isn't just more orthogonal, it is also clearer and more directly expressive. Currently, the testing type of ref.cast depends on the input type. It allows null iff the input is nullable. With this setup, the test can always express the exact type combination to be tested. Also, with the current instructions, type tests have to be composed using various combinations of tests for null, i31, data and a specific defined type. Some combinations can get quite unwieldy, such as a nullable cast from eqref, as reported in https://github.com/WebAssembly/gc/issues/130#issuecomment-1032842173.
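
To make the last point concrete, here is a sketch of a nullable cast from eqref down to (ref null $t), first composed from the current instructions, then with a single consolidated test (the union spelling is merely the notation proposed above):

;; $x : (ref null eq), composed from current instructions:
block $done (result (ref null $t))
  block $null
    local.get $x
    br_on_null $null   ;; today's br_on_null drops the null value
    ref.as_data        ;; traps unless the value is data
    ref.cast $t        ;; traps unless the value is a $t
    br $done
  end
  ref.null $t          ;; re-materialize the dropped null
end

;; With the consolidated instruction (assumed syntax):
local.get $x
ref.as (union null $t)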

Sharpening

ref.as_not, br_on and br_on_not will perform type sharpening on the negative result (output of ref.as_not, fall-through arm of br_on, and branching arm of br_on_not). This is one of the main benefits of the union type.

It will be most robust for this sharpening not to perform "sideways" reasoning in the type hierarchy; that is, the negative result would simply be the input union with those components removed that are subtypes of the tested type.

Sharpening on the positive result (intersection of the input type and the tested type) is not necessary, since the precision can be baked into the tested type by the producer. It is a simpler type rule to have the output type just be the tested type.

eq

null, i31 and all subtypes of eq are comparable. Thus, the input type to ref.eq is null | i31 | eq.

With i31 removed from the hierarchy, eq and data become equivalent. We may want to keep the distinction around in case we later want to add some comparable pointer type which does not have the embedded runtime type expressed by data (or func).

Definite null

This setup also fixes another pain point - the lack of a definite null type. We can drop the type immediate from ref.null and type its output as a definite null.
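
In instruction terms (a sketch, not settled syntax):

ref.null $t   ;; today: type immediate required, result type (ref null $t)
ref.null      ;; sketched: no immediate; result is the definite null type,
              ;; which the union ordering makes a subtype of every nullable type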

Empty

Sharpening can produce an empty union, so this needs to be part of the type formalism. Whether or not we want to have a syntax for expressing such an empty type explicitly is an open question.

Interpreter performance

A compiler (even a non-optimizing one) can readily generate a specialized check based on the combination of tested type and input type. This is more difficult for an interpreter, which may need to produce side table information during validation to avoid some redundant checks. Adding union types with i31 will require it to do so anyway, though, so this might not be a significant additional complication. It'd be interesting to hear from @titzer and other interpreter authors regarding this.

RTT

It would seem that the distinction in the instruction set originated from the distinction between testing with an RTT and testing non-RTT types (including abstract types). Even though this proposal is formulated as RTT-free, it is still compatible with adding RTT versions of the instructions, as far as I can tell. The RTT versions would not permit sharpening, but otherwise they should work identically.

tlively commented 2 years ago

Whether or not we add those restricted union constructors to the MVP (which I would be interested in benchmarking for code size and performance wins), consolidating the 5 casting instructions and giving them uniform, predictable sharpening behavior makes sense to me. Adding a null type and a new version of ref.null without a type immediate makes sense to me, too.

manoskouk commented 2 years ago

I think this is a very nice way to consolidate the ideas discussed in this thread, as well as around i31 and a potential null type. One thing we should consider is type-encoding size. I measured that, in a real-world J2CL module of total size 22 MB, the ref null opcode comes up ~126,000 times. If a nullable type is now expressed as (ref (union null <heapType>)) (two more bytes than before), this will increase module size by about 1%. This is a good argument to maintain a one-byte shorthand for ref union null. Similarly, we might want to consider other shorthands, e.g. for the current anyref = (ref (union null i31 any)).

titzer commented 2 years ago

I generally like reducing the br zoo down to a br_on and br_on_not for the heap types.

I didn't digest @askeksa-google's union suggestion fully yet. Is it fully orthogonal to this change? If so, maybe it should have its own issue.

As for the interpreter consideration: before #241, Wizard required no sidetable entries for any GC instructions (except maybe a TODO with the RTT depth, but depth goes away with iso-recursive types). There is sufficient information in the form of heap type indices for the interpreter to find its way into module-local tables that have type information. As long as union types were required to be declared in the type section, I think this would be sufficient.

askeksa-google commented 2 years ago

I don't see it as orthogonal. It's the union types as heap types and the existence of a definite null type which allow us to express all the various type tests as a heap type.

As for the encoding, I imagine we will have four type modifiers, each encoded as a single prefix byte.

These look similar to the type constructors which have been suggested in earlier discussions. The difference is that rather than being type constructors that create value types from heap types, these are modifiers that turn heap types into other heap types.

With these modifiers, we don't actually need an explicit union syntax until we decide to lift the restriction of at most one hierarchy type.

Bare heap types without a modifier have suitable default nullability and i31ability. There seems to be general sentiment that these defaults should be the same across all types. Symmetry would dictate that the default should be either non-null non-i31 or null i31, but I would argue that compactness should have priority over symmetry here, so the default should be the most commonly used combination, which most likely would be null non-i31.

Unions without a hierarchy component are expressed via an empty heap type. On its own, like other types, it defaults to be nullable and not i31able, which makes it the null type. Having a one-byte representation of the null type is practical, as null checks are very common. With modifiers, it becomes never, i31 or null i31.
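
For concreteness, one hypothetical spelling of this scheme (the modifier names are invented here; the comments above deliberately leave the concrete syntax open):

null $t        ;; include null in the union
nonnull $t     ;; exclude null
i31 $t         ;; include i31
noni31 $t      ;; exclude i31
$t             ;; bare: the defaults argued for above, i.e. null, non-i31
empty          ;; the empty union; nullable by default, hence the null type
nonnull empty  ;; the never type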

By the way, what is the reason that deftypes (positive-valued heap types) are not valid value types on their own, without a ref type constructor? The positive encoding space of value types does not seem to be used. It could save significant coding space (and possibly decoding overhead) if they were. This would also completely unify the encoding of ref types and heap types, making heap types just a subset of the value types.

rossberg commented 2 years ago

It seems like there is general agreement to refactor the opcodes along these lines. I'll try to come up with a coherent design. The main question will be when to land this, since it obviously is a breaking change that we might want to synchronise.

@askeksa-google, I haven't fully digested your suggestion yet either. There are some good ideas that resonate with me, but also a couple of high-level points where we need to be careful:

Generally, questions about introducing unions open a big can of worms. I think it works better if we keep this issue focussed on the opcode refactoring and discuss more far-reaching changes separately.

By the way, what is the reason that deftypes (positive-valued heap types) are not valid value types on their own, without a ref type constructor?

Those are very different types. That difference matters in the context of nested compound types (see post-MVP doc): a field that is a struct is quite different from a field that is a reference to a struct.

The positive encoding space of value types does not seem to be used.

There is at least one place where type constructor and type index space overlay already, which is block types: they're either a negative value type or a positive type index denoting a function type signature. So overlaying both in a more global way wouldn't be backwards-compatible.

It could save significant coding space (and possibly decoding overhead) if they were. This would also completely unify the encoding of ref types and heap types, making heap types just a subset of the value types.

I'd actually like to allow type definitions "inline" as a shorthand for one-off uses (and ultimately, merge most type sorts). But it is more involved than perhaps obvious at first. For example, it could complicate validation rules if there were "anonymous" compound types. It might also screw up phasing in engines, e.g., wrt type normalisation. Probably not worth investing in it at the moment, since it can always be added later.

askeksa-google commented 2 years ago
  • Unions will need to be deftypes, not heaptypes. Otherwise you can't name and share them between multiple uses. That would create a lot of duplication and size overhead in use cases where unions regularly have more than one component. Exports wouldn't work either.

Good point. I think this requirement will work fine in practice for i31 unions, as these will typically be united with a few, specific types, so declaring those combinations explicitly will not be a burden.

  • We need to be careful not to make casts too general. Wasm is supposed to be an assembly language, and it is one of its design goals that, wherever possible, every instruction maps closely to an actual machine instruction. Now, with GC we obviously need to raise the abstraction level to a certain extent (to maintain safety and portability). But it would violate the spirit of Wasm if we introduced instructions that compile into arbitrarily long sequences of machine code – as would ultimately be the case if you allowed unions as the target of casts.

I very much agree. While at first it looks very convenient to be able to express checks for arbitrary unions using single instructions, such combined tests are probably rare enough that it would be acceptable to emit multiple tests in those cases.

  • The fact that we allow the union with null in cast targets is already borderline in my view, but it can be argued as long as we view null as special. But I'd rather not stretch that further.

Yes. While these are again convenient because it's a common case, it's an exception to the rule that the output type of a cast is the target type of the cast. And it's an inconsistency between ref.cast (which threads null implicitly) and br_on_cast (which requires non-null). One option which would make it somewhat more consistent would be to encode explicitly whether the cast allows null or not, instead of letting it be implicit. This would then be an exception to not allowing unions in tests.

I did an experiment to check how prevalent such nullable-to-nullable casts actually are.

In the barista3 benchmark for dart2wasm, as it is, about 32% of all casts have that form. This is when the code generator only emits a non-nullable cast if it is actually required for the output to be non-nullable for the code to validate.

If I change the cast logic to use a non-nullable cast whenever the value would be tested for null anyway (for instance, right before a field access), then the abundance of nullable-to-nullable casts drops to 24%. If I further change it such that it uses a non-nullable cast whenever it knows that the value is in fact non-null, the abundance drops to 11%.

I imagine that this number would be somewhat higher in a language without declared nullability, such as Java. Do we have numbers on how prevalent casts with implicit null pass-through are in J2Wasm?

Through the same changes, the abundance of nullable-to-non-nullable casts rose from 11% to 32%. Those currently require a separate ref.as_non_null before the cast. If casts were always non-nullable (or could explicitly be specified to be nullable or non-nullable), those cases would become more compact.

  • A related point is that, in general, Wasm instructions are designed such that they operate uniformly over the types they allow. When an operation's semantics and implementation is significantly different for different types, then we purposely introduce separate instructions. (The "no overloading" mantra.)

This mantra seems to be somewhat at odds with the whole idea of consolidating these instructions. With this mantra in mind, we should at least make sure that the consolidated instructions behave identically apart from the tested type.

There are a few differences between the instructions currently, which would change if we want uniformity of the semantics. Some that I can think of are:

  1. ref.cast, ref.test, br_on_cast and br_on_cast_fail require the input to be dataref or funcref, whereas all the other ones take anyref. Uniformity would dictate that they all take anyref.

As I understand it, this decomposition of casts into first checking that the value has an RTT and then checking the RTT was done exactly to make each instruction do less and to make their behavior less dependent on their input types. Does this actually help engines in practice?

  2. Currently, br_on_null and br_on_non_null throw away the null value. If these are to be instances of generic type test branches, they would preserve the null, like the other such branches do.

Interestingly, a common use case for a br_on_null which preserves the null value is null lifting (of which nullable cast is an example). It simply becomes

block $lift
  br_on_null $lift
  ;; Perform operation on non-null value
end

whereas currently, with br_on_null throwing away the null, such an operation is much more involved. If br_on_null took this form, the overhead of manual nullable casts (should we choose not to have automatic null pass-through) would become much smaller.

On the other hand, a common use case for br_on_non_null is null fallback, for implementing a null fallback operator (i.e. ?? in Dart or ?: in Kotlin) or for lazy initialization.

block $lazy
  br_on_non_null $lazy
  ;; The null is thrown away here
  ;; Produce value
end

Here, the most practical behavior is to drop the null, though the extra overhead of a non-dropping br_on_non_null is much smaller than in the br_on_null case - just a single drop.

Generally, questions about introducing unions open a big can of worms. I think it works better if we keep this issue focussed on the opcode refactoring and discuss more far-reaching changes separately.

OK, let's stick here to the parts of the union discussion that affect the opcode consolidation.

rossberg commented 2 years ago

With this mantra in mind, we should at least make sure that the consolidated instructions behave identically apart from the tested type.

Absolutely. Currently, there are three different flavours: (1) null cast, (2) abstract cast, (3) concrete cast involving RTT. They differ in their semantics and implementation as well as in the details of their typing rules. So we probably do not want to over-unify. We could unify (1) with (2), but that may have drawbacks (also, ref.is_null already pre-exists).

Interestingly, a common use case for a br_on_null which preserves the null value is null lifting (of which nullable cast is an example).

Interesting, I haven't seen this pattern in my compilers. Where does it come up?

I did an experiment to check how prevalent such nullable-to-nullable casts actually are.

In the barista3 benchmark for dart2wasm, as it is, about 32% of all casts have that form. This is when the code generator only emits a non-nullable cast if it is actually required for the output to be non-nullable for the code to validate.

If I change the cast logic to use a non-nullable cast whenever the value would be tested for null anyway (for instance, right before a field access), then the abundance of nullable-to-nullable casts drops to 24%. If I further change it such that it uses a non-nullable cast whenever it knows that the value is in fact non-null, the abundance drops to 11%.

I imagine that this number would be somewhat higher in a language without declared nullability, such as Java. Do we have numbers on how prevalent casts with implicit null pass-through are in J2Wasm?

Good question! Originally, casts did not allow null. I changed that on @titzer's request, who wanted it for Java. He may have numbers.

Through the same changes, the abundance of nullable-to-non-nullable casts rose from 11% to 32%. Those currently require a separate ref.as_non_null before the cast. If casts were always non-nullable (or could explicitly be specified to be nullable or non-nullable), those cases would become more compact.

Indeed, I have suspected something like this.

I'd be happy to revisit the special handling of null. As you say, it's a bit of an irregularity, and the benefit was never properly evaluated.

jakobkummerow commented 2 years ago

Originally, casts did not allow null. I changed that

I believe it was the Kotlin team (I might misremember though) who in the past expressed that both variants are useful. I'd support having both, so that modules can use whichever is most efficient for the respective situation.

As I understand it, this decomposition of casts into first checking that the value has an RTT and then checking the RTT was done exactly to make each instruction do less and to make their behavior less dependent on their input types. Does this actually help engines in practice?

It avoids an i31 check, which any cast from anyref needs to include. Aside from that, V8 doesn't care. (I can't speak for other engines, of course.)

askeksa-google commented 2 years ago

Interestingly, a common use case for a br_on_null which preserves the null value is null lifting (of which nullable cast is an example).

Interesting, I haven't seen this pattern in my compilers. Where does it come up?

It comes up with null-aware operations, such as x?.foo(y) in Dart, which means "evaluate x, then call x.foo(y) if x was non-null, otherwise evaluate to null". Though in typical use, often the null can be optimized away because it is either not used, used directly in a null fallback ?? or used directly in a comparison == where the result with null is known.
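
A sketch of that lowering with the null-preserving br_on_null (names and types are invented for the example; typing the null on the branch relies on the definite null type discussed earlier):

;; x?.foo(y), with $x : (ref null $Foo)
;; and $Foo.foo : [(ref $Foo) i32] -> [(ref $Bar)]
block $lift (result (ref null $Bar))
  local.get $x
  br_on_null $lift    ;; proposed form: the null itself becomes the result
  local.get $y
  call $Foo.foo
end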

It avoids an i31 check, which any cast from anyref needs to include. Aside from that, V8 doesn't care. (I can't speak for other engines, of course.)

To elaborate a bit on this, I see two cases:

  1. The static input type is known not to be i31. Is it feasible here (even for baseline compilers) to omit the i31 check?
  2. The static input type is possibly i31. Here the code currently must have a ref.as_data or ref.as_func. You have mentioned earlier that ref.as_data is implemented as "is struct or is array". How does an implicit i31 check compare with such an explicit check?
jakobkummerow commented 2 years ago

The static input type is known not to be i31. Is it feasible here (even for baseline compilers) to omit the i31 check?

Generally yes. We currently don't have that implemented for i31 checks, but we're already doing it for null-checks, and the principle would be the same. It would be a tiny bit more effort, but I think we could live with that. (There might also be limitations in edge cases, where "the module said this couldn't be i31" might handle control flow merges better than "the simplistic compiler was able to infer that this couldn't be i31", but I'd expect cases where this matters to be rare.)

The static input type is possibly i31. Here the code currently must have a ref.as_data or ref.as_func. You have mentioned earlier that ref.as_data is implemented as "is struct or is array". How does an implicit i31 check compare with such an explicit check?

Good point: if we changed ref.cast $t to consume an anyref and implicitly perform an i31 check, then the code sequence for !is_i31 && is_$t would be faster than today's br_on_non_data + ref.cast $t, which needs to emit !is_i31 && is_struct_or_array && is_$t. Whether this matters in practice, of course, depends on how often anyref is used, as opposed to typing generic things as (ref struct) or (ref $java.lang.Object) etc. For module producers that want to make use of i31, avoiding the is_struct_or_array checks might be quite impactful.
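
In code, the two sequences being compared (a sketch; $v is an anyref local and $t a concrete type):

;; Today: engine must emit !is_i31 && is_struct_or_array && is_$t
local.get $v
ref.as_data        ;; or br_on_non_data to a failure label
ref.cast $t

;; If ref.cast consumed anyref directly: engine emits !is_i31 && is_$t
local.get $v
ref.cast $t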

In summary, I guess I have to revise my earlier statement: It avoids an i31 check, which any cast from anyref needs to include. Aside from that, V8 doesn't care.

(And thanks for taking the time to make me see this -- in retrospect, I wonder why it wasn't obvious to me before...)

rossberg commented 2 years ago

@jakobkummerow, is the point that is_data actually is more precise (and expensive) than what would be needed to ensure you can extract an RTT from an anyref that is not an i31? E.g., because is_data || is_func would suffice for that and require fewer instructions in V8?

If so, is this only a problem when using the full generality of anyref? For example, with eqref, I would assume that is_data == not is_i31, so if you already checked for i31, then is_data could be a nop.

jakobkummerow commented 2 years ago

@rossberg Yes, that's what I was trying to say.

rossberg commented 2 years ago

@jakobkummerow, got it, thanks for the clarification.

askeksa-google commented 2 years ago

A concrete proposal:

As an initial observation, the null-related instructions are different from the rest in several ways.

So my suggestion would be to keep all the null-related instructions as they are, and to consolidate all the others into single test, cast and branch instructions parameterized by a heap type immediate (modulo naming).

Having a separate nullable cast instruction gets rid of the polymorphism of ref.cast, i.e. the output type of the instructions is solely determined by the heap type immediate. It also makes non-nullable casts of nullable inputs more compact.
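
In instruction terms (the names here are placeholders, per "modulo naming" above):

local.get $x           ;; $x : (ref null any)
ref.cast $t            ;; -> (ref $t); traps on null
local.get $x
ref.cast_nullable $t   ;; -> (ref null $t); null passes through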

This decouples the proposal from both the issue of union types (or i31 as a type dimension) and that of null/empty types.

tlively commented 2 years ago

This sounds reasonable to me. @rossberg, @jakobkummerow, wdyt? I believe this is also orthogonal to what we do with RTTs, which would replace the ht immediate in the above instructions if they were to be used.

jakobkummerow commented 2 years ago

No objections.

Checks/casts for abstract types like ref.is data would grow by one byte compared to ref.is_data, but presumably those are fairly rare compared to checks for concrete, module-defined types, so that probably doesn't matter much. (In fact, maybe this makes data so unused that we could just get rid of it as a follow-up, but that's a separate discussion.)

I wonder whether ref.is and the br_on_* instructions would also need a *_or_null variant, but I'm not in a position to have an opinion on that: I'm generally not swayed by "symmetry" or other aesthetics-related arguments, so I think this would come down to the needs of module producers/processors. I suppose for br_on_* there is at least no urgency because br_on_{non_,}null can simply be used, and we can always add a combined instruction as a minor optimization post-MVP if we have data to indicate that it would make a significant difference.

As a logistical question: do we want to introduce new opcodes in the binary format for these, to allow a gradual transition? (I expect that we'll do an opcode cleanup just before finalizing the GC proposal, so this would be temporary at most until then.)

tlively commented 2 years ago

Using new opcodes and doing an opcode cleanup at the end sounds good. The relevant lessons I learned from doing the same (multiple times!) in the SIMD proposal are 1) to really push the opcode cleanup to the very end, not just to when you're 80% sure you're done adding new things, and 2) to communicate clearly and early with partners that the opcode cleanup will be a breaking change.

titzer commented 2 years ago

To be clear, if we are adding these as new opcodes, we are removing/disabling the old opcodes?

tlively commented 2 years ago

Yes, in this repo we certainly should remove the old versions. V8 and Binaryen will probably continue supporting the old instructions for some time to avoid unnecessarily breaking builds as soon as the new instructions land.

titzer commented 2 years ago

sgtm; as long as this repo continues to represent the consensus on the complete set of bytecodes and their semantics, that last opcode cleanup will be fairly painless.

rossberg commented 2 years ago

@askeksa-google, thanks, that's roughly what I had in mind, except that I'd restrict the heaptype to be abstract and keep the instructions involving concrete types and RTTs separate like the null-related ones, since they are very different operationally (and type-wise).

The idea of having a separate ref.as_or_null is interesting and worth considering. But I think we'd then need similar variants for the branching casts, and probably ref.is.

tlively commented 2 years ago

@rossberg, can you be more specific about what you have in mind so we can better compare it to @askeksa-google's proposal?

askeksa-google commented 2 years ago

@askeksa-google, thanks, that's roughly what I had in mind, except that I'd restrict the heaptype to be abstract and keep the instructions involving concrete types and RTTs separate like the null-related ones, since they are very different operationally (and type-wise).

Personally, I find the unification of abstract and concrete tests to be one of the more appealing aspects of the refactoring.

Also, abolishing the data/func requirement for concrete type casts and having explicitly nullable casts both have practical impacts on code size.

What are the operational differences you are referring to which warrant the concrete types having separate instructions?

The idea of having a separate ref.as_or_null is interesting and worth considering. But I think we'd then need similar variants for the branching casts, and probably ref.is.

Having it just for casts is a pragmatic choice indeed. It's mainly motivated by frequency of use: Nullable casts are definitely useful. I have not encountered a need for nullable variants of the other instructions.

titzer commented 2 years ago

If the abstract versions of casts don't pop an RTT value off the stack, and the concrete ones do, I would prefer they be separate instructions.

askeksa-google commented 2 years ago

My proposal above is just for the statically typed instructions. We can add separate versions for RTTs independently.

rossberg commented 2 years ago

@tlively, as hinted at in the OP, I had in mind something more moderate, like (picking up @askeksa-google's idea for two versions of ref.as):

where

baseheaptype ::= func_or_data | func | data | struct | array | i31

Null-related instructions and concrete casts are left alone.

So as mentioned, my intent here is mostly a syntactic simplification, with easier extensibility. The only minor semantic change is that I introduced func_or_data as a new abstract heap type that's the common super type of func and data (probably in need of a better name). That potentially allows for slightly cheaper abstract casts when they're just meant to feed into a consecutive concrete cast (though this would become moot if we decided to divorce func from anyref).
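
For instance, the "feed into a consecutive concrete cast" case would look like this (a sketch; null handling glossed over):

local.get $x          ;; $x : anyref
ref.as func_or_data   ;; cheap sort check: the value carries an RTT
ref.cast $t           ;; concrete check on a value known to carry one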

@askeksa-google:

Personally, I find the unification of abstract and concrete tests to be one of the more appealing aspects of the refactoring. Also, abolishing the data/func requirement for concrete type casts and having explicitly nullable casts both have practical impacts on code size. What are the operational differences you are referring to which warrant the concrete types having separate instructions?

Besides having different operands (which is unrelated to being "statically typed", btw), one only looks at the "sort" of an object, the other at its concrete runtime type. In V8 terms, it could roughly correspond to inspecting the pointer tag and "instance type" tag vs the map contents, though the details of the distinction may differ per implementation.

In order to check an anyref for a concrete type, the engine first has to check that it actually is a type that has an RTT. These are multiple steps, in pretty much any conceivable implementation, and the goal of the instruction design is to represent them as such, in order to preserve the spirit of a low-level machine language as closely as possible. It is seductive to argue with code size here, but this would be quite premature, and a significant departure from Wasm's design so far, where we have decidedly avoided the introduction of ad-hoc "macro" instructions, unless there is a clear case that engines could generate significantly better code for them (such as the bulk instructions).

The idea of having a separate ref.as_or_null is interesting and worth considering. But I think we'd then need similar variants for the branching casts, and probably ref.is.

Having it just for casts is a pragmatic choice indeed. It's mainly motivated by frequency of use: Nullable casts are definitely useful. I have not encountered a need for nullable variants of the other instructions.

Fair enough, though I'd be interested to hear what practical advantage you see in having two variants as opposed to the current instruction whose typing subsumes both. If used with the respective types, that should be compilable to the same code, I believe. I have sympathy for the extra clarity the separation provides, but is there any other advantage?

askeksa-google commented 2 years ago

@rossberg:

The only minor semantic change is that I introduced func_or_data as a new abstract heap type that's the common super type of func and data (probably in need of a better name). That potentially allows for slightly cheaper abstract casts when they're just meant to feed into a consecutive concrete cast (though this would become moot if we decided to divorce func from anyref).

How does func_or_data fit into our current hierarchy? It seems incompatible with eq, since neither is a subtype of the other, and both are supertypes of data.

This reminds me of another gripe I have with the current cast instructions: whereas the input requirements of most other instructions can be expressed as a single type (implicitly allowing subtypes), the cast instructions permit one of two disjoint types. While the special case for this in the validator is not a particularly complicated one, it does seem like a wart.

What are the operational differences you are referring to which warrant the concrete types having separate instructions?

Besides having different operands (which is unrelated to being "statically typed", btw), one only looks at the "sort" of an object, the other at its concrete runtime type. In V8 terms, it could roughly correspond to inspecting the pointer tag and "instance type" tag vs the map contents, though the details of the distinction may differ per implementation.

Well, that's what the proposal changes. Both kinds of instruction check for a subtype relationship, and how that check differs between abstract and concrete types is up to the implementation. Boundaries between different kinds of checks need not be exactly at the boundary between the abstract and concrete types.

In order to check an anyref for a concrete type, the engine first has to check that it actually is a type that has an RTT. These are multiple steps, in pretty much any conceivable implementation, and the goal of the instruction design is to represent them as such, in order to preserve the spirit of a low-level machine language as closely as possible. It is seductive to argue with code size here, but this would be quite premature, and a significant departure from Wasm's design so far, where we have decidedly avoided the introduction of ad-hoc "macro" instructions, unless there is a clear case that engines could generate significantly better code for them (such as the bulk instructions).

I agree that operations should generally be decomposed into basic steps, but only when those steps are unambiguously aligned with the overall operation.

The current split seems to assume a particular implementation strategy, and as we have heard in https://github.com/WebAssembly/gc/issues/274#issuecomment-1060840681 this assumption is at odds with (at least) V8, since as_data is a more expensive operation than the extra operation that it would need to perform when casting directly from anyref.

Having it just for casts is a pragmatic choice indeed. It's mainly motivated by frequency of use: Nullable casts are definitely useful. I have not encountered a need for nullable variants of the other instructions.

Fair enough, though I'd be interested to hear what practical advantage you see in having two variants as opposed to the current instruction whose typing subsumes both. If used with the respective types, that should be compilable to the same code, I believe. I have sympathy for the extra clarity the separation provides, but is there any other advantage?

A practical advantage is that a cast from a nullable to a non-nullable type (which is a frequent occurrence, as reported in https://github.com/WebAssembly/gc/issues/274#issuecomment-1050643224) is only one instruction instead of currently two.

Again, having these as separate operations seems to assume a particular implementation strategy. Depending on how null references are implemented, a cast from a nullable input could be just as efficient as a cast from a non-nullable one.

rossberg commented 2 years ago

@askeksa-google:

How does func_or_data fit into our current hierarchy? It seems incompatible with eq, since neither is a subtype of the other, and both are supertypes of data.

Subtyping is a partial order, and func_or_data simply is neither a sub nor a super type of eq. So there's no incompatibility.

             any
            /   \
  func_or_data   eq
         /   \  /  \
     func    data   i31

You can easily construct analogous hierarchies with struct or function types already (as long as multiple super types are allowed anyway). And certainly with union types.

This reminds me of another gripe I have with the current cast instructions: whereas the input requirements of most other instructions can be expressed as a single type (implicitly allowing subtypes), the cast instructions permit one of two disjoint types. While the special case for this in the validator is not a particularly complicated one, it does seem like a wart.

Well, yes, that's what func_or_data is supposed to replace. ;)

Besides having different operands (which is unrelated to being "statically typed", btw), one only looks at the "sort" of an object, the other at its concrete runtime type. In V8 terms, it could roughly correspond to inspecting the pointer tag and "instance type" tag vs the map contents, though the details of the distinction may differ per implementation.

Well, that's what the proposal changes. Both kinds of instruction check for a subtype relationship, and how that check differs between abstract and concrete types is up to the implementation. Boundaries between different kinds of checks need not be exactly at the boundary between the abstract and concrete types.

In fact, the proposal was more like this originally, but long ago changed to what you see now in order to simplify the individual instructions, which seemed overly complex before.

I agree that operations should generally be decomposed into basic steps, but only when those steps are unambiguously aligned with the overall operation.

Yeah, but it's not totally clear what "aligning" would mean in this context. It is a qualitative difference whether a test involves inspecting user-defined type hierarchies or not (and that is regardless of whether RTTs are explicit). In practice, there will always be some difference between engines. But while slight differences may exist, I think it is pretty fair to assume that a close enough distinction will exist in all realistic implementations.

The current split seems to assume a particular implementation strategy, and as we have heard in #274 (comment) this assumption is at odds with (at least) V8, since as_data is a more expensive operation than the extra operation that it would need to perform when casting directly from anyref.

Yes, the introduction of func_or_data should remove that discontinuity as well. As said above, if an abstract cast feeds into a concrete cast, then func_or_data is what you'd use optimally.

Having it just for casts is a pragmatic choice indeed. It's mainly motivated by frequency of use: Nullable casts are definitely useful. I have not encountered a need for nullable variants of the other instructions.

Fair enough, though I'd be interested to hear what practical advantage you see in having two variants as opposed to the current instruction whose typing subsumes both. If used with the respective types, that should be compilable to the same code, I believe. I have sympathy for the extra clarity the separation provides, but is there any other advantage?

A practical advantage is that a cast from a nullable to a non-nullable type (which is a frequent occurrence, as reported in #274 (comment)) is only one instruction instead of currently two. Again, having these as separate operations seems to assume a particular implementation strategy. Depending on how null references are implemented, a cast from a nullable input could be just as efficient as a cast from a non-nullable one.

Okay, I think I can be convinced that having two such instructions is worthwhile. Somewhat unclear whether the same argument wouldn't apply to br_on, though, once that's used in anger. The use cases might be sufficiently different and presumably involve non-null operands or an earlier null check in typical scenarios.

tlively commented 2 years ago

I am strongly concerned about the prospect of making the type hierarchy a DAG rather than a tree and am also concerned about introducing a new basic heap type solely to abstract over func and data and make them usable with the same instructions. Adding a new type (actually two, since your proposal also adds struct) is not an acceptable cost for the benefit of deduplicating a few instructions, IMO.

rossberg commented 2 years ago

@tlively, it would help if you could express your concerns more concretely. As mentioned, almost all interesting type hierarchies naturally form DAGs, unless you artificially restrict them (before the move to explicit subtyping, they in fact were possible everywhere). I don't think we can afford that as a general rule moving forward, just consider union types. And I don't see a general problem with DAGs either. The place where we need to be careful is with user-defined RTT hierarchies.

tlively commented 2 years ago

I suppose a DAG in the basic heap types is not as bad since their relationships are all known statically, but I would be sad to have DAGs in the user-defined type hierarchy because that would make subtype checks more complicated. It seems better to just avoid DAGs entirely, at least until we have a more compelling reason to introduce them.

But regardless of whether DAGs are problematic or not, introducing new types to reduce the number of instructions we have seems like a bad trade.

titzer commented 2 years ago

AFAICT, func_or_data is the pointer type that we had earlier discussed.

rossberg commented 2 years ago

@titzer, yes, exactly.

tlively commented 2 years ago

I added an agenda item for us to discuss this (and hopefully make some decisions!) at the next subgroup meeting: https://github.com/WebAssembly/meetings/pull/1012

tlively commented 2 years ago

We've decided to remove RTTs from the MVP and have decided on the three-pronged type hierarchy since we last discussed what to do with the cast instructions, so hopefully the design space is better constrained now.

The most important property I would like to see in a final set of cast instructions is the ability to cast directly from any to any of its subtypes without having to perform a sequence of multiple casts. This will give the engine the most leeway to optimize casts without having to fuse operations.

Beyond that, I would prefer to reduce the instruction set as much as possible and in particular I like the idea of having a single instruction for each operation (cast, test, br_on, br_on_not) that can be parameterized with any subtype of any. I wouldn't object to drawing a distinction between the built-in types and user-defined types, though, as long as we can still cast directly from any.

@rossberg, did you have an updated set of cast instructions in mind given the other recent developments in the proposal?

rossberg commented 2 years ago

@tlively, yes, I was waiting for approval on #310 before moving on to the next PR. ;)

As mentioned in the OP, the intention of opening this issue was merely a syntactic refactoring of the existing casts, not adding significant functionality. So the design I have in mind still is what I sketched above, modulo the removal of func and func_or_data from baseheaptype.

In one of the past meetings we have already discussed why a monolithic cast instruction would put us in a very bad place. For one, it means that any addition to the static type system would immediately affect this instruction (i.e., high coupling). That would both make it an instruction with potentially unbounded cost and seriously impede our ability to evolve the type system. There are already types we don't want to allow casts for, since they could become costly while of dubious utility (consider eqref or externref, the latter currently moot, but you see the point). Furthermore, a monolithic approach is incompatible with the reintroduction of RTTs (consider casts involving generic types) and, I dare say, the general philosophy of being a low-level language.

Instead, we should seek to factor the casts such that redundant cost can be avoided. With the 3-pronged approach, I believe we do not even need to add much anymore to achieve that (like the func_or_data type before).

FWIW, @askeksa-google suggested earlier to refine the instructions even further wrt their handling of null. I see the reason (custom implementations of null may save an extra check) but I'm still on the fence whether this justifies the extra complexity. Would be curious to hear others' thoughts on that.

jakobkummerow commented 2 years ago

we should seek to factor the casts such that redundant cost can be avoided.

That means making ref.cast (or whatever we call the instruction that casts to concrete types) consume an anyref, to avoid the redundant extra cost of casting to dataref first. I agree that this is an important goal.

jakobkummerow commented 2 years ago

a monolithic cast instruction would put us in a very bad place. For one, it means that any addition to the static type system would immediately affect this instruction

No. If you can arbitrarily define "baseheaptype" to be only a subset of generic heap types, you can apply the same arbitrary restriction to a combined ref.cast <ht> with a rule "ht must be a baseheaptype or a concrete type".

Whether it's prettier to have:

  1. ref.cast_to_base <ht>, ht must be a baseheaptype
  2. ref.cast_to_concrete <ht>, ht must be a concrete type

or just

  1. ref.cast <ht>, ht must be a baseheaptype or a concrete type

is 100% a question of personal taste, and while you are of course entitled to your opinion on your preferred bikeshed color, please don't claim that there are technical/objective reasons why pink would be superior to purple.

a monolithic approach is incompatible with the reintroduction of RTTs

No. RTTs will require a new RTT-consuming ref.cast_to_rtt instruction, which can be added regardless of whether we previously have split ref.cast_to_base <ht> + ref.cast_to_concrete <ht> or a "monolithic" ref.cast <ht>.

FWIW, I don't care whether we combine the cast instructions for concrete types and abstract/"base" types or not. The split has no benefit (aside from letting people express their taste in bikeshed colors), but also no cost (aside from wasting a binary opcode, but we have enough of those so that doesn't matter).

rossberg commented 2 years ago

a monolithic cast instruction would put us in a very bad place. For one, it means that any addition to the static type system would immediately affect this instruction

No. If you can arbitrarily define "baseheaptype" to be only a subset of generic heap types, you can apply the same arbitrary restriction to a combined ref.cast <ht> with a rule "ht must be a baseheaptype or a concrete type".

I was replying to @tlively's request for "the ability to cast directly from any to any of its subtypes without having to perform a sequence of multiple casts".

Whether it's prettier to have:

  1. ref.cast_to_base <ht>, ht must be a baseheaptype
  2. ref.cast_to_concrete <ht>, ht must be a concrete type

or just

  1. ref.cast <ht>, ht must be a baseheaptype or a concrete type

is 100% a question of personal taste, and while you are of course entitled to your opinion on your preferred bikeshed color, please don't claim that there are technical/objective reasons why pink would be superior to purple.

Is "pretty" a relevant category for an assembly language? What matters is orthogonality. Also, I have to say that I take issues with your tone.

a monolithic approach is incompatible with the reintroduction of RTTs

No. RTTs will require a new RTT-consuming ref.cast_to_rtt instruction, which can be added regardless of whether we previously have split ref.cast_to_base <ht> + ref.cast_to_concrete <ht> or a "monolithic" ref.cast <ht>.

The non-RTT instruction will not work for all concrete types. At the same time, the RTT instruction will only apply to concrete types. So there will be a split one way or the other. Better to make it clean and avoid overlapping instructions.

FWIW, I don't care whether we combine the cast instructions for concrete types and abstract/"base" types or not. The split has no benefit (aside from letting people express their taste in bikeshed colors), but also no cost (aside from wasting a binary opcode, but we have enough of those so that doesn't matter).

Great, thanks for saying that.

conrad-watt commented 2 years ago

I think we should have the ability to downcast from (e.g.) any to eq, otherwise there might be practical compositionality issues; these could be separate instructions given @rossberg's point about orthogonality (ref.cast_to_abs?).

rossberg commented 2 years ago

@conrad-watt, do you have an example in mind? What's a scenario where you mix eq and non-eq (i.e., external) types and also want to use reference equality before knowing what sort of thing you have? In most scenarios I have encountered, either you already start with an eqref, or you have anyref but need a case distinction anyway.

Moreover, an engine will probably not implement such a cast significantly more efficiently than the way you can already express it in user space (i.e., test for i31 then data). So it would effectively be a macro instruction for a rare use case.

conrad-watt commented 2 years ago

What's a scenario where you mix eq and non-eq (i.e., external) types and also want to use reference equality before knowing what sort of thing you have?

We've talked before about how abstract types + downcasting will be our stop-gap solution for "generic" functions. If I have a generic module where the common denominator at the module boundary is eq, and a separate generic module over any, I may not be able to compose them or interoperate with both of them at once.

rossberg commented 2 years ago

Right, but a given compiler will of course be consistent in its choice of top type for this purpose. So a mismatch can only arise when mixing modules from different compilers (or foreign libraries), i.e., the seamless interop scenario that we explicitly made a non-goal.

And even then, isn't it sufficient when you can express it in user-space?

askeksa-google commented 2 years ago

There are a bunch of things being discussed here. Some reflections:

Direct casting

Allowing direct casts from any type within the same hierarchy is, as the discussion above suggests, both more compact and potentially faster than chaining multiple casts.

Separation of abstract versus concrete casts

In practical terms, there is no difference between specifying separate sets of instructions for these or specifying common instructions with different encodings. In both cases, the difference in opcode is redundant given the immediate. The instructions have the same immediate (a heap type) and the same semantics (check whether the operand is a subtype of the immediate and behave accordingly).

Thus it makes very little difference for producers as well as consumers what we do. I would argue for having only one set, since the opcode split carries no information that the immediate does not already provide.

Target nullability on cast instruction

This is a simple matter of whether a cast from a nullable to a non-nullable type is

ref.cast_nonnull ht or ref.as_non_null ref.cast ht

The first option is smaller and potentially faster.

Cast to arbitrary heap types

Having types that you can't cast to complicates code generation. It essentially means that you can't use such a type as the input type of a subsystem: if you ever come across a value typed with a supertype that you want to pass in, there is no way to cast it down, so you need extra knowledge about which concrete, castable type the value might have in order to insert a more precise cast.

Specifically, for the case of eq, it means that reference comparison can't internally be generalized beyond values where the actual type is more precisely known. This can be a problem especially in connection with external values.

conrad-watt commented 2 years ago

the seamless interop scenario that we explicitly made a non-goal.

I agree that we've definitely made it a non-goal to support seamless interop of different source-level semantics when composing separately-compiled modules, but it's less clear to me that we've made such a binding decision for separately-produced modules in general (e.g. a Wasm module that is intended to generically implement certain operations on Wasm's array type).

And even then, isn't it sufficient when you can express it in user-space?

Personally I think it would be quite unfortunate if we expect people to actually implement any->eq casts using a big switch statement. I could also imagine such casts being faster in the engine (e.g. a range check on the tag?).

tlively commented 2 years ago

I would be fine with having separate cast opcodes for casts targeting built-in types and casts targeting user-defined types to make future RTT-based casts correspond 1:1 with the MVP casts to user-defined types. @rossberg, that's what you're advocating for, right? With that split combined with the ability to cast directly to any type in a single instruction, we'd end up with

And in the future:

With corresponding variants for ref.as_*, br_on_*, and br_on_not_*.

@askeksa-google's idea about having _nonnull variants of the instructions to roll in the null check is interesting, but I don't know if it would have any performance benefits. @jakobkummerow, do you think it would? If it wouldn't, I would prefer to keep ref.as_non_null as an orthogonal separate instruction to reduce the combinatorial explosion in cast-related instructions.

tlively commented 2 years ago

Oh, I also added a discussion on this to our meeting agenda for Tuesday. It would be great if we could settle this soon!

jakobkummerow commented 2 years ago

do you think it would [have performance benefits]?

One less instruction to decode will be a bit faster, but probably not decisively so.

I assume that potentially bigger savings (of both size and time) might arise in branching situations. If you want to check whether a given value is of type $t or null (e.g. to faithfully simulate a Java cast with catchable exception on failure, so while ref.cast would do the right thing on success, it traps uncatchably on failure), then the current set of ref.test/br_on_cast doesn't allow a particularly concise way to express that. How relevant/important that is is a question for module producers. As far as I recall, it's one of the oldest and most consistent pieces of feedback we've been getting on the GC proposal that sometimes you need "$t or null" and sometimes you need "actual $t, not null", so it'd be useful to have both.
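
To illustrate how verbose this currently gets, a sketch of such a "$t or null" check with a catchable failure ($throwCCE is a hypothetical imported helper that throws the exception; $x is assumed to have type (ref null data) to sidestep the input-type restriction discussed earlier):

local.get $x
ref.is_null
if (result (ref null $t))
  ref.null $t          ;; null is allowed through
else
  local.get $x
  ref.test $t          ;; 1 iff the value is a $t
  if (result (ref null $t))
    local.get $x
    ref.cast $t        ;; cannot trap: already tested
  else
    call $throwCCE
    unreachable
  end
end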

From a pragmatic point of view:

askeksa-google commented 2 years ago

If you want to check whether a given value is of type $t or null (e.g. to faithfully simulate a Java cast with catchable exception on failure, so while ref.cast would do the right thing on success, it traps uncatchably on failure), then the current set of ref.test/br_on_cast doesn't allow a particularly concise way to express that.

This is a good point. It's straightforward to control the nullability of a null-polymorphic ref.cast (just add ref.as_non_null as desired), but for the other instructions it's more involved (whether they are null-polymorphic or not).

Conceptually and textually, we could decide to make the nullability part of the immediate rather than the instruction mnemonic, so we would have e.g. ref.is eq and ref.is null eq. This will tame the number of instruction variants and be a natural parallel to reftypes. In the binary format they would have separate opcodes.

While this is mainly a code size optimization that doesn't add expressibility, and as such could be argued not to be MVP material, it's worth noting that it might not be easy to add later. If we don't have separate nullable versions, we would want at least ref.cast to be null-polymorphic, but if we do, none of the variants should be null-polymorphic.