Pushing RTTs to post-MVP

WebAssembly / gc

Branch of the spec repo scoped to discussion of GC integration in WebAssembly

https://webassembly.github.io/gc/

Other

996 stars 72 forks source link

Pushing RTTs to post-MVP #275

Closed tlively closed 2 years ago

tlively commented 2 years ago

I would like to propose pushing RTT value types and all instructions that use them to post-MVP. Binaryen and V8 support RTT-less versions of all allocation and casting instructions that use static type index immediates instead of dynamic RTT values and j2wasm and Dart have both found these static versions of the instructions to be sufficient for their needs in the presence of explicit supertype declarations, as used in #243. In particular, Dart has implemented its own userspace type information that it uses to implement language-level casts, but that custom type information does not depend on RTT values.

We have also found that Binaryen can optimize modules that use RTT-less instructions to be about 3% faster than the same modules with RTTs because the reduced dynamism allows Binaryen to prove that more casts are unnecessary. This better optimizability will become increasingly important as engines do more inlining and follow-on optimizations themselves.

As a concrete example, consider an inlined method call that downcasts its this parameter. With RTTs, showing that the cast always succeeds requires proving that the this value has always been allocated with the same RTT used in the cast, whereas an RTT-less cast can be shown to always succeed just by looking at the static type of the this argument and seeing that it is a static subtype of the parameter's type.

If we find that we need to add RTT values back into the language as part of a post-MVP generics proposal or for some other reason, we can easily add RTT-using versions of the allocation and cast instructions back as well. That we have already been supporting both kinds of instruction in Binaryen and V8 demonstrates that the maintenance overhead of that duplication is not a problem.

rossberg commented 2 years ago

I'd advise against that, for a number of reasons:

Without RTTs, which can be imported, there would be no way to customise the host-side appearance of Wasm GC objects (e.g., JS prototypes). We'd become dependent on type imports again, which I believe we'd all like to avoid for now. Furthermore, it's unlikely that the (ab)use of type imports for that purpose can be made to work for that without complicating the type system (AFAICS, it would require introducing a third type relation besides type equivalence and subtyping).
Making RTTs explicit is in line with Wasm's defining character as a low-level "assembly" language. It makes data and operations explicit that have a machine-level cost (typically involves at least an allocation). Furthermore, it allows expressing reuse of the result of that operation (e.g., by assigning it to a local or global), without assuming additional engine optimisations. It also allows distinguishing between types that require RTTs and those that don't. IOW, they create a more accurate, low-level cost model.
RTTs will be needed with generics. If we were to introduce them later, we'd need to duplicate all relevant GC instructions at that point (and presumably future ones as well).

We have also found that Binaryen can optimize modules that use RTT-less instructions to be about 3% faster than the same modules with RTTs because the reduced dynamism allows Binaryen to prove that more casts are unnecessary.

Ah, if that is your main motivation: I am pretty sure that this is no longer the case after #243. Because that change removes rtt.sub and makes subtyping part of the type definition itself, putting static and dynamic types into full sync. That is, there will always be a 1-1 correspondence between an RTT and a static type. RTTs are known to be equivalent precisely when their static types are. (Even though later, with generics, that static type may depend on a type parameter, and both RTTs and static types can be parameterised.)

Does that make sense?

tlively commented 2 years ago

Yes, thanks, your arguments make sense, but I still think removing RTTs is the best call for the MVP.

Without RTTs, which can be imported, there would be no way to customise the host-side appearance of Wasm GC objects (e.g., JS prototypes). We'd become dependent on type imports again...

This is making a lot of assumptions about what the JS API will look like and what capabilities it will have, when my understanding is that we haven't really focused on the JS API design so far. It seems backwards to put as cross-cutting a feature as RTTs into core Wasm in service of the JS spec layered on top of core Wasm, although I'd be happy to discuss this further once we have a better idea of what we want from the JS API.

Making RTTs explicit is in line with Wasm's defining character as a low-level "assembly" language... they create a more accurate, low-level cost model.

Maybe, but we don't have a cost model in any technical sense and I would hope that it's clear enough that struct.new and friends already require allocations, so I don't see this as sufficient reason to keep explicit RTTs.

RTTs will be needed with generics. If we were to introduce them later, we'd need to duplicate all relevant GC instructions at that point (and presumably future ones as well).

Understood, and I think that will be a very reasonable cost to pay for removing complexity from the MVP, given that it is a cost multiple implementations have been paying without issue.

I am pretty sure that this is no longer the case after https://github.com/WebAssembly/gc/pull/243. Because that change removes rtt.sub and makes subtyping part of the type definition itself, putting static and dynamic types into full sync. That is, there will always be a 1-1 correspondence between an RTT and a static type.

Thanks, I had missed that we had already made RTTs and the static types completely equivalent for the MVP. This further convinces me that removing RTTs from the MVP is the best course of action.

conrad-watt commented 2 years ago

This is making a lot of assumptions about what the JS API will look like and what capabilities it will have, when my understanding is that we haven't really focused on the JS API design so far.

FWIW, when this was discussed previously, the suggested design involved attaching JS property information to the RTT.

Due to canonicalisation, it would be somewhat tricky to associate the same information directly with the static type declarations if RTTs were made implicit - maybe object creation would have to be annotated through some custom section in order to to attach JS-specific information to the implicit RTT?

manoskouk commented 2 years ago

RTTs will be needed with generics. If we were to introduce them later, we'd need to duplicate all relevant GC instructions at that point (and presumably future ones as well).

There are only 10 instructions (including experimental ones) that require an rtt input:

the 6 allocation instructions struct.new, struct.new_default, array.new, array.new_default, array.init, array.init_from_data,
and the 4 test instructions ref.test, ref.cast, br_on_cast and br_on_cast_fail.

I do not think it is a big burden to duplicate 10 or a few more instructions later, especially given that the rtt-less instructions are simply syntactic sugar of the explicit-rtt ones, therefore easy to implement. Note that the V8 implementation already provides both alternatives.

Furthermore, it allows expressing reuse of the result of that operation (e.g., by assigning it to a local or global), without assuming additional engine optimisations.

I would agree that local assignment could perform better in absence of certain optimizations, but

these optimizations are already done (at least in V8) as a special case of load elimination, and
it is in my opinion unlikely to be used in practice, as a producer would need to detect multiple allocations in a function that use the same rtt, introduce additional locals for them etc.

When it comes to globals, I do not think reusing them is easier than rtts: Since canonical rtts are global to the module, they need to be loaded from the instance just as globals, therefore the same optimizations are required to eliminate multiple loads thereof. In fact, mutable-global loads are harder to eliminate than (immutable) rtt.canon loads.

kripken commented 2 years ago

@rossberg

https://github.com/WebAssembly/gc/pull/243. Because that change removes rtt.sub and makes subtyping part of the type definition itself, putting static and dynamic types into full sync. That is, there will always be a 1-1 correspondence between an RTT and a static type. RTTs are known to be equivalent precisely when their static types are.

I am not sure I follow this. To check my understanding, for the relevant optimization here, it is not enough to compare RTTs - we also need to know which RTT an object has based on its type, statically. Here is a concrete example:

(func $method.of.X (param $this (ref $Object))
  (ref.cast $X ;; $X is some subtype of the generic $Object
    (local.get $this)

;; =>
;; Inlining or some other VM or toolchain inference leads us to see that the
;; static type of $this is in fact $X, and not the more generic $Object.
;; =>

(func $method.of.X (param $this (ref $X)) ;; the type here changed
  (ref.cast $X
    (local.get $this)

;; =>
;; The relevant optimization: A ref.cast of something of type $X to $X is a no-op.
;; =>

(func $method.of.X (param $this (ref $X))
  (local.get $this) ;; the cast has been removed

Does #243 give us what we need for the last step here?

tlively commented 2 years ago

@kripken Yes, because after #243 there is exactly one possible RTT value for each type, namely the value of rtt.canon, which is the same every time it is executed for a particular type.

kripken commented 2 years ago

@tlively

I see, thanks. But do we expect that to stay the same in the future? If we extend wasm later in a way that makes it possible to construct an item with an RTTs other than rtt.canon of that item's type then we'd be losing the possibility to optimize in the way I sketched.

tlively commented 2 years ago

Yes, that's right. And to prevent optimization regressions for modules that don't need the post-MVP extensions that use RTTs, we would want to add the RTT-free instructions at that point anyway.

tlively commented 2 years ago

I added this to the agenda for our next meeting so we can discuss it in real time if necessary.

takikawa commented 2 years ago

I hadn't seen this thread so I missed chiming in earlier, but I agree with Andreas and Conrad that it seems like this will be quite limiting for potential JS API designs.

FWIW, when this was discussed previously, the suggested design involved attaching JS property information to the RTT.

Due to canonicalisation, it would be somewhat tricky to associate the same information directly with the static type declarations if RTTs were made implicit - maybe object creation would have to be annotated through some custom section in order to to attach JS-specific information to the implicit RTT?

I agree that it would be tricky to do this kind of approach with a custom section. This has been brought up before in discussions, and there was also some concerns raised that this kind of use of a custom section would be new for Wasm, in that the behavior isn't really optional (e.g., for tooling or optimization) but would be required to be interpreted for the JS part of the program to work.

Another thing I wonder about is how casting at boundaries between Wasm and host programs will work without instructions like rtt.canon. Right now, especially with the removal of rtt.sub, there is a straightforward way to handle ToWebAssemblyValue(v, (ref $t)) where $t is a struct type and v a host value. It could just evaluate the cast (ref.cast v (rtt.canon $t)). Specifying this without RTT instructions seems trickier, unless the plan is that RTTs would still exist in the semantics somehow but not be exposed as values.

kripken commented 2 years ago

@takikawa If a custom section has issues, could this be done on the JS API side? Without RTTs we basically need the JS side to declare a mapping of wasm type index to JS info, which there are a few ways to bikeshed. Another benefit of using the JS API is that this is really a JS-specific issue, so leaving it out of the wasm file and core wasm spec seems to make sense.

The only maybe odd aspect that I can think of to such a mapping is handling of wasm canonicalization (which is I guess the issue you refer to before @conrad-watt ?). Say if type indexes i and j have different JS info associated with them, but the wasm ends up canonicalizing them, then we'd need to error. But that seems reasonable and unavoidable? If the JS wants to actually differentiate those two types then that is an ABI contract between wasm and JS, and the wasm should have reflected that (by putting the types in the same rec group where they will not be canonicalized).

lukewagner commented 2 years ago

If the goal of the GC MVP's JS API is to allow JS glue code to present wasm GC objects to JS code with nice idiomatic property names and methods on the prototype chain without an extra indirection through a JS wrapper object, I don't see a way to achieve this if the core wasm GC MVP doesn't provide the ability to supply something generative (i.e., not globally canonicalized) directly to struct.new. This generativity (of types or rtts, either way could work, iiuc) gives the JS API an address in the store with which to associate a fresh JS prototype object with -- just like how (for better or worse) WebAssembly.Function objects are associated with funcaddrs in the store.

You could of course imagine that each unified structural type is given an address in the store and so each Realm ends up associating one global prototype for each such unified structural type. However, I think this would, in the limit, lead to unrelated libraries accidentally clobbering each others' prototype chains when they happened to use the same structural type. (In particular, to avoid this problem, I've been assuming that wasm GC objects created with empty or canonical rtts would have a null prototype.)

The "directly to struct.new" part is critical, I believe, since this is the point where the shape/map/hidden-class is stored into the GC object header, thereby associating the particular prototype object with the particular allocation. Any later point (say, ToJSValue(), when a wasm GC object flows out to JS) is too late (which is why I don't see a pure-JS-API solution working -- I suppose you could call out to JS in place of struct.new, but I assume this is too slow).

The three possible sources of generativity I'm aware of are: type imports (allowing the JS API to generate fresh types imported by the module), rtts (allowing either wasm or the JS API to generate fresh rtts), private type definitions (but I believe the expected impl requires a layer of indirection so this wouldn't satisfy the base perf requirements). Thus, if we punt both rtts and type imports out of the MVP, I'm not aware of a viable solution that wouldn't cause problems down the line (i.e., not be a hack we'd regret later). Although of course I might be missing something. (Personally, I'm rooting for the (possibly stripped-down) type imports direction.)

tlively commented 2 years ago

If the goal of the GC MVP's JS API is to allow JS glue code to present wasm GC objects to JS code with nice idiomatic property names and methods on the prototype chain without an extra indirection through a JS wrapper object...

Whether we want this and how much complexity we are willing to add to the core Wasm part of the MVP to get it is definitely something we should discuss more.

My personal take is that this kind of rich interop would be a nice-to-have, but it's not worth complicating the core spec to get it. There are also potential benefits to not having such rich interop. Namely, if the JS API encourages the use of exported and imported getters and setters of more primitive data (e.g. raw numbers, opaque references, and arrays thereof) over the direct exposure of complex internal data types, that would give Binaryen more room to prove that certain fields are unused or always constant and remove them. If the JS API instead encourages exposing complex internal data types directly, Binaryen will need to be told how they will be used via some other more complicated means to perform those optimizations.

kripken commented 2 years ago

@lukewagner

Any later point (say, ToJSValue(), when a wasm GC object flows out to JS) is too late (which is why I don't see a pure-JS-API solution working -- I suppose you could call out to JS in place of struct.new, but I assume this is too slow).

I think I meant something equivalent to the solution you describe, in which at struct.new time the important JS info is known and used. The only difference is that I envisioned a JS API that sets that info up in advance for all struct.new operations in the module, rather than calling out to JS at each struct.new (which I agree would be risky in terms of speed). So the only difference is the time at which the JS info arrives. Do you think that can work? (I do not fully follow your points about "generativity", so apologies if I've missed something.)

tlively commented 2 years ago

@takikawa,

Another thing I wonder about is how casting at boundaries between Wasm and host programs will work without instructions like rtt.canon. Right now, especially with the removal of rtt.sub, there is a straightforward way to handle ToWebAssemblyValue(v, (ref $t)) where $t is a struct type and v a host value. It could just evaluate the cast (ref.cast v (rtt.canon $t)). Specifying this without RTT instructions seems trickier, unless the plan is that RTTs would still exist in the semantics somehow but not be exposed as values.

I don't think this should be a problem. In a world where we remove RTTs from the MVP, ref.cast would take a type index immediate rather than an rtt value. So ToWebAssemblyValue(v, (ref $t)) would evaluate the cast (ref.cast $t v) and get you the same result you would get if you used an rtt.canon $t in the current proposal with RTTs.

lukewagner commented 2 years ago

@tlively Totally fair questions regarding prioritization; I'll leave that to you all. But regarding:

If the JS API instead encourages exposing complex internal data types directly, Binaryen will need to be told how they will be used via some other more complicated means to perform those optimizations.

I wouldn't imagine the JS API would encourage exposing any more accessors/mutators than otherwise: It seems like the surface area of the interface exposed to JS would be determined independently of the implementation mechanism used (whether fancy wrapper-free JS API or the JS wrapper objects you can do today).

@kripken

think I meant something equivalent to the solution you describe, in which at struct.new time the important JS info is known and used. The only difference is that I envisioned a JS API that sets that info up in advance for all struct.new operations in the module

It seems like, for a general wrapping story (like I've seen implemented in some linear-memory-language contexts already), you want to allow many different source-language classes/types to be exported to JS, each getting their own prototype with its own methods/accessors. In this scenario, you'd need the ability to control which prototype object to use for each individual struct.new. I suppose you could imagine passing the JS API an array of (bytecodeOffset, prototype) pairs... but that seems a bit gnarly.

kripken commented 2 years ago

@lukewagner

Interesting! To make sure I follow, given this:


(type $java.lang.Foo ..)

(func
    ..
  (struct.new $java.lang.Foo) ;; location 1
  ..
  (struct.new $java.lang.Foo) ;; location 2

It sounds like you want location 1 to use one JS prototype, and location 2 another?

That's an interesting amount of flexibility. AFAIK all the things I've seen would be fine with both of those locations using the same JS prototype, since they have the same wasm type, but I'd be curious to hear more.

(If we need this then we need this, but the more static our type story is the better it will optimize, which is why I'm concerned here.)

conrad-watt commented 2 years ago

If we end up needing to attach JS-specific information to individual occurrences of struct.new in a world without explicit RTTs, I think this should be done through a custom section rather than trying to create a user-level JS API for it - I'd be more happy to have the debate about expanding the "power" of custom sections than to see a JS API that works directly on byte offsets.

That being said, I'd somewhat prefer keeping explicit RTTs over the above (I was thinking about what it would look like to design an explicit struct.new_with_extra_host_info instruction - I think this ends up looking similar to explicit RTTs), although I wouldn't push hard if there's a toolchain/engine narrative for removing them.

FWIW I don't think this design question is inherently JS-specific, although it's obviously the main host we need to make sure we support. Any potential host that wants to treat the Wasm-level type as defining an object layout and expose Wasm objects to user code through some "regular-source-object-like" view (embedding Wasm in JVM?) probably wants a similar capability to attach richer (implicit, if necessary) RTTs to Wasm objects which carry host-specific info.

rossberg commented 2 years ago

Many good points have been made in this thread. Let me elaborate on the stated “1:1 correspondence” between static and dynamic types (in the new type system), since there were some questions about it. There is a bit more nuance to it.

Every RTT value with the same static type denotes the same Wasm type. That implies that they all behave the same wrt casts, and all desirable static optimisations can be applied to casts as if RTTs were left implicit (since, in fact, the concrete object does not matter there).
However, multiple RTTs for the same static type may produce different results wrt object creation. While rtt.canon always produces the same RTT value, different versions may be produced e.g. through the API (and possibly, later language extensions). That of course is the point, as it allows customising object creation.
RTTs are explicit so that their presence, their creation, their (re)use, and (in conjunction with type imports or generics) their transfer can be controlled and optimised by a producer. Since creating and passing RTTs has a runtime cost, in both space and time, that should be surfaced in a low-level language.
- For example, the same RTT may be used for multiple object creations. Or objects may be created from multiple choices of RTTs. Hence it is desirable (in fact, necessary for customisation) that creation of an object is decoupled from creation of a “meta” object shared between many objects.
- Similarly, there may be static types for which no RTT is created (or passed). Or there may equivalent types for which multiple different RTTs are imported (e.g., because they represent different source types). Hence it is desirable that the definition of a static type and creation of a runtime object representing it are decoupled.
All the above continues to hold with future type imports or generics, except that both static and dynamic types may depend on type and RTT parameters or imports. Creating RTTs for types involving parameters requires RTTs for these parameters to be given. The “1:1” property still persists, though, so even casts involving local type parameters can be optimised locally.

This should be sufficient reason to decouple RTTs from both type definitions/imports and from object creation. I think it is desirable to have a more fine-grained and explicit cost model than just “struct.new allocates”, even for the MVP.

Take the API problem as additional evidence that this decoupling is adequate. ;)

As for the API question itself:

I very much agree that the JS API should follow the language, not vice versa. However, this problem is not just specific to the JS API, and the desire to customise the host-side appearance of objects has been a prominent point (even though I admit that I could live without it, personally).
Our plan of record is to employ RTTs for this purpose, whether now or later. It’s fairly simple, general, and clear how it would work. It would be unwise to dump that without an equally clear story for a replacement.
AFAICT, alternative ideas that have been suggested would in fact be more complicated and/or intrusive to core Wasm, not less (e.g., involving cross-cutting extensions to the static type system). Or they would be specific to one embedding (custom sections) and thus be incomplete.

And with respect to cost and optimisations:

All features and future GC extensions are supposed to be pay-as-you-go, and so is their interaction with RTTs. Having RTTs explicit is key to preserving that down the road, and with it optimisability.
For Wasm we have always taken the stance that optimisations should be performed – and therefore be expressible! – by producers wherever possible. That is what I mean by keeping the cost “model” explicit (to the extent we can).
Producers should not need to rely on non-local optimisations being performed by engines, since they can’t assume that all engines will perform them. Moreover, some optimisations can be too costly to scale to more general cases in an engine, while the same cost is no problem for a producer.
As far as I can see, with the “1:1” property described above, casts with explicit RTTs are no less optimisable than implicit versions would be. Consequently, the fear that we’re missing out on optimisations if we don’t provide RTT-less alternatives seems unfounded at this point. That problem existed before, but it has vanished with the removal of rtt.sub.

A couple of individual responses:

@manoskouk:

these optimizations are already done (at least in V8) as a special case of load elimination

It is fine if some engines do additional optimisations. But as mentioned, producers should not have to rely on them to produce reasonable code.

When it comes to globals, I do not think reusing them is easier than rtts: Since canonical rtts are global to the module, they need to be loaded from the instance just as globals, therefore the same optimizations are required to eliminate multiple loads thereof. In fact, mutable-global loads are harder to eliminate than (immutable) rtt.canon loads.

Globals for RTTs can be defined as immutable just fine, since rtt.canon is a constant instruction. On the other hand, canonical RTTs are not tied to a module instance, they are global to the entire execution context. Comparing them to global loads from an instance is presuming a specific implementation model, I believe. In general, producers have to assume that executing rtt.canon can involve much more than just a load, so it’s better if they can already optimise it on their end.

Anyway, sorry for the longish response.

jakobkummerow commented 2 years ago

@rossberg

Every RTT value with the same static type denotes the same Wasm type. That implies that they all behave the same wrt casts, and all desirable static optimisations can be applied to casts as if RTTs were left implicit

I don't see that happening. The fast runtime subtype checks we currently have depend on each RTT storing its list of supertype RTTs. That only works when there is exactly one RTT per subtyping level (as is currently the case, thanks to having only rtt.canon and no rtt.sub, or more generally: no custom RTT generation). If runtime checks need to consider several RTTs as equivalent, they will become slower -- and fast checks are widely considered important.

It’s fairly simple, general, and clear how [employing RTTs for customising the host-side appearance of objects] would work.

I disagree. For instance, there are fundamentally two options for a hypothetical rtt.attach_host_info instruction: it can either modify the RTT in-place (which, among other issues, means that uncoordinated uses of it could overwrite each other's effects), or it can create a new RTT (which would destroy the 1:1 property, roughly comparable to rtt.sub's effect). I don't recall anyone ever having suggested a solution to this. Also, I am very much concerned that any imperative API (whether it's a Wasm-side rtt.attach_host_info instruction or a JS-side WebAssembly.CustomizeRTT(...) function) would make application startup prohibitively expensive. I strongly believe that we want some sort of declarative approach. All in all I don't think it's "clear" or "simple" how to design this API.

Globals for RTTs can be defined as immutable just fine, since rtt.canon is a constant instruction.

For now, they can; but when modules or their embedders can generate customized RTTs, then the globals used to store them can no longer be immutable. Aside from that, I agree with @manoskouk that there is no reason to assume that global.get is any faster than rtt.canon: both depend on how the engine implements them, both may perform lazy initialization and/or caching under the hood, both may or may not be benefitting from special compiler optimizations. Letting module producers perform optimizations is obviously desirable, I just don't think this qualifies as an example of that.

All features and future GC extensions are supposed to be pay-as-you-go

In my view, that is precisely why we should postpone RTTs. The MVP doesn't need them (it literally gets zero value from having them); and since making the model more flexible will have its costs, we should do that if and when we need it (and at such a time of course design it such that modules that don't care about the new flexibility can remain on the earlier fast path).

ajklein commented 2 years ago

Much of this discussion is devoted to how RTTs (or their absence) affect the JS API (and potentially other host APIs). Given that the JS API is going to be critically important for making this proposal usable on the Web, and thus a thing we'll need in some form for the MVP, I propose that we table the RTT discussion until we've done more work on the JS API. And when doing that work, we treat the existence of RTTs not as a given but as one option in the solution-space.

That's not to say there aren't other arguments for & against RTTs-in-the-MVP raised in this thread, only that without more shared agreement on what the requirements & constraints are for the JS API, discussions about RTTs in isolation are less likely to be fruitful. What's been laid out so far has been useful, and can be picked up down the road when we're clearer on the JS interop question.

rossberg commented 2 years ago

@ajklein, yes, makes sense to me.

@jakobkummerow:

I don't see that happening. The fast runtime subtype checks we currently have depend on each RTT storing its list of supertype RTTs.

Is that not easily fixed?

If I read the V8 code correctly, then its current representation of RTTs is (pseudo code):

type Rtt = Map
struct Map {
  typeinfo : TypeInfo
  ...
}
struct TypeInfo {
  supertypes : FixedArray(Rtt)
  ...
}

And then you compare object->map->typeinfo->supertypes[depth] with the target rtt, i.e., a map.

That seems to entangle maps with supertypes unnecessarily. What you can do instead is

  supertypes : FixedArray(TypeInfo)

Then you check object->map->typeinfo->supertypes[depth] against the target rtt->typeinfo. That separates plain type information from unrelated stuff in maps and makes the type check agnostic to type-irrelevant details of the map, like choices of prototypes. Type equivalence corresponds to equality of TypeInfo's, not RTT objects (cf. the nuance I was explaining earlier).

Would that not work?

I disagree. For instance, there are fundamentally two options for a hypothetical rtt.attach_host_info instruction: it can either modify the RTT in-place (which, among other issues, means that uncoordinated uses of it could overwrite each other's effects), or it can create a new RTT (which would destroy the 1:1 property, roughly comparable to rtt.sub's effect). I don't recall anyone ever having suggested a solution to this.

I am not sure where that instruction is coming from, but I agree that it wouldn't make sense to mutate RTTs, ever. So the solution would be for it to create a new RTT/map, and with the sketched representation that ought to work fine, no? As an extra plus, it wouldn't need to allocate a new supertypes array.

For now, they can; but when modules or their embedders can generate customized RTTs, then the globals used to store them can no longer be immutable.

In the case of the RTT being imported as a global, I don't see why it can't be immutable. In the case of using the hypothetical instruction above, I have no idea, since I don't know what its use case would be. (To be honest, that feels like a strawman. If we ever needed such a thing, then I expect it would be niche enough to not be particularly relevant.)

The MVP doesn't need them (it literally gets zero value from having them)

I don't think that's true, for the reasons explained and to avoid a future-hostile hole in the design.

PS: It probably was my mistake that the MVP doc describes a certain implementation of the supertype vector a bit too directly in terms of RTT values. That of course is only an approximation.

kripken commented 2 years ago

@rossberg

Our plan of record is to employ RTTs for this purpose [JS API / host customization], whether now or later. It’s fairly simple, general, and clear how it would work.

Do you remember where that was discussed? I don't feel I understand how this would work on the toolchain side yet (even after some offline discussions that answered some of my questions from earlier). I see we have some short notes on host types but that's the closest I can find - probably I've missed a discussion somewhere.

jakobkummerow commented 2 years ago

@rossberg Yes, modifying the supertypes list like that could work, but it would incur an extra chained load (rtt->typeinfo instead of just rtt). That may be an acceptable price to pay, but it would be a cost. Such a decoupling of JS types and Wasm types also reminds me of issues we previously had with structural vs nominal types: the JS side might be expecting a round-trip of sorts, i.e. objects coming back from Wasm to have the same prototype as those going in, but the Wasm side might be relying on ref.test, which wouldn't distinguish prototypes. Whether that's a problem or acceptable behavior remains to be discussed (as part of the JS API discussion, so we shouldn't get into that here) -- it's just one of the reasons why I don't consider it settled that "installing JS prototypes on Wasm RTTs" is definitely the way to go.

I am not sure where [the `rtt.attach_host_info] instruction is coming from

Total strawman sketch of one possible tool for attaching prototypes to RTTs. It could be a WebAssembly.AttachPrototype JS function instead; for the purposes of this discussion that's an unimportant detail: any design that wants to attach host info to RTTs must have some way to, you know, actually attach host info to RTTs. Questions like "should that be an in-place modification or should it create a fresh RTT?" arise regardless of what way exactly that is.

In the case of the RTT being imported as a global, [the global could be immutable]

True, however that approach would require a bunch of JS code to get executed before the module can be instantiated. While (AFAIK) nobody has even approximate numbers on the cost of this (especially for large real-world modules) at this time, I am worried that it might be unacceptably slow.

rossberg commented 2 years ago

the JS side might be expecting a round-trip of sorts, i.e. objects coming back from Wasm to have the same prototype as those going in, but the Wasm side might be relying on ref.test, which wouldn't distinguish prototypes. Whether that's a problem or acceptable behavior remains to be discussed (as part of the JS API discussion, so we shouldn't get into that here) -- it's just one of the reasons why I don't consider it settled that "installing JS prototypes on Wasm RTTs" is definitely the way to go.

Right. It's important to acknowledge that Wasm cannot be expected to maintain type invariants that lie outside it. That is a very general observation, we're just touching the tip of the iceberg here. The only way such "round-tripping" could ultimately be achieved would be by including all possible features of all possible host type systems into the Wasm type system to subsume their expressiveness, which clearly is infeasible. For example, if a host type system had generic types, then Wasm would also need generic types, with the exact same semantics.

Total strawman sketch of one possible tool for attaching prototypes to RTTs. It could be a WebAssembly.AttachPrototype JS function instead; for the purposes of this discussion that's an unimportant detail: any design that wants to attach host info to RTTs must have some way to, you know, actually attach host info to RTTs. Questions like "should that be an in-place modification or should it create a fresh RTT?" arise regardless of what way exactly that is.

Yes. For types, I would expect this to be "declarative" to the extent possible. In the case of the JS API, I would assume there will be constructors to define Wasm types, and these constructors can take additional configuration arguments. Incremental modification like you suggest might also work, as long as it's not mutation.

In the case of the RTT being imported as a global, [the global could be immutable]

True, however that approach would require a bunch of JS code to get executed before the module can be instantiated. While (AFAIK) nobody has even approximate numbers on the cost of this (especially for large real-world modules) at this time, I am worried that it might be unacceptably slow.

It is unclear what you are comparing this cost against. Surely, somewhere something has to happen in any approach, it's just shifting the work left to right, isn't it?

rossberg commented 2 years ago

@kripken:

Do you remember where that was discussed?

I believe it came up on a number of occasions, but the only pointer I have off hand is the discussion that @conrad-watt referred to above.

kripken commented 2 years ago

@rossberg Thanks! I had indeed missed that discussion.

(Reading it and the linked issues I still don't understand how the toolchain side would work, like how a bundler would merge modules safely, but maybe I'll open a new issue for that.)

titzer commented 2 years ago

I agree with @lukewagner that RTTs somehow need to be generative somewhere--they do cannot map 1:1 onto Wasm types. That's critically important to allow source languages to piggy back source-casts (and typecases) on Wasm casts. In the above discussion, it seems like all discussion of rtt.fresh has been left out. I think the discussion is incomplete without that consideration.

jakobkummerow commented 2 years ago

It is unclear what you are comparing [the cost of preparing RTTs that will be imported as globals] against. Surely, somewhere something has to happen in any approach, it's just shifting the work left to right, isn't it?

To clarify: I'm comparing against any kind of declarative solution, which would allow engines to create the required RTTs and related internal structures lazily on demand, instead of batching it all up on the critical path before the Wasm module's first function ever executes. (FWIW, a "no-frills" solution wouldn't have to do any of this work.) We know that large modules may well have 100,000 internal types. We don't know how many of those will typically need to be exposed to JS -- even if it's only 1% of them, that'd still be a sizeable number of prototypes to set up.

tlively commented 2 years ago

@rossberg, you wrote:

Every RTT value with the same static type denotes the same Wasm type. That implies that they all behave the same wrt casts, and all desirable static optimisations can be applied to casts as if RTTs were left implicit (since, in fact, the concrete object does not matter there).

However, multiple RTTs for the same static type may produce different results wrt object creation. While rtt.canon always produces the same RTT value, different versions may be produced e.g. through the API (and possibly, later language extensions). That of course is the point, as it allows customising object creation.

If I understand correctly, you are envisioning that in a future where multiple distinct RTT values can denote the same Wasm type, there will be no difference between those RTT values with respect to casting. Is that right? So optimizations on casts that today assume RTTs correspond 1:1 with Wasm types would still be valid in the future? If we can agree on that now, then I would be more amenable to including RTTs in the MVP.

But that understanding is at odds with @titzer's stated desire to have generative RTTs that can be used to piggy-back source-level casts. If we can't agree now on whether RTTs should be generative with respect to their interaction with casts, then we should punt on RTTs entirely to avoid being blocked on that decision and possibly having GC-MVP programs regress in optimizability in the future. We won't be able to gather performance data to inform that decision until we start working on post-MVP proposals that would introduce new sources of RTT values.

rossberg commented 2 years ago

@tlively, I would definitely want to maintain this invariant moving forward, since it improves the coherence of the type system (static vs dynamic), which is a desirable property in itself. I don't think we'd want generative RTTs as its own mechanism in the current design. Instead, if there is a strong need for generative types, we'd introduce generative type definitions, whose rtt.canon then likewise is a distinct type. That's all that's needed for the use case Ben describes. See also my reply to @kripken.

tlively commented 2 years ago

I've proposed an agenda item to discuss this at the May 3 subgroup meeting: https://github.com/WebAssembly/meetings/pull/1013

tlively commented 2 years ago

If we maintain the 1:1 correspondence between static types and RTTs even in post-MVP extensions, then isn't it true that RTTs will always be semantically redundant with static type annotations? So not only could we remove RTTs from the MVP, there also would never be any reason to reintroduce them.

rossberg commented 2 years ago

@tlively, no, not so. Because static types will (hopefully soon after MVP) involve components that are not known at compile time, either type imports or generic type parameters to functions. In those cases, you'll still have a 1:1 correspondence, but you cannot construct the RTT implicitly – unless engines implicitly pass RTTs around almost everywhere, which is highly undesirable.

As a I've pointed out in the past, that is the most important reason for having explicit RTTs: to avoid a hidden type passing semantics once we add those features. Because that would introduce inherent complexity and substantial hidden runtime costs (including non-trivial allocations at generic call sites). Thus RTTs are crucial later, and they are important now for proper forward compatibility.

If we defer them now, we'd have to introduce extra versions of all the relevant instructions later, and impose odd restrictions on the RTT-less ones, both of which would be ugly. In brief, it'd be shooting a big hole into the design.

tlively commented 2 years ago

unless engines implicitly pass RTTs around almost everywhere, which is highly undesirable... because that would introduce inherent complexity and substantial hidden runtime costs (including non-trivial allocations at generic call sites).

I think this is the key point and I'd like to understand it more. It has previously come up that explicit RTTs could allow compilers to better express how the RTTs would be loaded and register allocated, but the feedback from implementers was that this wouldn't really be useful since engines do their own register allocation from scratch.

I know that in the future when we have type parameters to functions, the idea is that explicit RTTs corresponding to those type parameters would be passed as arguments. With implicit RTTs, the engine would add those arguments implicitly based on the type parameters to the function. For generic type imports the engine would similarly generate implicit RTT imports based on the static type imports.

Are there any other situations in which you expect the engine would have to implicitly pass RTTs around if we got rid of explicit RTTs?

It's not obvious to me that this implicit passing is undesirable. I expect that in the vast majority of cases explicit RTTs would be used to express the exact same passing schemes that the engine would have generated anyway. I don't see any intrinsic value in duplicating the static immediate/parameter/import as an additional value-typed immediate/parameter/import, since the former should make it clear that something is being passed. Is there more to this that I'm not seeing?

rossberg commented 2 years ago

With implicit RTTs, the engine would add those arguments implicitly based on the type parameters to the function.

Right. The problem is that an engine generally needs to have a fixed calling convention. That implies that by default, the engine would have to pass RTTs to all generic functions, regardless of whether they need them or not. That would be terrible! An engine could possibly optimise those away in some limited cases where a function doesn't escape, but in general it is almost impossible to avoid that overhead.

And this overhead isn't just passing RTTs. Worse, it involves allocating and constructing RTTs at call sites with somewhat unbounded cost! For example, when a generic function calls another generic function with a type involving its own type parameters, then it cannot be pre-allocated:

(type $t (typeparam $X $Y) ...)

(func $f (typeparam $A) ...)

(func $g (typeparam $B)
  (call $f (type $t (typeparam $X) (typeparam $X)) ...)
)

Here, the call to $f requires an RTT for A that corresponds to $t<$B, B>, which will likely involve non-trivial computation and allocation that it must perform on every call. Modulo inline caches it could introduce. Both is very much against Wasm's spirit of "predictable cost".

Now, unlike an engine, a producer will have much more systematic knowledge about where casts are introduced and hence RTTs are needed in its compilation scheme. In fact, some producers may have no need for casts at all once we provide generics! So if RTTs are properly manifest, producers can systematically control and minimise their construction, caching, and passing for their use cases. Also, they can afford much more effort for expensive global analysis to optimise them.

This separation making the operational manifestation of runtime types explicit is pretty much standard, one way or the other, in principled designs for low-level languages combining runtime typing with generics [e.g., 1, 2, 3, 4]. The primary counter example would be the CIL (counting that as low-level), whose reified generics are notorious for the engine complexity they induce and probably inspired some of the cited work to do better.

conrad-watt commented 2 years ago

Ah I didn't think about generic functions calling each other. That does make the possible overhead seem more "real".

Returning to the question of whether we could remove RTTs for now and add them in back later, could we make typeparam a separate per-function index space, so that future with-RTT instructions could (optionally?) reference this index space while the hypothetical-MVP "without-RTT" instructions would be prevented from doing so, and therefore still guaranteed to reference real type declarations?

jakobkummerow commented 2 years ago

If explicit RTTs are required for generics, then we should introduce generics and explicit RTTs at the same time. Both now, or both later.

(Yes, adding (ref.cast (rtt $T) ...) after (ref.cast <$T> ...) will lead to duplication of sorts, but the existing implementations in Binaryen and V8 demonstrate that this is acceptable. In fact, if we resolve #274 by allowing concrete/custom types on the generic as/is/br_on instructions, there'll be no duplication to speak of.)

tlively commented 2 years ago

After the latest discussion, it's clear that it will make sense to have RTTs in their current form once we introduce type imports/exports, generics, or potentially other post-MVP extensions. However, it remains the case that RTTs would have no benefit in the MVP, and in fact would come with a 6% code size penalty (as measured on Dart by @askeksa-google). This code size penalty would not go away or pay for itself in the future since some toolchains such as J2CL will never use post-MVP features that would benefit from having RTTs.

Removing RTTs from the MVP and introducing them later also does not have any cost beyond maintaining RTT-free and RTT-using versions of cast and allocation instructions. This cost is trivial not only in implementations but also in the spec, where the RTT-free instructions would be defined in terms of their RTT-using variants composed with rtt.canon. Notably, this composition of instructions is precisely what gc-MVP toolchains would emit for every cast or allocation if we did mandate the use of RTTs in the MVP.

Although I would welcome new information to the contrary, removing RTTs from the MVP would decrease MVP spec complexity without affecting future spec complexity, would have small but significant code size benefits, and would be in line with our philosophy of incremental development, especially since we have no clear timeline for working on generics or type imports.

Unless anyone can identify any concrete problems with removing RTTs from the MVP and adding them back in later, I would like to wrap up this discussion no later than our meeting on May 31.

tlively commented 2 years ago

Returning to the question of whether we could remove RTTs for now and add them in back later, could we make typeparam a separate per-function index space, so that future with-RTT instructions could (optionally?) reference this index space while the hypothetical-MVP "without-RTT" instructions would be prevented from doing so, and therefore still guaranteed to reference real type declarations?

I think this would work and I would be fine with that, but it might be even simpler to allow RTT-free instructions to work even with type parameters, where the implicit rtt.canon might require the engine to allocate. That would maintain the invariant that (rtt-free-inst $T) == (with-rtt-inst $T (rtt.canon $T)). I am assuming that (rtt.canon $T) works even when $T is a type parameter since those RTTs need to come from somewhere initially, but maybe that's a bad assumption, in which case I agree that (rtt-free-inst $T) should be disallowed by construction as @conrad-watt suggests.

tlively commented 2 years ago

After recent discussions, it is clear that the most active stakeholders agree on the technical details around this decision but will not be able to unanimously agree on a path forward. To settle this issue one way or the other and move on to other things, I've scheduled a consensus poll on this for our meeting on Tuesday.

The proposed question to poll for is "should we defer RTTs from the MVP and reintroduce them alongside generics, type imports, or another post-MVP proposal?" If the poll does not demonstrate consensus in favor of deferring RTTs, then we will keep them as currently proposed in the MVP.

tlively commented 2 years ago

After a good discussion this morning, we successfully polled for consensus to defer RTTs from the MVP with the precise text given above. I'll leave this issue open until we can update the proposal docs to reflect this change.

tlively commented 2 years ago

We landed https://github.com/WebAssembly/gc/pull/306, so closing this.