WebAssembly / gc

Branch of the spec repo scoped to discussion of GC integration in WebAssembly
https://webassembly.github.io/gc/
Other
996 stars 71 forks source link

Pushing i31ref to post-MVP #320

Closed eqrion closed 1 year ago

eqrion commented 2 years ago

Are any production languages going to require i31ref in the MVP? If not, I think we should consider pushing it to a separate proposal after the MVP.

SpiderMonkey doesn't have any concerns regarding the feature's implementability. However, there were extensive discussions surrounding the suitability of i31ref for language ports, and I'd like to see real experience with the feature before committing to a tagging scheme in the MVP.

titzer commented 2 years ago

i31ref will be heavily used by OCaml.

eqrion commented 2 years ago

i31ref will be heavily used by OCaml.

Sure, no doubt there are some languages that will require some sort of feature here. I believe that's clear.

Does OCaml have a relatively-complete port to use Wasm-GC at this time? I'm asking this to be sure that i31ref has been fully validated in a real implementation, and that there's intent for it to be seriously used on around the timeline of a GC-MVP.

askeksa-google commented 2 years ago

As I see it, there are several mismatches between the common use cases for i31 and the way it is currently specified.

Some of the most common use cases that I have seen described throughout the discussions are:

  1. As the representation of integers in a uniform representation (the OCaml use case, AIUI)
  2. As some of the variants of an ADT, with other variants represented by structs.
  3. As the representation of small integers in a language with objectified integers, with larger values boxed into structs (the Dart use case).

Common to all of these use cases is that the i31 occurs in a union with other values. This is essentially what i31 is good for. But the current type system does not reflect this capability, leading to extra casts.

As described in that post, even with a union type, boxing and unboxing operations in the Dart use case are rather clunky. As these operations are very common, it's important that they are small and well optimizable by the engine. A way to achieve that would be to have fallible conversions from i32 and i64. These would be a set of 8 branch instructions:

br_on_i32_to_i31_s
br_on_i32_to_i31_u
br_on_i64_to_i31_s
br_on_i64_to_i31_u
br_on_not_i32_to_i31_s
br_on_not_i32_to_i31_u
br_on_not_i64_to_i31_s
br_on_not_i64_to_i31_u

If the value of the integer fits inside the range of i31, the instruction branches (or not) and leaves an i31 on the stack. Otherwise the integer is left on the stack.

Dart int boxing would then simply be (with the uncommon case delegated to a function):

(block $b (param i64) (result (ref i31 $BoxedInt))
  (br_on_i64_to_i31_s $b)
  (call $box_to_struct)
end)

A similar situation (though not as dramatic) occurs for unboxing. The combination of br_on_i31 and i31.get_s is largely redundant, since the only sensible thing to do after checking that a value is an i31 is to convert it. Ideally, we would have instructions mirroring the fallible conversions:

br_on_i31_to_i32_s
br_on_i31_to_i32_u
br_on_i31_to_i64_s
br_on_i31_to_i64_u
br_on_not_i31_to_i32_s
br_on_not_i31_to_i32_u
br_on_not_i31_to_i64_s
br_on_not_i31_to_i64_u

which would branch (or not) on i31 (doing the conversion) and otherwise leave the input value on the stack. These would shave off 5 instructions (block, br, i31.get_s, i64.extend_i32_s, end) for every unboxing.

This may look like another case of "please add the macro instructions that I need for my language", but I really do think it's more general than that, and that it caters to all common uses of i31. Also, the way i31 is usually implemented, it's natural to combine testing and encoding/decoding, and there are opportunities for sharing code between them. I think the instruction set should reflect that property.

What I'm really getting at here is that there is a design space for making i31 a lot more useful than it is now. And that the currently specified version is potentially incompatible with such more useful versions.

For this reason, I'm opposed to including i31 in its current form in the MVP.

titzer commented 2 years ago

As an aside, we could cut down on the signed/unsigned duplication by specifying that i31 values are unsigned and offering i32.extend31_s and i64.extend31_s, as we did for 8 and 16-bit integers after MVP.

askeksa-google commented 2 years ago

As an aside, we could cut down on the signed/unsigned duplication by specifying that i31 values are unsigned and offering i32.extend31_s and i64.extend31_s, as we did for 8 and 16-bit integers after MVP.

That would reintroduce the need for the outer block, which the proposed instructions avoid, thus negating the 5-instruction saving (well, 4 of them). The test, conversion and extension need to all happen in one step (on the branch edge) to get that benefit.

titzer commented 2 years ago

I was reasoning that br_on_not_i31_to_i32_s/u would leave an i32 on the stack, which, if simplified to only be unsigned, would need only a single extension. Likewise with the other branches; you'd just need an extension at the target label.

rossberg commented 2 years ago

@askeksa-google:

Some of the most common use cases that I have seen described throughout the discussions are:

  1. As the representation of integers in a uniform representation (the OCaml use case, AIUI)
  2. As some of the variants of an ADT, with other variants represented by structs.
  3. As the representation of small integers in a language with objectified integers, with larger values boxed into structs (the Dart use case).

As I keep pointing out, the most common use case actually is:

  1. As the representation of small scalars in contexts requiring a reference.

Such scalars include bools, int8, int16, chars, enums, etc. Furthermore, if the language is typed, then none of these use cases involves a union, except in polymorphic contexts, which form one huge union.

Languages that want it for that purpose include:

  1. languages with a uniform representation by default (most polymorphic languages),
  2. languages with non-type-specialised generics, i.e., requiring a uniform representation for values passed to generics (e.g., Java-style generics when not limited to objects, C# until Wasm has an internal JIT interface)

As I showed in my presentations, Waml and Wob employ i31 extensively for these purposes.

Common to all of these use cases is that the i31 occurs in a union with other values. This is essentially what i31 is good for. But the current type system does not reflect this capability, leading to extra casts.

No, see above. In a statically typed language and for use case 0, you should never need to extract a scalar directly from an anyref/eqref. For example, Waml and Wob never do. Wherever they consume a scalar, it's already typed i31ref.

This may look like another case of "please add the macro instructions that I need for my language"

Indeed. :)

What I'm really getting at here is that there is a design space for making i31 a lot more useful than it is now. And that the currently specified version is potentially incompatible with such more useful versions.

I don't understand how you arrive at that conclusion. Any of the instructions you mention could be a Post-MVP extension. So the current design seems perfectly in spirit of an MVP.

askeksa-google commented 2 years ago

I was reasoning that br_on_not_i31_to_i32_s/u would leave an i32 on the stack, which, if simplified to only be unsigned, would need only a single extension. Likewise with the other branches; you'd just need an extension at the target label.

The issue is that the target label is a join between the value converted from i31 and another value, which should not be extended (e.g. the value loaded from the box struct). If the i31 path has extra instructions after the target label, then the non-i31 path needs to jump around those extra instructions, necessitating the extra block.

For illustration, this is what unboxing of a Dart int would look like with the full-fledged instructions:

(block $b (param (ref i31 $BoxedInt)) (result i64)
  (br_on_i31_to_i64_s $b)
  (struct.get $BoxedInt 1)
end)
tlively commented 2 years ago

If i31 is primarily designed for use in languages that are not helping validate the MVP, then making i31 a post-MVP extension makes a lot of sense. Alternatively, we should design i31 so that it is useful to languages with active plans to target WasmGC and make any additional functionality useful for other languages a post-MVP extension.

askeksa-google commented 2 years ago
  1. As the representation of small scalars in contexts requiring a reference.

This looks to me like a general description of the purpose of i31ref which covers all of the use cases I mentioned. What characterizes a use case is why the context requires a reference.

  1. languages with a uniform representation by default (most polymorphic languages),

Right, so more precisely, you are talking about polymorphic languages without the ability to inspect the runtime type of values, i.e. either a value is fully opaque or its precise type is known statically. Is this what you mean?

  1. languages with non-type-specialised generics, i.e., requiring a uniform representation for values passed to generics (e.g., Java-style generics when not limited to objects, C# until Wasm has an internal JIT interface)

Yes. Dart is in the same camp.

For both of these categories, indeed a union type is only beneficial if all of the non-i31 types in the union have a common supertype that has capabilities on its own (such that code paths that only need that capability can replace a cast by an i31 check). This is the case for Dart, for instance, and I expect it would also be for Java and C#, where anything that is not an i31 would be of a common Object struct that has several capabilities on its own (inspect the type, read the identity hash, call methods).

No, see above. In a statically typed language and for use case 0, you should never need to extract a scalar directly from an anyref/eqref. For example, Waml and Wob never do. Wherever they consume a scalar, it's already typed i31ref.

I am curious about the practical reasons why most such use cases can't just use i32. I can see some potential reasons they couldn't (function subtyping, covariantly typed fields), but these seem to only apply in special cases.

For comparison, dart2wasm has a uniform representation which includes boxed bool, int and double, but whenever these types occur in a context that is not forced to use the uniform representation, they use the unboxed types i32, i64 and f64, respectively.

What does such situations look like in Waml and Wob?

What I'm really getting at here is that there is a design space for making i31 a lot more useful than it is now. And that the currently specified version is potentially incompatible with such more useful versions.

I don't understand how you arrive at that conclusion. Any of the instructions you mention could be a Post-MVP extension. So the current design seems perfectly in spirit of an MVP.

What I am primarily worried about is that if we specify i31 as currently and later want to add unions, the type rules for instructions might not generalize to the rules that we'd want at that point. Though if it's only a matter of instructions, I guess in the worst case we'll need to add new instructions with the desired behavior.

tlively commented 2 years ago

I've put this on the agenda for our subgroup meeting tomorrow.

eqrion commented 2 years ago

I will be traveling tomorrow and so I won't be able to attend. I'll try to state my opinion here then.

In the past, we have standardized and shipped some proposals which had subsets that ended up not being used in practice. Specifically, with reference-types I've seen data that indicates externref is not seeing any real use in the browser. That can definitely change in the feature, but it has been years since we shipped that.

My concern here is to avoid a similar situation with i31ref, due to the complexity cost of introducing tagging in our engine and risk of incompatible changes being requested when a language does attempt to use some sort of tagging.

I don't have a strong opinion for or against the current design of i31ref. It seems like a reasonable feature, and hearing of a serious language port using it effectively is really the only thing I'm looking for.

This is sort of a timing issue. If we really want this proposal to move forward in standardization quicker than we can get this data, then I believe it should be in a separate proposal (which I hope could move quicker than this one has). If we can get this data quick enough, then I support leaving it in this proposal.

tlively commented 2 years ago

The V8 team holds the same opinion, so it will be well represented at the meeting tomorrow 👍

rossberg commented 2 years ago
  1. As the representation of small scalars in contexts requiring a reference.

This looks to me like a general description of the purpose of i31ref which covers all of the use cases I mentioned. What characterizes a use case is why the context requires a reference.

Yes, but the point is that scalars are a broader use case than just source language integers or enums, which were the only cases you listed. As for the why, that's what the two (sub)points below meant to expand on.

  1. languages with a uniform representation by default (most polymorphic languages),

Right, so more precisely, you are talking about polymorphic languages without the ability to inspect the runtime type of values, i.e. either a value is fully opaque or its precise type is known statically. Is this what you mean?

Yes, which is common for typed languages. Even those with casts typically limit casts to object-like values, while others are opaque (not even all OO languages make everything an object).

  1. languages with non-type-specialised generics, i.e., requiring a uniform representation for values passed to generics (e.g., Java-style generics when not limited to objects, C# until Wasm has an internal JIT interface)

Yes. Dart is in the same camp.

For both of these categories, indeed a union type is only beneficial if all of the non-i31 types in the union have a common supertype that has capabilities on its own (such that code paths that only need that capability can replace a cast by an i31 check).

That may be so in Dart or similar everything-is-a-sort-of-object languages, but is not the case outside that camp. E.g., Waml is not like that, nor won't be OCaml, Haskell, or similar languages with a strong need for uniform representation.

No, see above. In a statically typed language and for use case 0, you should never need to extract a scalar directly from an anyref/eqref. For example, Waml and Wob never do. Wherever they consume a scalar, it's already typed i31ref.

I am curious about the practical reasons why most such use cases can't just use i32. I can see some potential reasons they couldn't (function subtyping, covariantly typed fields), but these seem to only apply in special cases.

Because polymorphism is very expressive in these languages. You can always type-abstract over everything visible after the fact, including over deeply nested type components. For example, I can pass a function of type int array -> int to another function with type ('a array -> 'a) -> t, so the compiler cannot represent int array as array i32. Similarly with everything else. Unless the compiler can see all use sites, it can basically never assume that a first-class value does never flow into a context where it is viewed polymorphically. Static type specialisation is not an option either, because polymorphism is first-class, i.e., there may be polymorphic call sites that don't statically know the definition of the called function (nor the type it's called at).

For comparison, dart2wasm has a uniform representation which includes boxed bool, int and double, but whenever these types occur in a context that is not forced to use the uniform representation, they use the unboxed types i32, i64 and f64, respectively.

In languages like the above, you can unbox the contents of second-class variables, but that's basically it. And it is less beneficial than you might think, because you often need to pass the value around, in which case repeated unboxing/boxing is more expensive than keeping it boxed in the first place – a value is typically passed and stored more often than actually inspected.

What does such situations look like in Waml and Wob?

For Waml, I made unboxing configurable separately for locals, globals, temps, closure environments, and pattern scrutinees. But everything is always boxed as part of other data structures, even when the type is fully known. That is necessary, see above.

Wob has explicit box types, somewhat similar to Java's Integer class and friends, except that they aren't objects. So the programmer is in control of boxing and unboxing. But only boxed types can instantiate generics.

What I am primarily worried about is that if we specify i31 as currently and later want to add unions, the type rules for instructions might not generalize to the rules that we'd want at that point. Though if it's only a matter of instructions, I guess in the worst case we'll need to add new instructions with the desired behavior.

I am positive that unions are perfectly compatible with the current design. In particular, i31 poses no more of a problem than any other type. That said, there are probably good reasons to have unions work with their own set of more explicit instructions anyway, esp to avoid the need for naive subtyping and the quadratic cost pitfall. (I have some thoughts on making unions ordered, but that's off-topic here.)

titzer commented 2 years ago

While I agree with what @rossberg wrote above w.r.t. to implementing polymorphism (though Virgil does whole-program static specialization and does not have polymorphic recursion or first-class polymorphism), the stronger argument is to represent tagged unions (aka sum types, algebraic data types, or variants). Another use case is to implement dynamic languages (like JavaScript!) on top of Wasm, using i31 as the small integer type.

I think all three of these use cases are really important. Having a form of tagged reference in Wasm is a major value-add over other statically-typed bytecode formats such as DEX, the JVM, and CIL.

RossTate commented 2 years ago

Since @rossberg has repeatedly claimed that polymorphic languages need i31ref and explicitly mentioned Haskell, I thought the group might appreciate knowing for today's discussion that the prevailing compiler for Haskell—GHC—does not use anything like i31ref. This documentation indicates that GHC uses 32/64-bit integers for its "machine-sized" integer type Int (and these are indeed boxed). On the other hand, this documentation indicates that GHC does use pointer tagging. In particular, for values of algebraic data types the low bits indicate whether the value has been evaluated (since Haskell is lazy) and, if so, which case the value belongs to. This lets many case statements avoid having a memory load block a branch. Other Haskell compilers use 30 bits for Int, again because they need to reserve a bit for pointer tagging to distinguish evaluated and unevaluated lazy values.

So, contrary to the claim, i31ref is not widely accepted among polymorphic languages and in fact is incompatible with the optimizations some polymorphic languages do actually heavily rely upon.

the stronger argument is to represent tagged unions (aka sum types, algebraic data types, or variants)

Many implementations of languages with algebraic data types use pointer tagging for this instead.

Another use case is to implement dynamic languages (like JavaScript!) on top of Wasm, using i31 as the small integer type.

Many implementations of dynamic languages use pointer tagging instead. And many implementations of dynamic languages with more emphasis on floating-point computation use NaN-boxing instead.

askeksa-google commented 2 years ago

I implemented basic i31 support in dart2wasm to get a data point for the discussion. It represents small int values as i31ref and does boxing and unboxing by calling functions that are implemented basically like I presented in the type dimension discussion. It also needs to check for i31 in many places, such as on method call receivers and type tests/casts. It bumps the type for the uniform representation to eqref.

For most benchmarks I did not see a measurable difference. For barista3 (big, method call-heavy benchmark) I saw a roughly 10% slowdown. For Richards, which is reasonably List<int> heavy, I saw a 10% speedup.

For barista3, the module size grew by 2% (combination of larger code and smaller constants). Other benchmarks varied from slightly smaller to about 1% bigger.

So it appears i31 has some potential for this use case, even in its current form, though its overall benefit is not clear-cut.

Update: The quoted numbers are with the module optimized by Binaryen, which manages to eliminate the i31 check from 90% of all method calls.

rossberg commented 2 years ago

@RossTate, it's correct that GHC does something special, mostly because of laziness, as described in their ICFP'07 paper. The authors also point out that they are the odd one out:

Many programming language implementations (SML/NJ, Ocaml, Squeak) use tag bits to differentiate pointers from non-pointers.

RossTate commented 2 years ago

Yes, those three languages use i31, and I never indicated otherwise. I just indicated that a proper discussion of this topic should include considering that many languages do not use i31 and that many languages use features incompatible with i31.

tlively commented 2 years ago

It turns out that @chambart and others are currently working on bringing OCaml to WasmGC and i31 as currently specified is working well for them. Since that resolves the core concern here, I think we can close the issue and keep i31 as it is. @eqrion, do you agree?

We're also pretty sure that we will be able to extend i31 to work better for small integer optimizations either via new instructions for boxing and unboxing integers or via a more general union mechanism post-MVP.

jakobkummerow commented 2 years ago

Reflecting on this decision, it has some noteworthy implications:

(1) i31ref support isn't free. It has a small but non-zero cost for every module¹, even if that module doesn't use i31ref at all. It doesn't follow the pay-as-you-go principle, which we usually consider very desirable.

¹ Qualifications:

(2) There's an obvious way how i31ref is supposed to be implemented, but it seems difficult (and maybe even undesirable?) to write formal spec text (and tests) to actually enforce this. Just how in a hypothetical spec without i31ref, module producers would have to manually box their scalars into single-field structs, an engine could decide to implement i31ref that way under the hood. That would leave as benefit of having i31ref that it makes the lives of certain module producers easier (which is not usually very high on our priority list); but (assuming such an engine is sufficiently popular) it would disappoint any "i31ref doesn't allocate" performance expectations, and hence nullify i31ref's other purpose, which is to improve efficiency of certain patterns. (Of course, this concern doesn't carry too much weight because similar things could be said about almost every feature: features typically offer room for optimizations, rather than prescribing certain optimizations by spec.)

(3) To summarize the decision process, it was essentially: "We should postpone this feature because nobody uses it." -- "OCaml does use it." -- "OK, then we'll keep it in the spec." In other words, we have just set a precedent of adding a feature that has a cost even for those who don't use it (see point (1)) just for the benefit of a single language (plus, of course, nebulous predictions about various other languages that might conceivably eventually also use it).

To be clear, I'm not arguing that we should revisit the decision. I'm aware that standardizing a platform is always a game of ugly compromises, and I'm just reflecting on this particular compromise.

rossberg commented 2 years ago

It is generally expected that the use of less precise types is less efficient than more precise types. Hence, a module that doesn't need its generality should certainly avoid anyref, regardless of i31ref (allowing to internalize externref can have a similar effect). That is, the proposal allows to avoid extra cost, though of course it does not prevent producers from generating suboptimal code instead – but that's universally true. Likewise with the possibility of crappy implementations in engines – by that line of argument, almost every new feature we had since 1.0 could be questioned.

As @titzer pointed out, the ability to unbox values is one of the unique selling points of Wasm GC. I predict that we will see plenty of use, because it is the counterpart to a technique widely used in native code (even if we cannot make it quite as flexible as native code).

eqrion commented 2 years ago

It turns out that @chambart and others are currently working on bringing OCaml to WasmGC and i31 as currently specified is working well for them. Since that resolves the core concern here, I think we can close the issue and keep i31 as it is. @eqrion, do you agree?

Apologies for the long delay and slowing down this discussion, this dropped off my radar. @tlively (or @chambart) is there any additional public information about this porting effort? Reading through the meeting notes, I didn't see anything.

Specifically, I'm curious about this line from the meeting notes: PC (Chat): Leo and I are working on this (OCaml) already. The current version of i31 works well, but don’t have performance numbers. Getting numbers by comparing to boxing all integers would be too much work..

I do think performance numbers are relevant here. We've dropped several advanced optimization features from this proposal because the expected improvement was likely not required to port languages initially. If OCaml could get away with boxing all integers, then deferring it from this proposal would be similar to how we've treated other features such as nested data structures and interior references. Both of those optimizations would also reduce allocations and remove indirections.

As @titzer pointed out, the ability to unbox values is one of the unique selling points of Wasm GC. I predict that we will see plenty of use, because it is the counterpart to a technique widely used in native code (even if we cannot make it quite as flexible as native code).

I agree that this would be a unique selling point for Wasm GC. I also understand the sentiment behind wanting to avoid the chicken and egg problem between language ports and wasm runtimes. However, we'll run into this with all post gc-mvp features, and the solution is for runtimes to keep implementing these features before full standardization and making them available for language ports to experiment with. Not standardizing and shipping first so that language ports can then experiment with them after.

titzer commented 2 years ago

However, we'll run into this with all post gc-mvp features, and the solution is for runtimes to keep implementing these features before full standardization and making them available for language ports to experiment with.

The issue is that we're far from having a well-functioning feedback loop with such prospective runtimes. The message from many language implementers has been that they're waiting for features to stabilize; partly for their own sanity in waiting out the inevitable churn. That might improve generally with time as we attract more languages with the features that we do ship, but I feel GC MVP definitely needs to be complete enough that they don't yawn and turn away even more pointedly.

tlively commented 2 years ago

@chambart should chime in as well if he wants, but it sounds like the OCaml effort is at a very early stage.

I think the comparison to other optimizations like nested structures is reasonable, but the key differences are that we have a fully fleshed out semantics and engine and tools implementations for i31. We could certainly make the MVP more minimal by dropping it, but given how far along the feature is and given the concrete evidence that it will be used, I think that dropping it now would not have any benefit. It wouldn't let us ship anything else sooner and we don't need more time to discuss the design of i31.

rossberg commented 2 years ago

I think the comparison to other optimizations like nested structures is reasonable, but the key differences are that we have a fully fleshed out semantics and engine and tools implementations for i31.

The other key difference is that nested structures are at least an order of magnitude more complicated than i31, both in terms of surface area, concepts, semantics, but probably also (efficient) implementation.

tlively commented 1 year ago

@chambart, I was talking to @eqrion and others about this the other day and we were wondering whether your OCaml port is using i31 for anything other than implementing OCaml integers. Are you using it to represent packed ADTs or anything like that as well?

zapashcanon commented 1 year ago

I'm the one working with @chambart on this. We are indeed using i31 for other things than implementing OCaml integers. We're compiling from the flambda representation of the compiler, where others things have already been translated to unboxed integers.

As we told you by email it's still at an early stage but we'll come back to you once we can present it at a gc meeting.

tlively commented 1 year ago

@eqrion. do you still want to discuss this further or should we close out this issue?

tlively commented 1 year ago

ping @eqrion

eqrion commented 1 year ago

I do wish to discuss this further, and in fact I was hoping to do some research and collect some data on this. Specifically, I wanted to run some OCaml-on-Wasm code and do some performance experiments, and see if that could help move this forward. I've been blocked for a couple weeks though as I have not heard back from @chambart since we emailed initially about getting some code samples. :)

I'm still a bit uncomfortable here with i31ref, as my understanding of the data we have here is roughly:

  1. Dart did not find a clear-cut benefit from using this.
  2. OCaml is using this for their integers and GADTs and it aids their codegen (presumably runtime performance too, but I was hoping to collect data on this).

OCaml though has the advantage that source level integers are 31-bit, so I'm concerned about generalizing this to other languages, where they might experience enough extra costs due to this mismatch that i31ref does not have a clear-cut benefit for them (Dart might be in this category).

What would help me here is to have data that i31ref will be useful outside of OCaml or data that we can generalize OCaml's experience to more languages.

fgmccabe commented 1 year ago

There is a third use case for i31ref: characters; which is actually a special case of a ADT (not GADT) where all the sum cases are enum symbols (e.g. dayOfWeek)

rossberg commented 1 year ago

@eqrion, i31ref is intended for any sort of small scalar. That includes unit, bool, char, int8, int16, enums, nullary constructors in datatypes, polymorphic variants, bitsets, etc. Ocaml uses it for most of these, and likewise will pretty much every other typed language depending on a uniform representation -- tagged words are a ubiquitous implementation technique in language runtimes. Source-level 31-bit integers are merely a rather exotic edge case that I wish people would not get so hung up on.

titzer commented 1 year ago

Also worth noting that i31 is helpful in accelerating dynamically typed languages like Python and JavaScript[1]. And yes, I do expect performant JS implementations on top of Wasm in the fullness of time.

[1] AFAICT, V8 is the only browser engine that still uses 31-bit SMIs rather than NaN-boxing. As it turns out, this technique works better with pointer compression, so it seems likely to stay indefinitely.

RossTate commented 1 year ago

likewise will pretty much every other typed language depending on a uniform representation

This is an inaccurate accounting of uniform representations in the wild. For example, most object-oriented languages with generics using a uniform representation do not use i31ref. The reason is that they have found it pays off better to have the v-table be part of the uniform representation rather than to pack small scalars (along with a class-identifying tag) and have to branch on calls to "common" methods (like "equals" and "hashcode" or "compareTo"). A number of functional languages have found pointer tagging to be more useful than i31ref. This is especially true for Haskell, where pointer tagging is critical to efficient implementation of laziness. It is also often true for dynamically typed languages like Scheme/Racket, where the pointer tag helps avoid loads on the fast path. Some research has found applications of pointer tagging in object-oriented languages as well, though I don't know of any industry adoption of such techniques yet.

Many runtimes employing pointer tagging do also employ something like i31ref but with fewer bits because i31ref is incompatible with pointer tagging. In fact, OCaml is an outlier in using 31 bitsOCaml is an outlier in guaranteeing users 31-bits of accuracy (rather than just 30 bits or a full 32 bits) in its integer type. So if you want something that would generalize to a lot more languages and post-mvp features, then i30ref would work better.

rossberg commented 1 year ago

i31ref is incompatible with pointer tagging.

No. Reserving one bit to distinguish ints does not imply anything about additional bits reserved in pointers. At worst you lose one bit. Additional pointer tag bits could well be a Post-MVP extension, except that it is rather unclear how to provide them in a manner that is portable, efficient, and useful at the same time.

In fact, OCaml is an outlier in using 31 bits

No. It obviously isn't the only size used, but neither an outlier. And from the set of others, you have to subtract runtimes that chop off additional bits merely for GC purposes, which is moot in the case of using Wasm GC.

RossTate commented 1 year ago

Sorry, I was imprecise with my wording of my statement about OCaml. I have corrected it to "OCaml is an outlier in guaranteeing users 31-bits of accuracy (rather than just 30 bits or a full 32 bits) in its integer type".

This is an important correction, because implementation techniques such as the one @titzer just mentioned would work roughly just as well for i30ref.

Reserving one bit to distinguish ints does not imply anything about additional bits reserved in pointers. At worst you lose one bit.

On 32-bit machines, there are typically only 2 bits, so losing one of them non-trivial. But that's not all. Let's dig into the concrete application of pointer tagging I mentioned for Haskell laziness. There, the check for "value or lazy thunk" is extremely common. It also applies to one when the value would be an integer. So you want a bit to be devoted to solely indicating value or thunk. If you have i31ref, then you can't do that. Instead, you have to first check if it's an i31 or not, and then you can check if it's a value or a thunk—you've added another branch to an extremely common operation. With i30ref, you can have both the "value or lazy thunk" bit (as a post-MVP extension) and packed 30-bit integers, and you can check "value or lazy thunk" without knowing whether or not it's a packed integer.

Note that because the GC proposal has a uniform top type for all wasm-GC references, you can't have some people use i30ref (with pointer tagging) and some people use i31ref; they all have to agree on the same representation scheme.


To the point of generalizing to more languages, I've given a quick sense of why i30ref would generalize better. Could someone provide examples of runtimes that would be able to use i31ref but not i30ref? The only example that comes to mind is OCaml, but again that's due to OCaml being an outlier in guaranteeing 31 bits rather than 30 or 32. (And, is it unreasonable for OCaml-to-wasm to only support 30-bit integers?)

titzer commented 1 year ago

While increasingly complicated pointer tagging schemes seem fun to muse about[1], the only real decision we have at the moment is do we ship i31 with MVP or not. If we don't, that forces boxing for every implementation situation described above[2]. That will make Wasm GC not performance-competitive with native implementations of any of the above features and takes away a critical value-add of Wasm GC over other VMs such as the JVM.

Again, back to the topic of the thread. Do we ship i31 in MVP or not? I think we must, because the cost of not shipping it is to punish a widely-used language that basically has already finished Wasm GC support, plus force all other language implementations to work around this lack, again, by boxing.

[1] Reserving 2 bits now isn't the only way to achieve pointer tagging; we could add binary unions of reference types in the future that are implemented with another invisible tag bit. In fact, because we already split off func from any, a union (presumably, struct) and thunks (presumably, func or closure) is the only way to introduce a common type for them without boxing. [2] with struct-level casts, which are far more expensive than tag checks.

eqrion commented 1 year ago

In my mind, this is a procedural question of what is the bar for new features. I've informally thought one requirement is that a feature is useful to more than one language. I also feel that this should be demonstrated empirically before phase 4, not just with logical arguments.

So I don't really disagree that i31ref could be useful for functional languages or that it could be used for dynamic languages. I just thought we'd need to see it actually used by more than OCaml before moving to phase 4, so we validate that we're not just creating something useful for one language. Sometimes features don't work out the way we'd expect them when people actually try to use them.

But, I could also see an argument that this is too high of a bar. The actual phase 4 requirement is just that one toolchain uses a feature, not multiple.

So if folks feel so strongly that this needs to be in the MVP, I'm open it. But I think there is a significant risk in not collecting data from at least one other user.

tlively commented 1 year ago

I am personally satisfied with OCaml’s usage of i31ref and think it merits inclusion in the MVP even if we don’t have other languages using i31ref yet. I think it’s important that we support a diversity of languages, even if that requires us to have features that are not used by some languages.

I do look forward to investigating how we can make i31ref more broadly useful post-MVP, though.

eqrion commented 1 year ago

I should also add that I am implementing i31ref in SpiderMonkey right now.

I know there was a concern that moving i31ref to a separate proposal would prevent us from having a good feedback loop with language producers, and I don't think that needs to be the case if we have two complete implementations that have it easily available for experimentation. It would also make an i31ref proposal able to move very quickly once we have another user. And I would personally support it moving forward quickly once that is the case.

tlively commented 1 year ago

FWIW it looks like @wingo is using i31 for small integers, bools, etc. in Scheme as well: https://wingolog.org/archives/2023/03/20/a-world-to-win-webassembly-for-the-rest-of-us

wingo commented 1 year ago

Right! More docs on how we use i31ref here: https://gitlab.com/spritely/guile-hoot-updates/-/blob/main/design/ABI.md#immediate-data-types

Of course it would be great to eventually have the sorts of bounds-checking operations with branches to enable faster, more compact code: i31.add.br_on_fail, i31.new.br_on_fail, that sort of thing. But already having the immediate data type there is useful for Scheme.

eqrion commented 1 year ago

Okay, having at least one more user here helps out. I'm okay with closing this issue then, unless anyone else has concerns.

tlively commented 1 year ago

Closing, but anyone should feel free to reopen for further discussion if they want.