Alternatives to i31ref wrt compiling parametric polymorphism on uniformly-represented values (OCaml)

As i31ref doesn't seem to be a unanimously agreed-on part of the GC MVP spec (see https://github.com/WebAssembly/gc/issues/53), I am very interested in discussing what the concrete alternatives to it are in the context of parametric polymorphism on uniformly-represented values. (I would appreciate it if the answer doesn't immediately read as "use your own GC on linear memory".)

To give some (historical) context: Why does OCaml use 31-bit integers in the first place? Generally, it is possible to have a model of uniform values where every value is "boxed" (i.e. lives in its own, individually allocated heap block). Then, every value is represented by a pointer to the heap and can be passed in a single register when calling a function. A heap block always consists of a header (for the GC) and a sequence of machine words (values). From an expressiveness standpoint, this is fine. However, when even simple values such as integers are always boxed (i.e. require a memory access to "unbox" them), performance suffers. The design constraints for the representation of unboxed integers were: a) unboxed integer values must be passable in a single register, b) the GC needs a means to distinguish (when crawling the heap) whether a value in a heap block represents an unboxed integer or a pointer to another heap block, and c) the scheme should be as simple as possible for the sake of maintainability. In OCaml, the compromise between performance and simplicity that was chosen is to unbox integer values by shifting them left by one bit and adding one. Since pointers are always word-aligned, their lowest bit is zero, which makes it trivial to distinguish unboxed integers from pointers to heap blocks. While this is not the best-performing solution (because all integer arithmetic has to operate on tagged values), it is a simple one.
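To make the encoding concrete, here is a minimal sketch that simulates the tagging scheme on plain OCaml integers. The helper names (`tag`, `untag`, `is_unboxed_int`, `tagged_add`, `tagged_mul`) are made up for illustration and are not part of the OCaml runtime; the sketch only shows why arithmetic on tagged words needs small corrections.

```ocaml
(* Simulation of the "shift left by one and add one" encoding on plain
   integers. All function names are hypothetical, not runtime APIs. *)

(* Encode n as 2n + 1: shift left by one bit and set the low bit. *)
let tag (n : int) : int = (n lsl 1) lor 1

(* Decode with an arithmetic shift, which preserves the sign. *)
let untag (w : int) : int = w asr 1

(* A word is an unboxed integer iff its lowest bit is set;
   word-aligned pointers always have that bit clear. *)
let is_unboxed_int (w : int) : bool = w land 1 = 1

(* Addition on tagged words: (2a+1) + (2b+1) - 1 = 2(a+b) + 1. *)
let tagged_add (x : int) (y : int) : int = x + y - 1

(* Multiplication: clear the tag of one operand and untag the other,
   (2a) * b + 1 = 2(ab) + 1. *)
let tagged_mul (x : int) (y : int) : int = (x - 1) * untag y + 1
```

The extra subtraction in `tagged_add` and the untagging in `tagged_mul` are the kind of overhead meant by "all integer arithmetic has to operate on tagged values".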

Note that there exist compilation targets of OCaml that use 32-bit integer arithmetic, and the OCaml ecosystem largely accounts for that. Having libraries also consider the case where integers have 64 bits seems feasible. Some code will get faster if we can use native 32-bit integer arithmetic.

Ideally, for the sake of simplicity, we would like to emit one type Value to WebAssembly, which represents an OCaml value, which is either:

a reference to a heap block
an unboxed value that fits into a machine register
a reference to some opaque value from the outside world (traditionally, for OCaml, this is C, but in WebAssembly, this could be either an anyref or some address in the linear memory of another module)

A heap block of OCaml traditionally consists of

a header word that holds a) a tag that gives some information about the block, b) GC color bits (obsolete with WASM GC), and c) the size in machine words of the block
a sequence of machine words whose length is the size specified in the header.

The most trivial representation (i.e. the one matching most closely the existing one) that I see when I look at the MVP spec is an anyref array that holds both references to other heap blocks and i31ref values. So, from the viewpoint of having to do as little work as possible in order to compile to WebAssembly and keeping the implementation simple, i31ref is certainly looking very attractive for an OCaml-to-WASM compiler MVP.
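As a rough model of that representation, written in OCaml itself rather than in WebAssembly: the type and function names below (`value`, `block`, `fits_int31`, `of_int_list`, `block_size`) are hypothetical, the `Opaque` payload is only a placeholder, and the range check assumes a 64-bit host.

```ocaml
(* A model of the uniform representation; all names are hypothetical. *)
type value =
  | Int31 of int                 (* unboxed scalar, stands in for i31ref *)
  | Block of block               (* reference to a heap block *)
  | Opaque of string             (* placeholder for a value from outside *)

and block = {
  tag : int;                     (* the tag stored in the header *)
  fields : value array;          (* block size is Array.length fields *)
}

(* Does n fit in a signed 31-bit scalar? (Assumes a 64-bit host int.) *)
let fits_int31 (n : int) : bool = n >= -(1 lsl 30) && n < 1 lsl 30

(* OCaml lists in this model: [] is the immediate 0, and a cons cell is a
   block with tag 0 holding the head and the tail. *)
let rec of_int_list (l : int list) : value =
  match l with
  | [] -> Int31 0
  | x :: rest -> Block { tag = 0; fields = [| Int31 x; of_int_list rest |] }

let block_size (b : block) : int = Array.length b.fields
```

In this model, `fields` plays the role of the anyref array, and `Int31` is the case that would need boxing if i31ref were not available.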

In https://github.com/WebAssembly/gc/issues/53#issuecomment-546252669, @rossberg summarized:

For polymorphic languages, there are these [heap representations]:

1. Pointer tagging, unboxing small scalars

2. Type passing, unboxing native scalars, runtime type dispatch

3. Type passing, unboxing native scalars, runtime code specialisation

4. Boxing everything

5. Static code specialisation

From OCaml's perspective, I think that (2) and (4) don't seem acceptable as a long-term solution in terms of performance. Here, compiling to the WASM linear memory and shipping our own GC seems a more attractive choice.

So, that leaves (3) and (5).

(3) seems fairly complex. If the WebAssembly engine did the runtime code specialization, or if we could reuse some infrastructure from another, similar language, it could be worthwhile for us to work with that. It currently seems unlikely that OCaml in general will switch to (3) in the foreseeable future, unless we can come up with a simple model of runtime code specialization. I expect that implementing runtime code specialization in a WebAssembly engine goes way beyond an MVP, so it seems unlikely this will happen.

(5) is simpler than (3) in the sense that we do not have to ship a nontrivial runtime. If we analyze the whole program in order to emit precise types (struct instead of anyref array) for our heap blocks on WebAssembly, we wouldn't need to use i31ref and we can reap the other benefits of whole-program optimization (e.g. dead-code elimination, operating with native unboxed values, no awkward 31-bit arithmetic). Still, this will be a sizeable amount of work (possibly too much to do it right away). I also can't say how bad the size of the emitted code will be in terms of all the types we need to emit. Instead of emitting a single Value type, we need to emit one struct type for every "shape" of heap block that can occur in the program. To keep this manageable, we need to unify all the types whose heap block representations have the same shape. Then, static code specialization kills one nice feature of OCaml: separate compilation of modules. However, instead of doing static code specialization before emitting WebAssembly, maybe it is possible to implement a linker for our emitted WebAssembly modules that does code specialization at link time if we emit some additional information to the compiled WebAssembly modules? This kind of linker could possibly be interesting to other languages that are in a similar position as us, as well. Obviously, link time will be slower than we are used to. I haven't thought this through in detail at all. It seems likely that these issues are manageable, if enough effort is put into them.

Edit: while the previous paragraph sounds fairly optimistic, looking into whole-program monomorphization (turning one polymorphic function into several non-polymorphic ones) more closely, it is definitely not trivial to implement. Types that we would need for this are no longer present at the lower compilation stages. When I look at the MLton compiler (a whole-program-optimizing compiler for Standard ML), it seems that it is a good idea to monomorphize early, in order to be able to optimize based on the types of parameters. Features like GADTs, and the ability to store heterogeneous values or polymorphic functions in hash maps (or other data types) do not make it simpler. It looks to me like this would mean almost a full rewrite of the existing compiler and it is not obvious whether every function can be monomorphized (without resorting to runtime type dispatch).
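A small sketch of what monomorphization means, and of one case where it does not terminate on its own. The specialized names (`twice_int`, `twice_float`) are made up to show what a compiler might emit; the nested datatype is a standard illustration of polymorphic recursion, not something taken from the OCaml compiler.

```ocaml
(* One polymorphic definition, compiled once under a uniform representation. *)
let twice (f : 'a -> 'a) (x : 'a) : 'a = f (f x)

(* What a monomorphizing compiler would emit instead: one copy per
   instantiation used in the program (hypothetical names). *)
let twice_int (f : int -> int) (x : int) : int = f (f x)
let twice_float (f : float -> float) (x : float) : float = f (f x)

(* Polymorphic recursion: each recursive call is at a larger type
   ('a, then 'a * 'a, then ('a * 'a) * ('a * 'a), ...), so no finite set
   of specialized copies covers all possible inputs. *)
type 'a nested = Flat of 'a | Nest of ('a * 'a) nested

let rec depth : 'a. 'a nested -> int = function
  | Flat _ -> 0
  | Nest n -> 1 + depth n
```

A function like `depth` would need either a bound on the instantiations that actually occur in the program or a fallback such as runtime type dispatch.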

Are we missing something here? Are there other techniques that we have overlooked so far? Feel free to drop pointers to good papers on the topics in general, if you know some.

Also, I am very interested in what perspective other languages with similar value representations have on this and whether there is interest in collaborating on a code-specializing and dead-code eliminating linker.

Quick update: as of just now, i31ref as described by the current proposal is fully implemented in V8's prototype, so you can use that for any experimental performance investigations/comparisons.

In case anyone has a demo where they suspect that significant (or at least measurable) time is being spent on taggedness-checks even though the demo doesn't use i31ref, it would be simple to create a custom build (or introduce a runtime flag) to turn off i31ref support and verify this suspicion. I won't do that until/unless someone asks me to though :-)

(Disclaimer: this is not a statement of opinion on whether i31ref should exist in the spec, or in what form. I find the flexibility of a generalized form like (tagged i31 anyref) appealing; I also like the simplicity of the current proposal (and believe that it doesn't incur an unreasonable cost at runtime); and I'd also be fine with not having it at all -- we can always add it later if needed.)

For example, primref might want to use the bits to distinguish at least funcref and structref values in order to permit fast casting. And anyref might want to use bits completely differently to support NaN boxing.

In other words, while tagged i31 primref might make boxing integers faster for OCaml (by not boxing them), it has the cost of making either casts to structref or casts to funcref slower for OCaml and for everyone else.
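To illustrate the trade-off, here is a sketch of one possible low-bit layout, simulated on plain integers. The layout is purely hypothetical (it is not what any engine does), and the names `classify`, `make_scalar`, `make_struct_ptr`, `make_func_ptr` and `is_struct` are invented for this example.

```ocaml
(* Hypothetical word layout, not any engine's actual scheme:
     bit 0 = 1              -> unboxed scalar stored in the upper bits
     bit 0 = 0, bit 1 = 0   -> pointer to a struct
     bit 0 = 0, bit 1 = 1   -> pointer to a function
   Pointers are assumed to be at least 4-byte aligned, so bits 0-1 are free. *)
type kind = Scalar | Struct_ptr | Func_ptr

let classify (w : int) : kind =
  if w land 1 = 1 then Scalar
  else if w land 2 = 0 then Struct_ptr
  else Func_ptr

let make_scalar (n : int) : int = (n lsl 1) lor 1
let make_struct_ptr (addr : int) : int = addr      (* addr: multiple of 4 *)
let make_func_ptr (addr : int) : int = addr lor 2  (* addr: multiple of 4 *)

(* A cast to "struct" is then a bit test, with no memory read. *)
let is_struct (w : int) : bool = classify w = Struct_ptr
```

Under these assumptions, the bit reserved for scalars is one that could otherwise have encoded more pointer kinds, which is the cost being described.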

Ok, if I get this right, you say that having tagged i31 anyref in the spec prevents the engine from using the lower (unused) bits of a heap pointer.

I see two cases here:

In case an aligned heap pointer is stored by means of the type tagged i31 anyref, this is true. 32 bits are used, there is no way for the engine to embed additional information about the heap pointer in these 32 bits at runtime.
In case an aligned heap pointer is stored by means of the type anyref, the engine is still free to use lower (unused) bits in any way it sees fit.

Since we cannot type cast between tagged i31 anyref and anyref, at any point in program execution, the engine knows exactly (thanks to type annotation of the WASM program) which of these representations it is dealing with.

It looks to me like there is no difference for the GC walking the heap, when adding tagged to the current MVP proposal. It already needs to know the type of the heap struct it is walking, in order to know which fields may be pointers.

So, there are three entities here:

aligned heap pointer
value of type anyref
value of type tagged i31 anyref

Is there anything that prevents the engine from assuming different semantics for the implementation of

aligned heap pointer represented as tagged i31 anyref (cannot store information in lower bits), and
aligned heap pointer represented as anyref (can store information in lower bits)?

It looks to me like adding tagged instead of i31ref can only potentially bring a disadvantage to the users of tagged (who are likely to be willing to incur that because it enables reusing their existing compilers), but not to those who only use anyref and no tagged representations. In contrast, i31ref comes at a cost to everyone, but is less complex.

@gasche

we could think about having, for example, (tagged structref funcref)

This tagged-idea could in theory even be taken further:

tagged (ref $A) (ref $B) (ref $C) i30

tagged (ref $A) (ref $B) (ref $C) (ref $D) (ref $E) (ref $F) (ref $G) i61

So what you're describing is a heterogeneous approach. Unfortunately, the current MVP is designed around a homogeneous approach. For example, its compilation model for imported/exported types is designed around a universal representation. As such, you would not be able to use tagged i31 anyref as imported or exported types (without effectively changing anyref to bake in i31ref).

But if you want to go with a heterogeneous approach to enable custom low-level representations, then it's better to take things further than just tagged. For example, right now the coercion from tagged i31 anyref to anyref (with specialized low bits) would require a memory read to determine the low-bit information that was omitted from tagged i31 anyref to make room for the unboxed scalar. To prevent these kinds of inefficiencies, you want a coordinated tagging scheme for your types. That's what the SOIL Initiative's proposal does.

For completeness, I should note that tagged i31 anyref can still have a cost even with your clarifications. Note that the coercion from tagged i31 anyref to anyref required a way to look at the heap contents to determine the low bits. That assumes the heap has that information. But some garbage-collection implementations don't put meta-information in the heap at all, at least for common small objects, relying on the fact that the meta-information is always available in the low bits of the pointer. For example, if we were to add i32ref and i64ref, instead of wasting a word in the heap for each of these objects just to say that they are i32ref and i64ref respectively, an engine could make sure that the low bits of every pointer to these objects track whether they are i32ref or i64ref (or other). But it can't do that if tagged i31 anyref exists, because there's not enough room in the low bits to track that information, forcing these values to take more space in the heap. (Since a funcref is a pair of a code pointer and a module-instance pointer, it too might be treated as a common class of small objects.)

the current MVP is designed around a homogeneous approach. For example, its compilation model for imported/exported types is designed around a universal representation. As such, you would not be able to use tagged i31 anyref as imported or exported types

Thanks for bringing up the type imports spec, I hadn't read that in all details yet. :+1:

Okay, the type imports proposal says "As far as a Wasm module is concerned, imported types are abstract. Due to Wasm's staged compilation/instantiation model, an imported type's definition is not known at compile time." (https://github.com/WebAssembly/proposal-type-imports/blob/c9700ff6267571f4a52151c8a46e800f8534f923/proposals/type-imports/Overview.md)

One paragraph later it says "However, an import may specify a subtype constraint by giving a supertype bound with the import"

If I understand this correctly, this means that all type imports/exports must be subtypes of the type any ("the type of all importable (and referenceable) types"). So, we generally cannot import/export any value types, only reference types.

Okay, now looking at this from a practical perspective for OCaml: why would we need to import/export types for tagged i31 anyref in the first place?

When importing something like a global, a function, or a table, I don't need to have a nominal type that wraps tagged i31 anyref, I can use tagged i31 anyref as a type directly, just like i32 and the other value types, right or wrong?

I could still import/export struct or array types that contain tagged i31 anyref, right or wrong?

But some garbage-collection implementations don't put meta-information in the heap at all, at least for common small objects, relying on the fact that the meta-information is always available in the low bits of the pointer.

You mean, some garbage-collection implementations don't put meta-information in the bit-representation of the small object, but instead they put meta-information in the bit-representation of the pointer pointing to the small object?

we were to add i32ref and i64ref, instead of wasting a word in the heap for each of these objects just to say that they are i32ref and i64ref respectively, an engine could make sure that the low bits of every pointer to these objects track whether they are i32ref or i64ref (or other). But it can't do that if tagged i31 anyref exists,

How does that work with arrays of i32ref and i64ref, or structs that contain them? I see that, for an individual ref i32ref or a ref i64ref, an engine could store in the pointer whether there is a 32-/64-bit scalar or a pointer stored in the pointed-to heap location, but for arrays or struct fields, the engine must store that information somewhere else?

To prevent these kinds of inefficiencies, you want a coordinated tagging scheme for your types. That's what the SOIL Initiative's proposal does.

I understand that as "The SOIL initiative proposal aims to give the producer a way to influence the tagging scheme the engine uses." To achieve that, unavoidably, there is some added amount of complexity that the current GC MVP does not have (in particular, it adds complexity to the engine implementation while producers get to pick what they need from the spec). Is there potential for simplifying the SOIL initiative proposal, or do you think it cannot get simpler than it is?

Can you give feedback on the scheme I sketched in https://github.com/WebAssembly/gc/issues/100#issuecomment-656880383? I was confused about some things, in particular, how to represent arrays. Is cases meant as a hint for the engine to use a tagging scheme to represent the different cases?

@RossTate I believe that forcing imported/exported types to be ref types (by requiring they are deftypes) is a stopgap to avoid the whole polymorphism issue. I believe strongly that we should not preclude true parametric polymorphism in that proposal but should design for the most general case of importing types of unknown representation. That is necessitated by embeddings that need to reference and manipulate values of host types that do not fit into Wasm's type system at all.

What about making i31ref a type constructor instead of a type? We already have ref T and nullref T, why not i31ref T?

Oops, so sorry @sabine! I let this get buried.

Okay, now looking at this from a practical perspective for OCaml: why would we need to import/export types for tagged i31 anyref in the first place?

Well, ideally a WebAssembly module built with OCaml would be able to export its values for others to use. If, say, you built an efficient hashmap (as a toy example), then you'd like to export your hashmap type abstractly in WebAssembly (just like you would in OCaml) along with WebAssembly functions for operating on your hashmaps (just like you would in OCaml).
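For reference, the OCaml side of that toy example could look roughly like the sketch below. The module and signature names (`HASHMAP`, `Hashmap`) are made up, and the implementation simply wraps the standard library's `Hashtbl`; the point is only that clients see an abstract type `t` together with functions operating on it.

```ocaml
(* Hypothetical toy module: the signature hides the representation. *)
module type HASHMAP = sig
  type ('k, 'v) t               (* abstract: clients cannot inspect it *)
  val create : unit -> ('k, 'v) t
  val add : ('k, 'v) t -> 'k -> 'v -> unit
  val find_opt : ('k, 'v) t -> 'k -> 'v option
end

module Hashmap : HASHMAP = struct
  (* The concrete representation, invisible outside this module. *)
  type ('k, 'v) t = ('k, 'v) Hashtbl.t
  let create () = Hashtbl.create 16
  let add = Hashtbl.replace
  let find_opt = Hashtbl.find_opt
end
```

Exporting this abstractly at the WebAssembly level would mean exporting `t` as a type whose definition importers never see, plus the three functions.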

You mean, some garbage-collection implementations don't put meta-information in the bit-representation of the small object, but instead they put meta-information in the bit-representation of the pointer pointing to the small object?

Yep.

How does that work with arrays of i32ref and i64ref, or structs that contain them?

There's a wide variety of techniques, and which one a GC can use depends on the invariants it can maintain. Some techniques are to always use the same tagging convention throughout the system (this is what OCaml does). Another technique is to have a descriptor at the top of any struct/array informing the GC how to work with that structure. That descriptor can be encoded as bits in a variety of ways, or point to some meta information (or even a code address to jump to).
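A minimal sketch of the two techniques, simulated on integer arrays. The names (`pointers_by_tag`, `described`, `pointer_map`, `pointers_by_descriptor`) are invented, and the bitmap encoding is just one of the possible descriptor encodings mentioned above.

```ocaml
(* Technique 1: uniform tagging. Every word describes itself through its
   low bit (1 = scalar, 0 = pointer), so the GC needs no extra metadata. *)
let pointers_by_tag (words : int array) : int list =
  Array.to_list words |> List.filter (fun w -> w land 1 = 0)

(* Technique 2: a descriptor at the top of the object. Here it is a bitmap
   where bit i set means "field i is a pointer" (hypothetical encoding). *)
type described = { pointer_map : int; words : int array }

let pointers_by_descriptor (o : described) : int list =
  Array.to_list o.words
  |> List.filteri (fun i _ -> (o.pointer_map lsr i) land 1 = 1)
```

With a descriptor, non-pointer fields can hold full-width untagged scalars, at the price of the GC having to consult the descriptor for every object.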

Is there potential for simplifying the SOIL initiative proposal, or do you think it cannot get simpler than it is?

Yep. We've been waiting for there to be a proper discussion and round of feedback to determine which simplifications we should explore.

Can you give feedback on the scheme I sketched in #100 (comment)?

Ah! Even more sorry! You went through all that work and I somehow missed it entirely. I'm happy to give some feedback. Also, note that we did a case study on how to do typed functional languages here. Its handling of closures is a little outdated though, since the call-tags proposal now provides a better way to deal with the complications caused by currying.

value :=      scheme.new ((field $tag (unsigned 8) immutable) castable

Instead of using $tag here, I would recommend having value declare (extensible (cases ...)) with all the various cases of reference values you have (which should each be marked as castable). The proposal will then take care of encoding these as a tag for you, whether in the pointer or in the metadata on the heap. You'll still want $tag in tuple<n> though, for distinguishing between cases of an ADT.

I was confused about some things, in particular, how to represent arrays.

For arrays, you use indexed fields. For example,

double-array block: scheme.new (parent implicit $value)
                         ((field $tag (unsigned 8) immutable)
                         (field length $array_length (unsigned 32)))
                         (field (indexed $array_length) (gcref $value))

Besides that (and insignificant syntactic things), your sketch looks good to me.

@RossTate Thanks for the feedback! It does look like things are representable in the proposal.

Instead of RTTs, there are schemes, and it is possible to test whether a gcref belongs to a certain scheme, similar to being able to test whether a reference has a certain RTT.

Well, ideally a WebAssembly module built with OCaml would be able to export its values for others to use. If, say, you built an efficient hashmap (as a toy example), then you'd like to export your hashmap type abstractly in WebAssembly (just like you would in OCaml) along with WebAssembly functions for operating on your hashmaps (just like you would in OCaml).

It seems nice, from a language user's (someone who writes code in a language that compiles to WebAssembly) perspective, to have the ability to create and link modules that expose opaque values and functions that operate on them.

I don't know whether the ability to express and handle these opaque types must or should be provided on the WebAssembly level. It looks to me like it is still an open question whether WebAssembly should provide all this infrastructure, or whether languages should invent their own abstraction on top of WebAssembly for this.

The only place where I so far know about built-in opaque values being strictly necessary is in the case of WebAssembly system interfaces, where, for security reasons, handles to operating system resources must not be modifiable by a WebAssembly program.

We've been waiting for there to be a proper discussion and round of feedback to determine which simplifications we should explore.

I would guess that, if one could get many different languages to represent their heap in that model, it should be possible to quantify which of the features are not used (or so rarely used that it doesn't make sense to implement them in an MVP).

If there were an interpreter implementation of the SOIL initiative model, that would enable some experimentation. However, that's a fairly large effort, even though the performance of the interpreter is not important at all.

Another way to look at the model is to look at every piece and try to answer the question "what would be the effect on performance if this is removed".

@eqrion, you may find the opening post of this issue interesting, especially the part where @sabine says that compiling OCaml to linear memory would be preferable to having to box everything if i31 were not available.

Apart from that, most of this discussion is outdated or could be more productive in new issues now that we have folks actually working on compiling OCaml to WasmGC, so I'll close this issue.
