WebAssembly / gc

Branch of the spec repo scoped to discussion of GC integration in WebAssembly
https://webassembly.github.io/gc/

Simplifying the MVP Proposal #39

Closed RossTate closed 4 years ago

RossTate commented 6 years ago

I've been reflecting on this proposal for a while and, after seeing the recent pull request #38, have come to the conclusion that it strikes an unhappy middle ground between two goals. One goal is to enable an integrated garbage collector, in particular enabling the browser to place and manage wasm objects in the same space as DOM and JS objects, simultaneously relieving wasm programs of their own memory management and making well-behaved wasm-DOM/JS interop possible. The other goal is to enable the browser to statically verify wasm code and thereby reduce the overhead of dynamic checks.

Reflecting upon these goals, the proposed type system is way more expressive than is necessary for garbage collection, and yet it is also far too inexpressive to statically verify very common and important patterns like dynamic method dispatch and closure invocation. And to make matters worse, the run-time type system doesn't seem particularly well suited to either of these tasks, or at least it doesn't match up well with existing systems. And regular-coinductive types aren't even that easy to implement.

Of the two goals, the first goal (integrated garbage collection) is both more pressing and easier to achieve. I think the MVP could be dramatically simplified by focusing on just that goal. Interestingly, it seems that doing so also leaves a lot more flexibility for achieving the second goal. The basic idea is to admit the fact that wasm will be used by a wide variety of languages and programs with a wide variety of needs, many of which are hard to foresee and plan for right now, and the garbage collector needs to be designed for heterogeneity. Note that that doesn't mean deciding all the possibilities now - rather it means the opposite.

To get a sense of what I mean, let me propose my first "type": gcref. A gcref is a pointer to a chunk of memory that, at a standard offset chosen by the garbage collector, provides the garbage collector with the information it needs in order to know how to manage that chunk of memory. This is essentially what the current proposal calls anyref, but without the misnomer: there are references that do not satisfy these requirements. For example, although an interior pointer points into a chunk of memory that provides the garbage collector with the necessary information, that information is not at the standard offset relative to the pointer. Although the MVP proposal is unlikely to have the expressiveness to handle efficient interior pointers, future proposals that better achieve the second aforementioned goal might, so it's better to design with their existence in mind even if they don't exist yet.
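
As a concrete sketch of the idea, here is one way the "descriptor at a standard offset" scheme could look in C. All of the names (`gc_desc`, `gcref_new`, `gcref_desc`) and the choice of offset are illustrative assumptions, not part of any proposal:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: a gcref points at a standard offset past a
 * descriptor that tells the collector how to manage the chunk. */
typedef struct {
    uint32_t size_in_words; /* how large the chunk is */
    uint32_t strategy;      /* which management strategy applies */
} gc_desc;

typedef void *gcref;

/* Allocate a chunk and install its descriptor at the standard offset
 * (here: immediately before the referenced memory). */
static gcref gcref_new(uint32_t size_in_words, uint32_t strategy) {
    gc_desc *d = malloc(sizeof(gc_desc) + size_in_words * sizeof(void *));
    d->size_in_words = size_in_words;
    d->strategy = strategy;
    return (gcref)(d + 1);
}

/* The collector recovers the descriptor from any gcref. Note that an
 * interior pointer would violate this: its descriptor would no longer
 * sit at the standard offset. */
static gc_desc *gcref_desc(gcref r) {
    return (gc_desc *)r - 1;
}
```

The point of the sketch is that the collector needs no static type information about the referent: everything it needs is recoverable dynamically from the descriptor.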

So the key idea of gcref is that a gcref dynamically provides the garbage collector with the information it needs to manage the reference. There can be many ways to manage a reference, and the garbage collector can be responsible for choosing how to encode this dynamic information. So now we need to think of a few common base cases that the browser should provide at the least to provide critical functionality - we can add more cases in future proposals as it becomes clear they would enable more paradigms or better efficiency.

The management strategy that comes to mind as the simplest, and reasonably efficient, is the dynamic strategy. With the dynamic strategy, the meta-information associated with a block of memory simply says how big the block of memory is. The garbage collector is then responsible for walking through that block of memory and determining which of its fields are references, and then subsequently following those references and managing them. This is essentially the strategy that OCaml uses. Some issues do arise, though, so in fact the meta-information needs to answer a few questions:

- Is this block entirely full of references, entirely full of non-references, or a tagged mix?
- Is this block an array? If so, how big is the header (and is the header full of refs, non-refs, or mixed), and how many indices does it have (and are they refs, non-refs, or mixed)?
- Is this block entirely immutable? (Or, if it's an array, is the header entirely immutable, and are the indices entirely immutable?)
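
Before turning to those questions, the basic scan of the dynamic strategy can be sketched in C, assuming OCaml's low-bit word-tagging convention (an assumption for illustration, not part of the proposal):

```c
#include <stdint.h>
#include <stddef.h>

/* Dynamic strategy sketch: the only meta-information the collector
 * needs is the block's size. A word with its low bit set is a packed
 * primitive; a nonzero word with its low bit clear is a reference. */
typedef uintptr_t word;

static int is_reference(word w) {
    return w != 0 && (w & 1) == 0; /* low tag bit clear => pointer */
}

/* Walk every field of a block, handing each reference to the
 * collector's visitor so it can be traced in turn. */
static void scan_block(word *block, size_t size_in_words,
                       void (*visit)(word *field)) {
    for (size_t i = 0; i < size_in_words; i++)
        if (is_reference(block[i]))
            visit(&block[i]);
}

/* Tiny visitor used for demonstration: count the references found. */
static size_t refs_seen;
static void count_ref(word *field) { (void)field; refs_seen++; }
```

Note how the scan needs no per-field type information, which is exactly why the allocation-time metadata can stay so small.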

Each combination of answers to these questions specifies an "allocation type". An allocation type is like a run-time type, but whereas run-time types sometimes carry a bunch of semantic information (such as in nominally typed languages), allocation types just specify memory information. After pondering a variety of languages, I think that if wasm were to ensure that programs could dynamically allocate structures with at least these allocation types, then most languages would be able to have a reasonably performant and simple wasm backend. Note, though, that I say this assuming there are more allocation types; it's just that wasm programs might not need to be able to dynamically allocate them. For example, functions would likely have their own allocation types, but what those are does not have to be fixed at this point. Similarly, JS and DOM objects would have their own allocation types, but wasm programs would not be able to allocate them, since different browsers are going to have different memory layouts for JS and DOM objects.

Lastly, we still need to consider the second goal: statically verifying wasm programs. For this we'll use "register types". Register types indicate both what kind of operations can be performed on a given register (or really stack slot) and how the garbage collector needs to manage the content of that register. For example, the register type gcref can be cast to certain other register types and indicates that the garbage collector needs to manage the content of that register. The register type ngcref indicates that the reference might additionally be null (i.e. it is optional). There are register types for various primitives like ints and floats that indicate various arithmetic can be performed on them and that the garbage collector should ignore their contents even if they might look like a reference. The register type ngcrefprim indicates a value that might be null, a reference, or a primitive packed into a reference-width format, with a tag indicating which case it is; this type permits various unpackings and lets the garbage collector know to check the tag before trying to manage the reference.
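
One possible encoding of the ngcrefprim register type is sketched below. The low-bit tag scheme, and all of the names, are assumptions chosen for illustration:

```c
#include <stdint.h>

/* Sketch: an ngcrefprim word is null, a reference, or a primitive
 * packed into reference width, discriminated by a one-bit tag that
 * both the collector and the program check. */
typedef uintptr_t ngcrefprim;

static ngcrefprim pack_prim(uint32_t v) {
    return ((uintptr_t)v << 1) | 1;  /* tag bit 1 => packed primitive */
}

static ngcrefprim pack_ref(void *p) {
    return (uintptr_t)p;             /* aligned pointer, tag bit 0 */
}

static int is_null(ngcrefprim v) { return v == 0; }
static int is_prim(ngcrefprim v) { return (v & 1) != 0; }

/* The collector checks the tag before treating the word as a
 * reference; the program checks it before unpacking. */
static uint32_t unpack_prim(ngcrefprim v) {
    return (uint32_t)(v >> 1);
}
```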

Then there would be register types indicating what kinds of memory operations can be performed on the value. For example, one register type would indicate that it's a reference pointing to a block with at least 2 fields of tagged mixed references and non-references. Another register type would indicate that it's a reference pointing to an array of immutable non-references with a header of exactly 2 immutable fields of tagged mixed references and non-references. Note that the latter register type is a subtype of the former (which is important for OO languages in which every object has a header specifying its vtable and a unique object identifier).

I realize I've handwaved a lot here, but I do actually have a lot of the details worked out. Nonetheless, I thought I'd lay the high-level ideas out here first before diving into details.

fgmccabe commented 6 years ago

On Oct 12, 2018, at 8:01 AM, Ross Tate notifications@github.com wrote:

> I've been reflecting on this proposal for a while and, after seeing the recent pull request #38, have come to the conclusion that it strikes an unhappy middle ground between two goals. […]

Personally, it is not at all obvious to me that having an integrated garbage collector is an appropriate goal for wasm. That depends entirely on what the ultimate role of wasm in the ecosystem ends up being. Some counter-arguments:

a. Not all languages need a garbage collector (e.g. C/C++).
b. Some languages have their own idea of what gc means (e.g. Swift).
c. Some language communities have invested very heavily in scalable GC, and it does not make sense to try to compete with them (e.g. Java & Haskell).
d. In order for GC to make sense, you need to incorporate an idea of object into Wasm. If this gets adopted as a goal, then get ready for some uncomfortable trade-offs (most languages' idea of object looks nothing like JavaScript's).
e. If managed memory is separate from the byte array, then that will actually complicate most language implementations.

On the other hand … there are definitely some benefits:

a. Having a common model for GC represents a point of contact for GC experts to contribute without being overly constrained by the peculiarities of a given PL.
b. Having a common object model will help with language interoperability.
c. It will reduce the burden on implementers bringing a new language to the world of the Web/Wasm.

> Is this block entirely full of references, entirely full of non-references, or a tagged mix? Is this block an array, in which case how big is the header (and is the header full of refs, non-refs, or mixed), and how many indices does it have (and are they refs, non-refs, or mixed)? Is this block entirely immutable? (Or, if it's an array, is the header entirely immutable, and are the indices entirely immutable?)

Different languages will have very different approaches to this. Some will be more-or-less completely dynamic; others can determine all of this information at compile time.

> Lastly, we still need to consider the second goal: statically verifying wasm programs. For this we'll use "register types". […] There are register types for various primitives like ints and floats that indicate various arithmetic can be performed on them and that the garbage collector should ignore their contents even if they might look like a reference. […]

Again, this presumes knowledge and constraints on language design. IMO, it unnecessarily raises the importance of ‘primitive’ types and ‘primitive’ operations. It is quite possible to have an efficiently compiled language that has essentially no primitive types other than bit-string.


rossberg commented 6 years ago

@RossTate, to clarify the assumptions underlying this proposal: the representation information for the GC is indeed assumed to be stored in the heap objects, like you suggest. That is the standard approach, and we don't require anything unusual. So I think we are in sync on this one.

However, to construct that information, the runtime has to use the struct/array type definitions. Are you suggesting that there shouldn't be such definitions? If so, where would the information be taken from?

The other half of your suggestion is a bit confusing to me. The notion of register type you are describing sounds exactly like what Wasm's value types already are. The Wasm type system has no other purpose than determining what kind of data representation occurs where. And this proposal extends it along the same lines you seem to be suggesting. Can you elaborate what differences you see?

RossTate commented 6 years ago

@rossberg, good questions, let me attempt to clarify (out of order).

The other half of your suggestion is a bit confusing to me. The notion of register type you are describing sounds exactly like what Wasm's value types already are. The Wasm type system has no other purpose than determining what kind of data representation occurs where. And this proposal extends it along the same lines you seem to be suggesting. Can you elaborate what differences you see?

What I'm calling register types is what you're calling value types. I just wanted to make it clear that there are types that are given to registers that aren't necessarily given to values in the heap, and vice versa. For example, in what I'm describing, one never says that some field in the heap has type f64. The garbage collector doesn't need that amount of information; it just needs to know that it's 8 bytes of non-reference data. There are uses for having the more detailed information specified at allocation time, but it seems that it's not necessary for the MVP.
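
The distinction can be made concrete with a small sketch: to the collector the field is just 8 bytes of non-reference data, and only the register-level code knows those bytes hold an f64 (the helper names here are illustrative):

```c
#include <stdint.h>
#include <string.h>

/* Heap-side view: a field is opaque non-reference bytes. */
static void store_f64(uint8_t *field, double value) {
    memcpy(field, &value, sizeof value); /* GC sees only raw bytes */
}

/* Register-side view: the code loading the field knows it is an f64. */
static double load_f64(const uint8_t *field) {
    double value;
    memcpy(&value, field, sizeof value);
    return value;
}
```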

However, to construct that information, the runtime has to use the struct/array type definitions. Are you suggesting that there shouldn't be such definitions? If so, where would the information be taken from?

I'm suggesting the definitions should be much simpler. For example, there could be just one memory-allocation operator, say new_data, with the following arguments:

That's it. No type argument indicating what the various fields are and such.

rossberg commented 6 years ago

@RossTate, it can actually be relevant on some platforms whether a value on the heap is e.g. an f64 or an i64. For example, that might induce different alignment requirements or preferences.

For references, you need to know what they are pointing to, otherwise every single access would require a runtime check. Are you proposing that?

If the data model required segregating the different types of fields in an aggregate then producers would have to transpose or unzip some data structures (e.g., turning arrays of structs into structs of arrays), which means that they may not be able to maintain desired memory locality properties, and some access patterns become non-compositional (e.g. copying out a struct from an array at a given index).

Taking all that into account, practically all the representation type information that this proposal provides is potentially relevant. We basically adopt the C data model. I'm afraid I don't see much you can take away without losing some of these abilities (anticipating post-MVP extensions such as nested aggregates).

lukewagner commented 6 years ago

It also seems to me that the current proposed design isn't that different from what is proposed in this issue once the following constraints are taken into account:

- the producer can't know what padding to insert due to unknown reference bitwidths and target hardware alignment requirements (so it has to be insertable by the wasm consumer at compile/load-time)
- references can't alias user-controlled bits (so reftypes have to have certain hard invariants)
- not wanting dynamic type checks at every get/set-field (so some sort of transitive reference type)

One can imagine, and indeed we've lightly discussed in the past, a new field type that is basically an untyped sequence of bytes. Since this byte-array-field has to exist within a bigger object (with padding potentially inserted before/after if it is preceded/followed by other fields), it would need new get/set_field ops that, like the linear memory load/store ops, describe the src/dst value type (and bitwidth and sign extension) in the op immediates. But since we at least need the current get/set_field design (which respects the value type of the field, viz., for reference types), the more flexible byte-array ops seem like a post-MVP feature which is why they're currently not mentioned.

RossTate commented 6 years ago

> If the data model required segregating the different types of fields in an aggregate then producers would have to transpose or unzip some data structures (e.g., turning arrays of structs into structs of arrays), which means that they may not be able to maintain desired memory locality properties, and some access patterns become non-compositional (e.g. copying out a struct from an array at a given index).

This is an example of something that isn't necessary but can be done faster through more specialized mechanisms. That doesn't sound like an MVP item.

> the producer can't know what padding to insert due to unknown reference bitwidths and target hardware alignment requirements (so it has to be insertable by the wasm consumer at compile/load-time)

If you tell people they can only load 32-bit values from 4-byte aligned indices in these unmanaged blocks/arrays and similarly 64-bit values from 8-byte aligned indices, then the language compiler can take care of padding and aligning things and the browser needs no understanding of that alignment. Sure, there might be a better way to align and pad things for the specific hardware at hand, but it's not necessary. It just makes some things faster/smaller. So it doesn't seem like an MVP item.
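
The producer-side layout this describes can be sketched as follows, under the stated rule (32-bit values at 4-byte-aligned offsets, 64-bit values at 8-byte-aligned offsets); the function names are illustrative:

```c
#include <stddef.h>

/* The engine never needs to know the layout; it only has to enforce
 * the fixed alignment rule on each access. */
static size_t align_up(size_t offset, size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Example: laying out an i32, an i64, then another i32. The producer
 * itself inserts the 4 bytes of padding before the i64. */
static size_t layout_i32_i64_i32(size_t offsets[3]) {
    size_t off = 0;
    offsets[0] = off;           /* i32 at offset 0 */
    off = align_up(off + 4, 8);
    offsets[1] = off;           /* i64 at offset 8, after padding */
    off += 8;
    offsets[2] = off;           /* i32 at offset 16 */
    return off + 4;             /* total payload: 20 bytes */
}
```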

> references can't alias user-controlled bits (so reftypes have to have certain hard invariants)

Due to interoperability, if you want to ensure references keep some number of bits unused, you probably need that to be a global invariant of the system. This means all gcrefs should keep certain bits free. I assumed at least one in the above.

> For references, you need to know what they are pointing to, otherwise every single access would require a runtime check. Are you proposing that?

> not wanting dynamic type checks at every get/set-field (so some sort of transitive reference type)

Suppose you're given an ngcref that you expect to be a chunk with at least 8 reference fields. You do a cast, one that is extremely simple and efficient: the gc checks that the upper byte or two of the tag has the correct bits and that the lower byte or two of the tag is smaller than 8. From then on every dereference of a field at index less than 8 requires no checks. Those dereferences give you an ngcref, which you'll have to cast again if you expect it to have a particular structure. It'll be a lot of checks, but they'll also be extremely fast. So it's a simple and flexible system with pretty reasonable performance, which is what I'd expect of an MVP.
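
The two-comparison cast described here could look like the following. The tag encoding (high byte for the allocation kind, low byte for the reference-field count) is one plausible reading of the description, not a fixed design:

```c
#include <stdint.h>

/* Hypothetical 16-bit tag: high byte identifies the allocation kind,
 * low byte records how many reference fields the chunk has. */
#define KIND_REF_CHUNK 0x01

static uint16_t make_tag(uint8_t kind, uint8_t ref_fields) {
    return (uint16_t)((uint16_t)kind << 8 | ref_fields);
}

/* The whole cast is two comparisons. Once it succeeds, accesses to
 * any field below min_fields need no further checks. */
static int cast_to_ref_chunk(uint16_t tag, uint8_t min_fields) {
    return (tag >> 8) == KIND_REF_CHUNK && (tag & 0xFFu) >= min_fields;
}
```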

I gotta run. If what I just said doesn't address the issue you raised, then I'm probably misunderstanding a concern, and it'd help to have a concrete example of something that could go wrong.

RossTate commented 6 years ago

I thought of an interesting potential compromise. My primary concern is keeping the allocation types simple. Each allocation type adds complexity to the garbage collector, and that complexity can cause both space and performance issues. Looking back, even my proposal has more allocation types than necessary: you don't really need immutability (yet), and you don't really need reference-only chunks (yet).

So what if we weaken the grammar and guarantees of the structural type system so that, if the browser chooses to, it can be implemented using a fairly simple grammar of allocation types? In particular, what if we make the semantics of cast be:

So you could allocate a chunk with the type (struct i32 i64 i32), and you'd later be guaranteed to be able to cast that chunk to (struct i32 i64). But on some browsers on some machines you might also be able to cast that chunk to (struct i64 i64). You couldn't plan on that happening, but it might work out by accident, and if it did the system would still ensure that nothing would go wrong (safety-wise) as a consequence. This would happen in a garbage collector that happens to simplify all structs of non-references to non-reference-structs of some number of bytes, and that happens to be running on a machine where i64s do not have to be 8-byte aligned.

So that brings us to what things casts are able to guarantee, which boils down to the grammar of the target type. For the MVP, I think the grammar should be something roughly equivalent to the following:

Regardless, I think the allocation types need to be kept simple. Otherwise it seems you're placing a huge burden on the garbage collector, and one that can't be undone.

rossberg commented 6 years ago

> This is an example of something that isn't necessary but can be done faster through more specialized mechanisms.

Fair enough, and in fact it (nested aggregates) is left out from the MVP. However, the data model for the MVP needs to have a clear extension path to it.

> If you tell people they can only load 32-bit values from 4-byte aligned indices in these unmanaged blocks/arrays and similarly 64-bit values from 8-byte aligned indices, then the language compiler can take care of padding and aligning things and the browser needs no understanding of that alignment. […]

But what is the benefit? Not distinguishing floats would be saving two primitive storage types for the price of more complexity elsewhere (you'd instead need multiple new instructions for distinguishing types at use sites), and design redundancy when actually introducing the refinement later.

> references can't alias user-controlled bits (so reftypes have to have certain hard invariants)

> Due to interoperability, if you want to ensure references keep some number of bits unused, you probably need that to be a global invariant of the system. This means all gcrefs should keep certain bits free. I assumed at least one in the above.

Luke meant something else: no address bit at all must be observable for references. Reference values must remain completely opaque to Wasm code. Everything else would be a severe security risk.

> Suppose you're given an ngcref that you expect to be a chunk with at least 8 reference fields. You do a cast, one that is extremely simple and efficient: the gc checks that the upper byte or two of the tag has the correct bits and that the lower byte or two of the tag is smaller than 8.

Small cost is not good enough. Access to memory itself shall have zero extra cost (over the bare hardware). The only way to achieve that while maintaining memory safety is sufficient static knowledge. (This is different from OCaml, for example, which can afford an unsafe data model because it's entirely internal. We cannot, because ours is exposed, plus we generally run untrusted code.)

> From then on every dereference of a field at index less than 8 requires no checks.

Only if you assume some additional static analysis in the engine that goes beyond the type system you sketched. That analysis would mirror pretty much exactly the propagation of type information that the proposal makes explicit. Remember that one of the design goals of Wasm is to minimise engine magic, maximise predictability, and favour explicit code.

> But on some browsers on some machines you might also be able to cast that chunk to (struct i64 i64). You couldn't plan on that happening, but it might work out by accident, and if it did the system would still ensure that nothing would go wrong (safety-wise) as a consequence.

Oh, are you proposing undefined or implementation-dependent behaviour? That is a no-go for Wasm. And there is no point to any feature that producers cannot rely on.

Quite honestly, besides the issue of UB, I don't understand what simplification the proposal in your most recent comment achieves. The simplified data type grammar you show is no simpler than the one in the MVP proposal -- it's more complex. Can you elaborate?

RossTate commented 6 years ago

Okay, if you don't want to allow implementation-dependent behavior, that's fine, and it means the compromise doesn't work. Note that disallowing even just endianness to be potentially exposed very severely restricts your opportunities and significantly complicates (and likely slows down) the runtime. I'm not sure why you're so against that, but I'll run with it.

But your comments about the original proposal indicate some clear misconceptions of what I'm proposing.

> Luke meant something else: no address bit at all must be observable for references. Reference values must remain completely opaque to Wasm code. Everything else would be a severe security risk.

Nothing in my proposal allows this. If someone can point out why they think it is possible, that'd be a useful example we can work through to clarify things.

> From then on every dereference of a field at index less than 8 requires no checks.

> Only if you assume some additional static analysis in the engine that goes beyond the type system you sketched.

The type system I sketched (or at least intended to sketch) lets a register type indicate that a reference has at least 8 ngcrefprim fields. There's no type inference necessary. My example used a cast just like you do.

> Small cost is not good enough. Access to memory itself shall have zero extra cost (over the bare hardware).

The costs of my casts are smaller than yours. Mine are constant time with only one memory lookup (to get the gc tag on the object) and no mutable state. Yours can only be hoped to be amortized constant time, but even then with multiple lookups and mutable state.

> Quite honestly, besides the issue of UB, I don't understand what simplification the proposal in your most recent comment achieves. The simplified data type grammar you show is no simpler than the one in the MVP proposal -- it's more complex. Can you elaborate?

The grammar of my original proposal has, in total, 42 cases. The grammar of my proposal after the two simplifications I made above has 6 cases. The grammar of your proposal has an infinite number of cases and requires finite state machines to express its values. (Here I'm talking about the grammar of allocation types, which again is my primary concern.)

I also know of major garbage collectors that are implemented using my grammar(s), or at least grammars very similar to mine (e.g. OCaml and Haskell). I don't know of any garbage collectors that are implemented using your grammar. (I don't think the JVM or CLR ones do, since I think those rely on nominality to address the complexity of their data structures, but I could be wrong.) Do you?

titzer commented 6 years ago

Wasm does not have polymorphism over representations of values yet, and I don't think that a GC MVP should introduce that unnecessarily. Specifically, @RossTate introduced a proposal for a type of mixed struct that allows values of type prim-type|ref-type. We've thought this through pretty far already, but little has probably been recorded about that.

Let's call prim-type|ref-type the utype for the moment (it's almost anytype, but let's not get too wild just yet).

The problem is that primitive types have several different representations (4 already: with i32, i64, f32, and f64, and a 5th if you include the coming SIMD value, a 128-bit type). The representation of utype alone will oblige the engine to do "engine magic" that will vary significantly across implementations. In particular, 3 JSVMs use NaN-boxing for JSValues, while 1 still uses SMIs (31-bit intref). Of course, I don't want to reason through to a solution based solely on what JS engines do, but they are at least enough to illustrate why I think value polymorphism is not the right step at this stage. JSVMs are going to try to fit utype into their existing value representations, but it will come with unwelcome performance cliffs, and because of that fact, would diverge from one of Wasm's primary goals of predictable performance.
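
For readers unfamiliar with the technique, here is a minimal NaN-boxing sketch, illustrative of the JSVM approach mentioned above rather than any particular engine's encoding. The chosen tag pattern (top 16 bits 0xFFF9) is an assumption; real engines differ:

```c
#include <stdint.h>
#include <string.h>

/* Doubles are stored as their own bit pattern; 32-bit integers are
 * hidden in the payload of a NaN pattern that no real double
 * computation produces. */
typedef uint64_t boxed;

#define INT32_TAG 0xFFF9000000000000ULL

static boxed box_double(double d) {
    boxed b;
    memcpy(&b, &d, sizeof b);
    return b;
}

static boxed box_int32(int32_t i) {
    return INT32_TAG | (uint32_t)i; /* zero-extend into the payload */
}

static int is_int32(boxed b) {
    return (b >> 48) == 0xFFF9;
}

static int32_t unbox_int32(boxed b) {
    return (int32_t)(uint32_t)b;
}
```

The performance-cliff worry above comes from exactly this kind of encoding: values that fit the payload are cheap, while anything wider forces boxing or a different representation.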

In any case, even a cleanroom engine implementation is going to face the following problems:

(1). An f64 or an i64 subsuming to a utype may need to be boxed, an unwelcome implicit allocation on the heap. Ross gave a couple of suggestions for avoiding that allocation (e.g. splitting into multiple 31-bit values, etc). The problem is that this will cause problems for concurrency in the future, and still cannot represent the full 64-bit range of doubles (if we continue to allow WASM programs to store information in NaN bits) or the full 64-bit range of i64. Another approach would be to move the "tag" of this type information into the metadata of the struct. To really support what Ross suggested, this would require mutable metadata, which again is going to be a pain for concurrency.

(2). utype values not stored into the heap would have the same problem as above, except there is nowhere to store additional metadata, except in the value itself. Perhaps this is why Ross suggested restricting these types of values to not be locals (if I understood correctly). IMO bifurcating types in this way will constantly lead to loss of compositionality (e.g. how do you manually copy a mixed struct if you can't read the values out of its fields without switching? (see (3) below)). In the end, all value types must be real, first-class values that can go anywhere. (thus the name :-))

(3). If we can't do (2) then reading a field from a mixed struct implicitly means converting a utype stored in that field into one of the known representation types. E.g. I surmise this would mean we would need to encode the expected representation into the load instruction itself, and its semantics would include an implicit type check, which would trap. That's again an additional cost which goes against the predictable performance goal.

Overall I think mixed structs and anything like utype is a non-starter for Wasm. That is partly because of the above specific technical reasons, but also recognizing the trajectory of that story. What seems to keep happening is that first a dynamic mechanism is introduced because it's the most accessible solution and then we get stuck with the headache of optimizing it while we try to figure out the statically typed version after the fact. There are echoes of that trajectory in lots of other languages and IMO that's caused lots of trouble. We should do it the right way around this time and design the minimal statically typed thing first before supporting dynamic types as an add-on through a universal type in WASM GC 3.0.

I agree with @lukewagner who replied previously that being able to execute loads and stores without type checks is an important property. IMO it is a "must have" for the MVP. It's a low bar, but it enables the efficient object representations necessary for making static languages like Java run well. Otherwise those languages start with a pretty bad penalty before they even get to implementing method dispatch.

Which brings me to method dispatch. As far as what Ross mentioned early on about not being able to typecheck virtual dispatches, actually, most of the virtual dispatch sequence does typecheck properly. In particular, with immutable fields, the load from the object's header (to get the vtable), the load from the vtable to get the function pointer reference, and the call to the actual function pointer all statically typecheck on the caller's side. In fact the machine instructions would be no different than what a JVM would generate. The only operation that does not statically typecheck is the receiver object in the target function, which requires a dynamic cast (probably best put in the target function itself). I find this to be a reasonable tradeoff if we can make this check in the target function very cheap (e.g. a single pointer compare). I believe this because many papers have shown that static devirtualization works very well, even for Java, if the scope of optimization is large enough (e.g. the module granularity). Off the top of my head, something like 90% of all virtual dispatches can be statically devirtualized (without inlining!). With inlining, it's even more. I expect that targeting Wasm offers most language implementations an opportunity to construct large modules and do devirtualization inside of them. In brief, virtual dispatch should be uncommon if toolchains are smarter. (and of course, nothing prevents engines from going all in on dynamic profile-guided optimizations).

RossTate commented 6 years ago

Great points, @titzer. I have some thoughts, but first I want to spend time talking more about the problem than the solution. The problem has two parts.

For a memory-managed language, most chunks of memory are going to have a header word (or two) that will be used for at least garbage-collection information. In particular, that information needs to somehow inform the garbage collector how large the object is (in order to clean it up) and which bits in that object do and do not denote references (so that it can perform its reachability scan). This header is very valuable, and you can do a lot of clever stuff if you're careful with it. But this proposal basically forces the GC's hand to make the header a reference to other metainformation that then will provide the GC with the info it needs. And whereas it's easy to add more cases to what this header has to describe, it's very hard to remove cases. So I'm concerned by the fact that this proposal jumps straight for requiring a lot of information as its first step.

The second problem is the casts. Casts in this proposal are not cheap. A regular-corecursive structural subtype check is often quadratic time (assuming the types have already been translated into DFAs). This could be mitigated by using the many tricks we have from implementing dynamically typed languages, such as memoizing and inline caches. But even then, they shouldn't be taken lightly. It is quite possible that one of these casts could regularly take longer than multiple of my casts even after bloating the code and memory space to make the necessary optimizations happen.

As a metapoint, the current MVP's GC-header and casting strategy both rely heavily on things like caching behavior. I'm hesitant to rely on the fact that this caching behavior has worked out well enough in prior language implementations because there are some key differences. For example, in prior language implementations like C#/Java that use the reference-GC-header approach, things like the vtable and itable and various other metainformation were not part of the same GC system - the GC handles these items separately. So the caching behavior of such systems is not going to be the same as for those same systems implemented in this MVP.

This discussion also illustrates how unpredictable the performance of wasm programs will be with this MVP. While some unpredictability is inevitable due to differences in things like 32-bit vs 64-bit, this MVP will make that unpredictability all the more extreme. For example, having an inline cache with 3 versus 4 entries can dramatically impact the performance of some programs. So the MVP seems to be permanently throwing wasm into heuristic hell. And I still don't think it even does a great job at its goal. If you're going to force the reference-GC-header approach, then why isn't the reference also pointing to the vtable and other information shared across multiple values of the same type, like it is in every other implementation I know of using this approach? What about interfaces, which the MVP doesn't seem to have a good story for (both in terms of time efficiency of interface-method dispatch and in terms of space (and caching) efficiency of the necessary tables that would normally be packed into the vtable but can't be with this proposal), despite the fact that interface usage has long been increasing over the years?

In a sense, I feel that anything we're gonna be able to design in short time will become outdated, possibly fairly soon, and so I'd rather design something whose backwards compatibility doesn't impose a major burden, even if it means that that something is a little less efficient for the short time that we're stuck with it. (To that end, I'm happy to throw out what Ben refers to as utype - I was mostly just throwing that in there because I saw a lot of discussion of 31-bit integers and references. There are other ways to enable mixing references and non-references.)

rossberg commented 6 years ago

@RossTate, ah, note that the design of casts has changed significantly in PR #38, based on discussion on the earlier PR #34, and also concerns you had raised earlier. RTT information is now optional, explicit and based on a simple nominal hierarchy. So all checks are cheap and the requirements on the runtime system moderate. It remains to be seen how usable that is when compiling more structural languages.

It is true that the runtime will need a description of each object's layout. Object descriptors are a known technique in the GC world, even though they're more involved than a simple header tag. Existing JS engines use far more complicated object descriptors than what is needed for this proposal. It's also true, however, that this proposal adds a further twist to that. SpiderMonkey has already implemented that as part of the JS TypedObjects proposal, and other JS engines will likely have to follow. Perhaps @lukewagner can comment on the complexity. Other examples of languages with data models similar to this proposal include Go.

Given that Wasm GC needs to be more universal than any language-specific GC it shouldn't be surprising that it may involve slightly more work to implement. I don't think that is unreasonable as long as it is sufficiently canonical.

> Nothing in my proposal allows this. If someone can point out why they think it is possible, that'd be a useful example we can work through to clarify things.

Okay, sorry, I probably have misinterpreted your earlier reply on the subject of aliasing reference bits.

RossTate commented 6 years ago

Ohhh, and now the rtt and the mention of nominality in the PR both finally click (both in my head and with each other, heheh). I think I both see why I was confused and why y'all wouldn't expect it to be confusing. Anyways, let me pause this conversation. There are a lot of points that everyone has made that I want to synthesize.

RossTate commented 6 years ago

Okay, I've got a new bunch of ideas, but I'm hoping the conversation will be more productive (although I did find it quite useful for coming up with what follows) now that I'm on the same page as y'all (I think). To that end, I'm assuming y'all are planning on using the constant-time linear-inheritance nominal casting algorithm I told y'all about way back. If not, (or if you have questions about that,) let me know. And for everything below, throw away the above proposals - I'll borrow some ideas, but it'll be easier to start with a clean slate.

First of all, it seems likely you want to have three kinds of gcref (I still believe you should call them gcref, or better yet castref, rather than anyref). The first kind can be cast to a (nominal) type using the linear-hierarchy strategy you're already planning on. The second kind can be cast to a (nominal) type using a flat-hierarchy strategy; this strategy can be encoded using linear hierarchies but has a more space efficient implementation that might be important for certain features/languages (e.g. sum/case types as well as interface tables). The third kind cannot be cast - or alternatively can be cast but the cast will always fail.

This third kind is important for addressing a lot of the concerns I was expressing above. You see, if a gcref doesn't need to be cast, then all the header needs to know is how to scan and clean up the chunk of memory. For many memory layouts, this information can be expressed with bitmaps and such within the header itself, not requiring any reference in the header. This is useful for the many cases where the code always has some other way to know the appropriate type of the reference. For example, with vtables the type of the vtable can be determined by casting the object with that vtable rather than casting the vtable itself. Similarly, the interface table pointed to by that vtable (rather than nested within, since we can't do interior pointers yet), will never itself be cast (although the interface-method tables within it will be). In general, often the behind-the-scenes data structures of a language implementation will have their type known and will not need to be cast, and these were the ones I was worried about thrashing the gc cache in unanticipated ways.

My sons just woke up from their naps, so it looks like I'll have to defer more thoughts until later. But maybe that's better so that we're only discussing one thing at a time. Let me know what you think.

titzer commented 6 years ago

I didn't quite understand your example with interfaces. A parallel thought I had was that Java would implement interface dispatch similar to how Haskell implements typeclasses: dictionary passing. A reference to an object of an interface type would become a tuple of the receiver and dictionary (i.e. two values), and subsumption from a class type to an interface type would entail an interface lookup. The dictionary would just be an immutable struct of function pointers, so interface dispatch (once one has the dictionary) is a single indirection.

In general interface lookup is a pretty tricky problem. Another way to go is just to have a list of implemented interfaces hanging off the metaobject and do a (hashed) search. To make this typecheck, this would also imply a downcast to the dictionary type.

RossTate commented 6 years ago

The two most common ways of implementing interface lookups are interface tables (itables) and interface-method tables (imtables). Imtables are newer and were a big part of getting interfaces to perform better on the JVM and consequently be more widely adopted. Imtables also cannot be implemented with dynamic casting because they rely on simple-but-clever stack tricks; these tricks are implementable with wasm's instruction set, but it takes a fancier static type system to verify them. I'm working on such a type system for wasm, but it's a longer-term project.

So that leaves us with itables, which is essentially your list-of-implemented-interfaces idea. My understanding, though, is that it's better to just implement them as an array list in order to get better cache performance since it's usually a rather small list. Also, usually this list is simply the tail of the vtable data object. Furthermore, it usually indexes directly into the vtable to save space and reduce fragmentation. These last two points are where the current MVP particularly falls short. It forces the itable and each of the interface-method tables listed in the itable to be separately allocated memory chunks. Furthermore, by uniformly applying the same GC model to everything, it likely forces each of these chunks to be larger than they should need to be. And before one discounts all this waste because it's only for metadata, I know the C# team found that it was important to keep this metadata small and compact, even going so far as to make virtual-method dispatch use even more address indirection in order to achieve this goal.

The dictionary-passing technique you mention causes a huge waste of memory. For example, every field of type List in Java would need to be 2 references rather than 1: the reference to the data, and the reference to the interface-method table for that data. It also causes big problems with parametric polymorphism combined with variance as it would require implementations of generic methods of interfaces to dynamically convert the class instance's internal data to the callee's expected type for that data, which would be especially problematic for C#.

RossTate commented 4 years ago

It's interesting rereading this conversation after so many years and seeing how the various ideas that come up here have propagated. Nonetheless, more mature manifestations of those ideas can now be found elsewhere, whether in the current proposal, other issues, or other proposals, making this issue outdated. Thanks to everyone for the great discussion!