WebAssembly / reference-types

Proposal for adding basic reference types (anyref)
https://webassembly.github.io/reference-types/

Make funcref not a subtype of anyref #69

Closed RossTate closed 4 years ago

RossTate commented 4 years ago

(This idea came up after yesterday's discussion about the GC extension. I have tried to describe it here in a self-contained manner, but let me know if there are any terms I forgot to define or motivations I forgot to provide.)

Having funcref be a subtype of anyref forces the two to have the same register-level representation. Yet there are good reasons why an engine might want to represent a function reference differently than an arbitrary reference. For example, function references might always be an assembly-code pointer paired with a module-instance pointer, effectively representing the assembly code compiled from a wasm module closed over the global state of the specific instance the function reference was created from. If so, it might make sense for an engine to use a fat pointer for a function reference. But if funcref is a subtype of anyref, and if it overall makes sense for arbitrary references to be implemented with normal-width pointers, then that forces function references to be implemented with normal-width pointers as well, causing an otherwise-avoidable additional memory-indirection in every indirect function call.
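To make the indirection argument concrete, here is a minimal toy model in Python. It is only a sketch: `FatFuncRef`, `Heap`, `resolve_fat`, `resolve_thin`, and the addresses are hypothetical names and values chosen for illustration, not how any engine is actually implemented. The toy heap counts loads so the extra memory access of the normal-width representation is visible.

```python
from typing import NamedTuple, Tuple


class FatFuncRef(NamedTuple):
    """Two-word function reference (illustrative only)."""
    code_ptr: int      # hypothetical address of the compiled code
    instance_ptr: int  # hypothetical address of the owning module instance


class Heap:
    """Toy heap that counts loads."""

    def __init__(self) -> None:
        self.cells = {}
        self.loads = 0

    def store(self, addr: int, value: Tuple[int, int]) -> None:
        self.cells[addr] = value

    def load(self, addr: int) -> Tuple[int, int]:
        self.loads += 1
        return self.cells[addr]


def resolve_fat(ref: FatFuncRef, heap: Heap) -> Tuple[int, int]:
    # Both words travel with the reference, so no heap load is needed.
    return ref.code_ptr, ref.instance_ptr


def resolve_thin(addr: int, heap: Heap) -> Tuple[int, int]:
    # A normal-width reference only names a heap cell that holds the pair,
    # so one load happens before the call can be made.
    return heap.load(addr)


heap = Heap()
fat = FatFuncRef(code_ptr=0x4000, instance_ptr=0x9000)
heap.store(0x100, (0x4000, 0x9000))  # boxed copy for the thin representation
```

In this model, resolving `fat` leaves `heap.loads` at 0, while resolving the thin reference `0x100` raises it to 1.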

Regardless of the reason, by making funcref not a subtype of anyref, we give engines the flexibility to represent these two types differently (including the option to represent them the same). Instead of subtyping, we could have a convert instruction that could take a function reference and convert it into an anyref representation, or more generally could convert between "convertible" types. The main benefit of subtyping over conversion in a low-level type system is its behavior with respect to variance, such as co/contravariance of function types, but I see no such application for funcref and anyref. And in the worst case, we could always make funcref a subtype of anyref later if such a compelling need arises.
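A rough sketch of how conversion could replace subtyping at the type-checking level, written in Python. Everything here is an assumption for illustration: the `CONVERTIBLE` relation, `types_match`, and `convert_type` are hypothetical and do not correspond to any specified instruction or validator.

```python
# Hypothetical convertibility relation: (source type, destination type).
CONVERTIBLE = {("funcref", "anyref")}


def types_match(expected: str, actual: str) -> bool:
    """Without subtyping, an operand must have exactly the expected type."""
    return expected == actual


def convert_type(src: str, dst: str) -> str:
    """Result type of a hypothetical `convert` instruction, [src] -> [dst]."""
    if src != dst and (src, dst) not in CONVERTIBLE:
        raise TypeError(f"{src} is not convertible to {dst}")
    return dst
```

Under these assumptions a funcref is rejected where an anyref is expected, but `convert_type("funcref", "anyref")` yields a type that is accepted. Because the conversion is explicit, the engine has a point at which it can change representation.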

rossberg commented 4 years ago

To turn attention back to the question at hand, I'd summarise as follows.

If everything else were equal, I would totally agree that cutting subtyping from this proposal would be the right move. But not everything else is equal (which is why we included it in the first place).

So the question boils down to one of cost vs benefit. And the costs of cutting it are concrete, both short-term and long-term. Inversely, the benefit is completely hypothetical, and achievable by other means.

lukewagner commented 4 years ago

@rossberg You haven't factored in "risk", viz., that we commit to some design choices now that we regret later. Until we've fully fleshed out (implemented, generated, widely understood) subtyping in more forms than funcref <: anyref, the risk is just as concrete as the costs.

RossTate commented 4 years ago

Following up on that point:

Inversely, the benefit is completely hypothetical, and achievable by other means.

To be fair, the benefits of the subtyping rules in contention are also completely hypothetical. Implicit in your evaluation has been an assumption that these subtyping rules will inevitably be adopted. I totally appreciate the philosophical principle behind this assumption, as I have been there myself. After all, every language designer aspires to a grand unifying principle for their creation.

But it is also the case that this principle inevitably clashes with performance. One of a designer's most significant and difficult decisions is how much they choose to compromise uniformity for the sake of efficiency. It would be quite surprising and amazing for WebAssembly to not face the same trade-off, given how much difference we all know something as small as a single bit can make. So, as nice as this principle is, what has surprised me is how few alternatives to top-anyref (meaning anyref is a supertype of all reference types) have been explored over the past three years so that we might understand the trade-offs inherent in this principle. It especially surprised me that this principle persisted even after the conception of Interface Types, which so emphatically embraces the notion that different modules will inevitably represent even the same kind of data differently. From what I can tell, this seems to have happened because of a belief that this principle is necessary, not just one of many options, so let me briefly illustrate alternatives in places where last week assertions of this necessity seemed to be made.

One such assertion was that garbage-collected languages will need a top-anyref as an "escape hatch, e.g. for compiling uniform representations". However, in various discussions on- and offline multiple language implementers have spoken about how they specialize their representation to their needs. This seems to be especially prevalent in non-object-oriented dynamically typed languages. For example, pointer-embedded bit-flags are often used to distinguish the optimized common case (e.g. a direct reference) from the unoptimized uncommon case (e.g. a proxied reference) without requiring a read. Another example is being able to tell that values are structurally unequal without requiring a read. Given that both GC proposals are unable to verify the high-level invariants that typed languages rely upon, they too will effectively compile to code with frequent dynamic checks and casts, much like dynamically typed languages, and so there is reason to believe that they too might benefit more than usual from such techniques. Furthermore, recent research into efficient inter-language/inter-paradigm interop has been finding pointer-embedded bit-flags to be particularly useful for significantly improving performance. On 64-bit machines, it is often even possible to encode the entire type tag directly in the pointer. For WebAssembly, a module can communicate to an engine what its common cases to distinguish are so that the engine might try to encode them as pointer-embedded bit-flags, but it is impossible for an engine to employ such an optimization if everything must be a subtype of anyref. That is, not only is top-anyref not necessary for compiling uniform representations, it is likely undesirable for such languages.
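For readers unfamiliar with pointer-embedded bit-flags, the following Python sketch shows the basic trick with plain integers standing in for addresses. The 8-byte alignment, the `PROXIED` flag, and the helper names are all assumptions made for illustration.

```python
ALIGNMENT = 8             # assumed: every object is 8-byte aligned
TAG_MASK = ALIGNMENT - 1  # so the low 3 bits of an address are always zero
PROXIED = 0b001           # hypothetical flag: the reference goes through a proxy


def tag(addr: int, flags: int) -> int:
    """Pack flags into the unused low bits of an aligned address."""
    if addr & TAG_MASK:
        raise ValueError("address is not aligned")
    if flags & ~TAG_MASK:
        raise ValueError("flags do not fit in the free bits")
    return addr | flags


def address_of(ref: int) -> int:
    """Strip the flags to recover the real address."""
    return ref & ~TAG_MASK


def is_proxied(ref: int) -> bool:
    """Distinguish the uncommon case by inspecting the reference alone."""
    return bool(ref & PROXIED)
```

The point of the technique is that `is_proxied` inspects only the reference word itself, so the common case is identified without a memory read.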

Another assertion regarded type imports. In fact, there were two assertions, each worth breaking down separately. One was that the current plan avoids bias with respect to which types can be im/exported. But it most certainly is biased as it does not even consider numeric types. This is odd considering that the only value types for C/C++ programs, the primary user base for WebAssembly, are numeric types. And if we consider WASI, capabilities can easily be encoded as integer handles, and by exporting those handles as an abstract type (or as multiple abstract types for different kinds of handles), WASI can furthermore ensure that they cannot be forged (provided the above issue with call_indirect is addressed). I can imagine many applications of type im/exports that do not need and would not want references, so it seems odd to bundle these features together. Sure, to enable separate compilation of modules in browsers we might want to constrain the possibilities, but this can be done by requiring that all web wasm modules only export reference types—there is actually no need to constrain imported types. (And there are ways to even make the restriction on exports unnecessary, but that's a bigger digression I don't want to go into right now.)

But let's put the issue of numeric types aside and move on to the second assertion about type imports, which is that separate compilation requires imported (reference) types to have the same representation, i.e. top-anyref. There is certainly some truth to this, but the truth is more nuanced. For straightforward separate compilation, what is necessary is that the engine know how to manage (i.e. garbage collect or reference count) the imported reference type without knowing what the specific reference type is. But that can be done without the reference type belonging to some universal top-anyref type. To provide a very concrete example, I'll pick the Mu VM (which only works on 64-bit machines). Its tagref64 type uses NaN-boxing to pack 64-bit floats, 52-bit integers, and GC'ed references alongside a 6-bit integer into a 64-bit representation. The memory manager knows how to handle these references without interpreting those 6 bits. This means the specialized uniform representation the engine generates for each module/language can have its own 6 pointer-embedded bit-flags and yet still support separate compilation for type imports. But if there is a top-anyref type, then these bits have to go to waste because separate compilation prevents the engine from coordinating the bit-encoding across modules. So, similar to before, not only is top-anyref not necessary for separate compilation, it in fact requires sacrificing performance to enable separate compilation.
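The NaN-boxing idea can be sketched in Python as below. This is an illustrative layout only and is not Mu's actual tagref64 encoding: the 45-bit address field, the position of the 6-bit tag, the canonical-NaN assumption, and all function names are assumptions of this sketch, and the integer case is omitted. What it aims to show is that a memory manager can find the referenced address without interpreting the 6 tag bits.

```python
import struct
from typing import Optional

BOX_PREFIX = 0xFFF8 << 48         # sign, exponent, and quiet bit all set
CANONICAL_NAN = 0x7FF8 << 48      # assumed: real NaNs are canonicalized to this
TAG_SHIFT = 45
TAG_MASK = 0x3F                   # 6 bits owned by the module/language
ADDR_MASK = (1 << TAG_SHIFT) - 1  # 45-bit address field (assumed)


def box_float(x: float) -> int:
    """Store a double as its own bit pattern; NaNs become the canonical NaN."""
    if x != x:
        return CANONICAL_NAN
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def unbox_float(word: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", word))[0]


def box_ref(addr: int, tag: int) -> int:
    """Store a reference and a 6-bit tag inside the NaN payload space."""
    if not 0 <= addr <= ADDR_MASK:
        raise ValueError("address does not fit in 45 bits")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in 6 bits")
    return BOX_PREFIX | (tag << TAG_SHIFT) | addr


def is_ref(word: int) -> bool:
    return (word & BOX_PREFIX) == BOX_PREFIX


def ref_tag(word: int) -> int:
    return (word >> TAG_SHIFT) & TAG_MASK


def gc_target(word: int) -> Optional[int]:
    """What a memory manager needs: the address, ignoring the tag bits."""
    return word & ADDR_MASK if is_ref(word) else None
```

In this sketch `gc_target` never looks at the tag bits, so two separately compiled modules could assign different meanings to them without the collector needing to know.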

Hopefully the above demonstrates that there are alternative options and that there is reason to believe that top-anyref, like pretty much any "uniform" design, comes with a performance trade-off. I am happy to elaborate more on the above examples, or to present alternatives and trade-offs to consider for other applications of top-anyref if requested. From what I can tell, the primary contributions of the upcoming proposals are similarly independent of top-anyref. So I would discourage assuming that rolling back top-anyref (as opposed to external references) would be wasted effort. Even if we do add it later on, in the meanwhile proposals would have been designed independently, enabling simpler/embedded engines in particular to support many features of WebAssembly without needing to support top-anyref (or, in some cases, even external references). It's even possible that some of these proposals would end up being developed more quickly, as they would no longer be slowed by the many complications incurred by subtyping.

RossTate commented 4 years ago

In the last CG meeting, we talked about how it might be useful to link to relevant CG meeting notes, so here are 3 such links in case it helps:

rossberg commented 4 years ago

@lukewagner, indeed, risk is a factor too. But that goes both ways. From where I stand, the risks of making this change (such as poorly understood consequences on other proposals) are higher and more concrete than the other way round.

@RossTate:

But it is also the case that this principle inevitably clashes with performance.

I think we have repeatedly concluded that this is not the case here, because flat pointers (and other specialised features) can be introduced later. You keep making lots of assertions about what Wasm implementations could do, but little of that bears any relation to what existing implementations actually do today. So while various features could possibly reap some performance benefits in a next-gen engine, there is little reason to expect that the current engines could easily exploit them. So no benefit in making them an MVP feature.

lukewagner commented 4 years ago

@rossberg If we remove subtyping we know exactly what the consequences are; it's hard to call them "risks", they're just fixed "costs". "Risk" refers to the fact that we do not have a complete picture of subtyping at this point, and we won't until we've put "pressure" on subtyping via Type Imports and Function References.

rossberg commented 4 years ago

@lukewagner:

If we remove subtyping we know exactly what the consequences are

A bold statement. ;) I for one don't. How can we adapt the C API? What design implications will type-indexed null values, which are uncommon, have in the future? Do we get the format of the new immediates right? It's folklore wisdom that last-minute design changes are a favourite source of errors and unforeseen consequences.

lukewagner commented 4 years ago

I think we should hold off on committing to a stable C API until we know more about subtyping. Until then, embeddings are already doing something now and can keep doing so. I've got to chuckle a bit at comparing the unknowns for the ref.null immediate binary format to something as cross-cutting (and hazardous in folklore wisdom) as subtyping.

RossTate commented 4 years ago

One downside to removing subtyping is that it creates the need for two instructions, funcref.null and anyref.null (or externref.null). But it occurs to me there might be a better solution. These instructions exist because types need default values (for a while). It seems likely that wasm will need to add more types over time. So rather than coming up with a new instruction each time, what if we just had a single instruction default $t : [] -> [t] that produces the default value for the defaultable type t? One nice thing is that this would work even for imported types (by the way, Type Imports has no discussion of defaultability, which is a type constraint that is not expressible with upper-bound constraints). Another nice thing is that it naturally extends to defaults t* : [] -> [t*], which would pair nicely with things like the upcoming let construct as observed in https://github.com/WebAssembly/function-references/issues/20#issuecomment-610707259. Similarly, we could have a single instruction is_default : [t] -> [i32] (no type annotation necessary), which indicates whether or not the given t value is the default value for a defaultable type t. Hopefully that would address concerns about "warts" caused by removing subtyping. But maybe y'all have already discussed this option before?
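A small Python model of the suggested instructions, purely as a sketch. The `DEFAULTS` table covers only a few types, `None` stands in for a null reference, and the function names mirror the proposed default, defaults, and is_default instructions without being any real API. In the proposal is_default needs no type annotation because the operand type is known statically; this model passes the type explicitly instead.

```python
NULL = None  # stand-in for a null reference in this model

# Default values for a few defaultable types: zero for integers, null for references.
DEFAULTS = {
    "i32": 0,
    "i64": 0,
    "funcref": NULL,
    "anyref": NULL,
}


def default(t: str):
    """Model of `default $t : [] -> [t]`."""
    if t not in DEFAULTS:
        raise TypeError(f"{t} is not defaultable")
    return DEFAULTS[t]


def defaults(ts):
    """Model of `defaults t* : [] -> [t*]`."""
    return [default(t) for t in ts]


def is_default(t: str, value) -> int:
    """Model of `is_default : [t] -> [i32]`; returns 1 or 0 like an i32 result."""
    return int(value == default(t))
```

A non-defaultable type such as `ref $t` is simply absent from the table, so `default("ref $t")` raises `TypeError` in this model.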

rossberg commented 4 years ago

There will be an infinite set of different nullable types and hence null values. Consequently, it would not scale to add multiple null instructions. Instead, we'd add one type-indexed null. That's still a wart, because it doesn't mesh well with subtyping on that index type. For example, although technically different, null $T and null $U will have to be considered the same value if $T <: $U, which will require identifying multiple values. From that perspective, type-indexing null is not natural in the presence of subtyping and creates artificial complication.

None of this is related to defaulting. As for the default instruction, there is some confusion here on multiple levels. First, it cannot change anything about imports. You can already emulate its behaviour via auxiliary functions that return their default-initialised locals, so it wouldn't add any new expressiveness that magically provides something new for imports. But I also don't see any problem with imports. Nullability is a property of ref types, not type definitions: a local of type ref $t can never be defaulted, you'll need to type it ref null $t (previously called optref). That's completely independent of what $t is or whether it was imported.

RossTate commented 4 years ago

Your first paragraph seems to be about formalization/specification. The formalization research community offers multiple ways to specify this pattern. The one you give is one of them, and is not considered to be particularly unnatural (it just says that values are (co)variant with respect to subtyping, the natural analog to type-level considerations). But if you don't like the idea of the same constant having multiple representations, which I understand, then you can say that null $T is only valid if $T is minimal with respect to subtyping. (You already have the requirement that $T is a "reference type".) At the moment there are only two such minimal types though, so having distinct constants funcref.null and anyref.null is not unreasonable.

As for the second paragraph, I realized I missed a step. From my earlier comment, I still had in my mind type imports rather than just reference type imports. So yeah, filling in that disconnect, it makes sense why reference type imports has no discussion of defaultability; sorry for being confusing there. But if ever wasm does add true type imports, then it will need a notion of defaultability (since types like ref $t are not defaultable), and that's a simple example of a type constraint that is not expressible via subtyping. Who knows if that'll ever happen though, so at this point it's just a thought to keep in mind.

But focusing on the here and now, do the default(s) and is_default instructions seem useful?

rossberg commented 4 years ago

How is there a minimal type in an open subtype hierarchy?

The type ref $t wouldn't even make sense if $t was not a referable type, so I don't quite follow what you are saying in the second paragraph.

IDK if there is any good use case for a default instruction. Arguably, defaulting is primarily a hack for initialisation, out of necessity, not a particularly desirable feature in general. At least I have never heard anybody asking for it.

RossTate commented 4 years ago

At least I have never heard anybody asking for it.

In investigating the rationale of the design, I have had multiple people explain that they wanted a direct way to construct and test for the default value. That seems to be the general pattern. Right now that pattern is being served by providing separate instructions for each type with a default value. But given that default values are a core feature of WebAssembly, whether just out of necessity or not, it seems another reasonable way to address that pattern would be to have general-purpose instructions for constructing and testing for default values.

RossTate commented 4 years ago

Adding link to most recent discussion for reference: April 21

binji commented 4 years ago

We had a poll at the Apr 28th meeting, with the following results (where SF represents strongly in favor of removing subtyping, and SA represents strongly against removing subtyping):

SF: 7 F: 9 N: 8 A: 2 SA: 5

At least one member of the group voted SA because there was no discussion during the meeting. We had discussed this topic at previous meetings, but not at this one.

Because the poll was at the end of the meeting, we weren't able to succinctly state a conclusion. I think this poll does offer some clarity, still. We should take this to mean that the group as a whole slightly favors removing subtyping, and we should proceed accordingly.

I apologize that this poll was a little haphazard. Please feel free to reach out to me to discuss any concerns you have about this decision.