WebAssembly / shared-everything-threads

A draft proposal for spawning threads in WebAssembly
Other
34 stars 1 forks source link

Unreachable code and `shared` typing #80

Open abrown opened 4 weeks ago

abrown commented 4 weeks ago

While implementing validation logic for instructions like any.convert_extern, I ran into a question: if we're validating unreachable code, we may not have a type available for the incoming operand. For the result type of that instruction, the nullable bit is set to false — what should the shared bit be set to?

tlively commented 4 weeks ago

Good question. @rossberg, this seems to require type variables for the shareability. Do our shared-polymorphic instructions like any.convert_extern and ref.as_non_null violate closed forward principle types?

conrad-watt commented 4 weeks ago

@abrown does your typing algorithm explicitly represent bot? (ref). My understanding is that the addition of bot removed the need to "guess" nullability of types in unreachable code, and the same should be true for shareability.

IIUC there should be two cases: 1) we're either in unreachable code and popping from the base of the stack, or alternatively popping an explicit bot produced by select etc. In this case the result of any.convert_extern is just bot. 2) we're popping a type with known nullability and shareability, in which case this is preserved by any.convert_extern.

conrad-watt commented 4 weeks ago

Oh, looking more closely at the current typing rule for any.convert_extern, my explanation above doesn't seem to be true. But I think it should be! @rossberg what's your understanding? I think the other interpretation is that we need explicit "shared` variants of the two instructions.

tlively commented 4 weeks ago

No instruction currently results in bot when its input is bot. For example, (unreachable) (i32.eqz) (i64.add) is not valid. The reason you don't need to guess an output nullability for nullable-polymorphic instructions whose input is bot is that it's always valid to guess "non-nullable," since that is a subtype of "nullable" and is allowed in strictly more places. We do not have such a subtyping relationship between shared and unshared, so there's no conservative guess to make.

rossberg commented 4 weeks ago

[Short reply from vacation.]

Yes, interestingly, it seems like this is okay for Principal Types, but breaks closedness of Principal Forward Types, which is the property engines depend on. We'd need to either introduce "bottom sharability" or require separate instructions / annotations.

tlively commented 4 weeks ago

Adding a bottom shareability for validation purposes would be an interesting solution. The typing rules for shareability-polymorphic instructions would have to propagate bottom shareability from their input to their output, so this solution is similar to the solution @conrad-watt suggested of propagating the bottom value type from the input to the output. But AFAICT bottom shareability propagation maintains the current decomposition property that says any valid sequence of instructions can be decomposed into two valid sequences of instructions at any point, whereas full-blown bottom type propagation would break that property.

(We didn't notice this problem in Binaryen because we do full-blown bottom type propagation anyway, contrary to the spec. It would be nice to fix this in our parsers at some point.)

@abrown, do you think this bottom shareability idea would be easily implementable in wasm-tools?

conrad-watt commented 4 weeks ago

Nullability is recorded directly on the reftype but shareability comes from the heaptype, so there would be some spec messiness unless we restrict ⊥-shared to appear only on abstract heap types (i.e. it's hard to manufacture a new concrete heap type that only appears during validation, since these are identified by index). I guess for this particular case it's sufficient to introduce any ⊥-shared and extern ⊥-shared. Are there any potential polymorphic instructions that might produce non-abstract heap types? How is ref.as_non_null currently validated in dead code, since it doesn't have an annotation?

any valid sequence of instructions can be decomposed into two valid sequences of instructions at any point, whereas full-blown bottom type propagation would break that property.

Well, with bottom propagation the second sequence of instructions would have type bot -> bot. I guess it could make outlining more complicated, since replacing bot -> bot with a well-typed function call might invalidate subsequent dead code. But I'm still kind of cynical about these examples - it seems easier to just remove all syntactically dead code as a first step!

conrad-watt commented 4 weeks ago

How is ref.as_non_null currently validated in dead code, since it doesn't have an annotation?

It looks like we introduce a bot abstract heap type as well as a bot value type, so this all works out (link).

alexcrichton commented 3 weeks ago

The situation is a little funky in wasm-tools because historically we've tried to avoid adding bot in places that can be exposed outside of the validator. Within validation we currently have

enum MaybeType {
    Bot,
    HeapBot,
    Type(ValType),
}

which enables us to encapsulate the bottom-ness within the validator itself and not part of the public API of reading a wasm module or getting its type information. In that sense adding bot to the absheaptype equivalent wouldn't be easy because things aren't set up that way, but it's doable.

I'll admit though I'm not following some of the discussion here 100%, so I wanted to ask a question about this. Is the idea that if there's a bot variant of absheaptype then this problem just goes away? For any.convert_extern which @abrown mentioned I'm not sure what the shared bit would be set to.

Or is the suggestion that there's a new sharedbot variant? In addition to a maybesharedbot variant?

tlively commented 3 weeks ago

The HeapBot in the code you quoted corresponds to the reference type (ref heap-bot) rather than the bottom heap type from the spec. That works so far because we only ever need to use the bottom heap type as part of a non-nullable reference. But it makes it tricky to extend the code to handle the proposed share-bot, which can appear with heap types other than heap-bot.

The type on the stack after (unreachable) (any.convert_extern) would be (ref (share-bot any)) because it must be some kind of reference to either a shared or unshared any. It is conservative to assume the reference is non-nullable, but future instructions might expected either a shared or unshared any, and neither works for instructions that expect the other, so we have to use the new share-bot instead.

If I were trying to implement this in wasm-tools, I would try using something like these enums:

enum MaybeShare {
    ShareBot,
    Share(Shareability),
}

enum MaybeHeapType {
    HeapBot,
    MaybeShare(MaybeShare, AbsHeapType),
    Type(HeapType)
}

enum MaybeType {
    Bot,
    MaybeReference(MaybeHeapType),
    Type(ValType),
}

And enforce an invariant that the MaybeHeapType and MaybeShare representations can only be used for types that cannot be represented as a ValType.

alexcrichton commented 3 weeks ago

In prototyping this I found that it was sufficient to effectively add the concept of a bottom heap type with an optionally known abstract heap type. The heap bottom type suffices to represent "maybe shared, maybe not" and the optionally known abstract heap type handles the any.convert_extern case (and vice versa) by saying that it's at least a known abstract type. For wasm-tools that ended up with

enum MaybeType {
    Bot,
    HeapBot(Option<AbstractHeapType>),
    Type(ValType),
}

in the validator. I've only updated the instructions that @abrown has implemented for shared-everything-threads in wasm-tools though so this may not be sufficient for everything if there's more flavorful instructions to be added.

rossberg commented 2 weeks ago

Sounds plausible.

The spec internally extends the type grammar for validation purposes, as specified here (which come with suitable subtyping rules). For sharedness, we'll need a corresponding extension with bot.

An actual implementation of validation will have to do something similar. But not all expressible cases can arise with the current instruction set. IIUC, the representation you show uses HeapBot to represent (ref (botshared absheaptype)), but it cannot express (ref null (botshared absheaptype)) or (ref (botshared $t)). Currently that's probably okay, because the latter do not arise in any instruction's forward principal type, I think. Though there could be future instructions that need to distinguish some of these cases.

alexcrichton commented 9 hours ago

To perhaps close the loop on this, an update to wasm-tools's implementation was in https://github.com/bytecodealliance/wasm-tools/pull/1734 with a number of new tests that pass as a result of that PR as well.