Canonical abi: `union-type`

oovm commented 6 months ago

Motivation

Runtime Reinterpretation

Many languages have the ability to reinterpret a piece of data at runtime, and this ability can be constrained by the type system.

For example, in C:

#include <stdio.h>

union MyUnion {
    int i;
    float f;
};

int main() {
    union MyUnion u;

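    u.i = 42;  /* store through the int member */
    printf("Value of i: %d\n", u.i);

    u.f = 3.14f;  /* same storage, now written as a float; the bytes of u.i are overwritten */
    printf("Value of f: %f\n", u.f);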

    return 0;
}

Or in TypeScript:

type MyUnion = { i: number } | { f: number };

let u: MyUnion;
u = { i: 42 };
console.log("Value of i:", u.i);

u = { f: 3.14 };
console.log("Value of f:", u.f);

Interface merging (interface subtype)

In some type systems, identical types or subtypes can be merged:

type Option<T> = T | null;

Option<Option<T>>  ==>  Option<T>

This is completely different from variants (sum types) in algebraic data types.
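The difference can be made concrete with a minimal sketch. It assumes a hypothetical `Nullable<T>` for the untagged form above and a hand-written tagged `Option<T>` standing in for an algebraic sum type; neither name comes from the proposal.

```typescript
// Untagged: nesting adds no new values, so Nullable<Nullable<number>> is just number | null.
type Nullable<T> = T | null;

// Tagged (sum type): every layer carries its own tag, so nesting adds a new value.
type Option<T> = { tag: "none" } | { tag: "some"; value: T };

function observeUntagged(x: Nullable<Nullable<number>>): string {
  return x === null ? "null" : "number";
}

function observeTagged(x: Option<Option<number>>): string {
  if (x.tag === "none") return "none";
  return x.value.tag === "none" ? "some(none)" : "some(some)";
}

const outerNone: Option<Option<number>> = { tag: "none" };
const innerNone: Option<Option<number>> = { tag: "some", value: { tag: "none" } };

console.log(observeUntagged(null));     // the only "empty" value the untagged form has
console.log(observeTagged(outerNone));  // empty at the outer layer
console.log(observeTagged(innerNone));  // present, but empty at the inner layer
```

The untagged form has a single empty state, while the tagged form keeps the two empty states apart.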

ABI change

Add a new union option (`union-type`), which is disabled by default.

When enabled, this option requires variants to pass their data directly, without an additional enumeration (discriminant) parameter.

Changes to `reference-type`

When `reference-type` (`rt`) and `union-type` (`ut`) are enabled at the same time, the following changes will occur:

| WIT type | wasm w/o `rt` + `ut` | wasm w/ `rt` + `ut` |
|---|---|---|
| `option<bool>` | `(i32, i32)` | `(ref null i31)` |
| `option<option<bool>>` | `(i32, i32)` | `(ref null i31)` |
| `option<char>` | `(i32, i32)` | `(ref null i32)` |
| `option<i8>` | `(i32, i32)` | `(ref null i31)` |
| `option<i32>` | `(i32, i32)` | `(ref null i32)` |
| `option<i64>` | `(i32, i64)` | `(ref null i64)` |
| `option<T>` (heap type) | `(i32, SIZE_OF_T)` | `(ref null $t)` |
| `option<option<T>>` (heap type) | `(i32, SIZE_OF_T)` | `(ref null $t)` |
| `result<A, B>` | `(i32, MAX_SIZE_A_B)` | `anyref` |
| `variants` | `(i32, MAX_SIZE)` | `anyref` |

Each variant item will have an independent type id, which is used for type conversion and distinguishing variant items with the same name.

variant a { // struct a
   aa(i32)  // struct a-aa (field i32)
   ab(i32)  // struct a-ab (field i32)
}
variant b { // struct b
   aa(i32)  // struct b-aa (field i32)
   ab(i32)  // struct b-ab (field i32)
}

This helps to implement features such as abstract classes, interface inheritance, `?.` (null-safe call), `??` (null coalescing), etc.
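As a loose analogy only, the per-case type id can be pictured with TypeScript classes, where each class has its own runtime identity. The class names below are hypothetical renderings of `a-aa`, `a-ab`, `b-aa` and `b-ab`; the sketch says nothing about how a wasm lowering would actually be generated.

```typescript
// Hypothetical class-per-case rendering of variants `a` and `b`.
// Each class acts as its own runtime type id, even though all four share one shape.
class A_aa { readonly value: number; constructor(value: number) { this.value = value; } }
class A_ab { readonly value: number; constructor(value: number) { this.value = value; } }
class B_aa { readonly value: number; constructor(value: number) { this.value = value; } }
class B_ab { readonly value: number; constructor(value: number) { this.value = value; } }

// The parameter is `unknown` so that every test is a genuine runtime check.
function identify(v: unknown): string {
  if (v instanceof A_aa) return "a.aa";
  if (v instanceof A_ab) return "a.ab";
  if (v instanceof B_aa) return "b.aa";
  if (v instanceof B_ab) return "b.ab";
  return "unknown";
}

console.log(identify(new A_aa(1)));
console.log(identify(new B_aa(1)));
```

Here `a.aa` and `b.aa` stay distinguishable despite sharing a case name and payload type, because the runtime identity comes from the class rather than from the shape.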

lukewagner commented 6 months ago

I haven't digested the whole idea, but two initial thoughts:

I don't think in general we can conflate option<option<T>> with option<T> since none may mean something different than some(none). (Whether it's good interface design to have an option<option<T>> and depend on this difference is a different story, but it's hard for me to feel comfortable declaring that there is never a good reason to do this.)
Until wasm-gc has explicit rtts that are generative (i.e., not canonicalized), then if we have two cases in a variant with the same structural contents, then a receiver of the variant value will not be able to use casts to tell the two cases apart, which would lose potentially-necessary semantic information. Thus, I think we'll need an explicit discriminant until then (which will take a while).
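The second point can be modeled by analogy with plain structural objects. This is a sketch under stated assumptions, not wasm-gc itself: `Celsius`, `Fahrenheit`, `Temperature` and the helper functions are hypothetical names invented for the illustration.

```typescript
// Two cases of a variant whose payloads are structurally identical.
type Celsius = { degrees: number };
type Fahrenheit = { degrees: number };

// Without a tag, a receiver can only inspect the shape, which is the same for both cases.
function sameShape(a: object, b: object): boolean {
  const ka = Object.keys(a).sort();
  const kb = Object.keys(b).sort();
  return ka.length === kb.length && ka.every((k, i) => k === kb[i]);
}

// With an explicit discriminant, the case survives the trip to the receiver.
type Temperature =
  | { discriminant: 0; payload: Celsius }
  | { discriminant: 1; payload: Fahrenheit };

function toCelsius(t: Temperature): number {
  return t.discriminant === 0
    ? t.payload.degrees
    : ((t.payload.degrees - 32) * 5) / 9;
}

const boilingC: Celsius = { degrees: 100 };
const boilingF: Fahrenheit = { degrees: 212 };

console.log(sameShape(boilingC, boilingF));
console.log(toCelsius({ discriminant: 1, payload: boilingF }));
```

The two payloads cannot be told apart by shape, so the semantic difference between the cases has to travel in the discriminant.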

rossberg commented 6 months ago

For context, options of options usually come up when composing things. For example, you have some domain of values that is represented by an option (say, because it contains an "empty" element), and then you need to put those in some kind of map where the lookup function returns an optional result to indicate lookup failure. Conflating the two then becomes a fatal composability failure.
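A minimal sketch of that map scenario follows. The `Option<T>` type, the two lookup helpers and the sample data are hypothetical, written only to show where the conflation bites.

```typescript
type Option<T> = { tag: "none" } | { tag: "some"; value: T };

// Untagged lookup: null signals a missing key, but a stored null looks identical.
function lookupUntagged<V>(m: Map<string, V>, key: string): V | null {
  return m.has(key) ? (m.get(key) as V) : null;
}

// Tagged lookup: the outer tag reports lookup success, the value is returned untouched.
function lookupTagged<V>(m: Map<string, V>, key: string): Option<V> {
  return m.has(key) ? { tag: "some", value: m.get(key) as V } : { tag: "none" };
}

// Values are themselves optional: null is the domain's "empty" element.
const nicknames = new Map<string, string | null>([
  ["ada", "countess"],
  ["alan", null],
]);

const untaggedPresent = lookupUntagged(nicknames, "alan");  // key exists, value is empty
const untaggedMissing = lookupUntagged(nicknames, "grace"); // key does not exist
const taggedPresent = lookupTagged(nicknames, "alan");
const taggedMissing = lookupTagged(nicknames, "grace");

console.log(untaggedPresent === untaggedMissing); // the two situations are conflated
console.log(taggedPresent.tag, taggedMissing.tag);
```

With the untagged lookup a caller cannot tell "present but empty" from "absent", which is the composability failure described above; the tagged lookup keeps both answers.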

fitzgen commented 6 months ago

> Until wasm-gc has explicit rtts that are generative (i.e., not canonicalized), then if we have two cases in a variant with the same structural contents, then a receiver of the variant value will not be able to use casts to tell the two cases apart, which would lose potentially-necessary semantic information. Thus, I think we'll need an explicit discriminant until then (which will take a while).

FWIW, you could do types inside a rec group to prevent a.aa and a.ab from canonicalizing to the same thing, but that seems less elegant than simply using a discriminant to me.

lukewagner commented 6 months ago

Oh right, good point! Thinking about which is better from a perf POV, I would guess that for low number of cases, the difference is negligible, but for high numbers of cases, the nice thing about a (dense) discriminant is that you can br_table on it, so probably that one is the winner.
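The shape of that trade-off can be sketched by counting type tests. This only models the number of checks each strategy performs; it is not a measurement of wasm performance, and the case classes and function names are hypothetical.

```typescript
// Four hypothetical cases, named only for this sketch.
class Case0 {}
class Case1 {}
class Case2 {}
class Case3 {}
const caseClasses = [Case0, Case1, Case2, Case3];
const caseNames = ["case0", "case1", "case2", "case3"];

// Discriminant dispatch: one indexed lookup, independent of the number of cases
// (the role br_table plays for a dense discriminant).
function nameByDiscriminant(disc: number): string {
  const name = caseNames[disc];
  if (name === undefined) throw new RangeError(`bad discriminant ${disc}`);
  return name;
}

// Cast dispatch: test each case type in turn and count the tests performed.
function nameByCast(v: object): { name: string; tests: number } {
  let tests = 0;
  const index = caseClasses.findIndex((cls) => {
    tests += 1;
    return v instanceof cls;
  });
  if (index < 0) throw new TypeError("value matches no case");
  return { name: nameByDiscriminant(index), tests };
}

console.log(nameByDiscriminant(3));
console.log(nameByCast(new Case3()).tests);
```

In this model the last case costs as many tests as there are cases when dispatching by cast, while the discriminant path stays at a single lookup.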
