WebAssembly / gc

Branch of the spec repo scoped to discussion of GC integration in WebAssembly
https://webassembly.github.io/gc/

Thought experiment: JS subset using GC MVP #113

Open bvibber opened 4 years ago

bvibber commented 4 years ago


"One complete but limping implementation of an important language is worth 20 long threads on subtyping" -- @titzer, https://github.com/WebAssembly/gc/issues/111#issuecomment-670999383

Well you nerd-sniped me. :) Here's a rough sketch of what a JS compiler targeting WebAssembly with the GC MVP proposal might look like. I don't currently plan to implement it as-is but if people would be interested in working together on one, let me know! This plan is subject to further editing of details based on discussion, and is meant to prompt discussion about possible changes to the GC proposal based on what an implementation would need. -- @brion

Goals:

TL;DR version:

Recommendations:

Value representation

A universal representation ("variant" or "any" type) is required when handling JS values of unknown or variable type.

Given the absence of a variant type in Wasm that can represent double-precision floating point values, the current solution is to use anyref as the variant representation, with integers in i31 range stored as i31ref and larger numbers boxed into structs.

An important part of JS compiled code will be unpacking runtime type information to determine the runtime type of a value from an anyref if it couldn't be statically verified. An important consideration is that GC MVP proposal's structural subtyping will consider two struct types equal if they are laid out the same. So if the struct or array representation is the same between two different types, they must be distinguished by adding a field with a discriminator.

Types that encounter this problem include boxed types like boolean and i32 which have identical machine representations for their payloads.

Undefined

The undefined special value is stored as a null reference, giving "free" initialization of newly allocated arrays to contain our undefined value.

Compiled code can test for undefined with ref.is_null.

JSValue base struct

All other values (except i31ref-optimized numbers) will be represented via structs, all of which start with an i32 field with a type discriminator enum and another for a hash value.

JSValue:
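As a rough sketch in something like the GC MVP's text notation (field names and exact syntax are illustrative, not normative):

```wat
(type $JSValue (struct
  (field $type_id i32)     ;; discriminator enum for fast dispatch
  (field $hash (mut i32))  ;; lazily initialized hash value
))
```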

The type_id allows quick discrimination over a large number of possible types and dispatch to various code paths with a switch (eg, via br_table) or more targeted checks via a compare and branch. The only rtt checks needed are for i31ref and the JSValue struct base; once a type id is known, the value can be cast to the correct full structure in the specific code path.

This enum's values may be tailored to making it easy to check for specific subclasses of behavior based on bitfields or range comparisons.

The stable hash value can be used to put any JS object into hash maps, such as implementing the JS Map class or in internal structures. Implementation of hash map internals is left as an exercise for later.

However, generating a hash code for every boxed number, or large string, that's never used in hash key lookups would be wasteful. The hash value should be lazy-initialized, either using 0 as an uninitialized sigil (requires modifying any hash value of 0 to another value) or by storing a second field with an initialization check (which would be slower).
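A sketch of the lazy initialization using 0 as the sigil, assuming a JSValue struct whose second field is the mutable i32 hash described above; $compute_hash is a hypothetical content- or identity-based helper:

```wat
(func $get_hash (param $v (ref $JSValue)) (result i32)
  (local $h i32)
  (local.set $h (struct.get $JSValue $hash (local.get $v)))
  (if (i32.eqz (local.get $h))
    (then
      (local.set $h (call $compute_hash (local.get $v)))
      ;; remap an accidental 0 so the sigil stays unambiguous
      (if (i32.eqz (local.get $h))
        (then (local.set $h (i32.const 1))))
      (struct.set $JSValue $hash (local.get $v) (local.get $h))))
  (local.get $h))
```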

For objects where the hash is based on identity, the hash computation can start with some sort of created instance counter and calculate a hash from that, or the equivalent -- the important thing is once assigned it must never change. And for types that implement hash equality based on content (BigInt, string, boxed numbers) this will instead use the same value for every instance with the same content.

Booleans, null, deleted

A boxed boolean struct contains its numeric value (0 or 1) as a field for easy use via dereference. Normally globally unique values can be used for boolean values as well, avoiding the cost of allocating new ones on the heap at runtime.

BoxBoolean(JSValue):
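Sketched with the JSValue prefix fields repeated, since structural subtyping matches on layout (names illustrative):

```wat
(type $BoxBoolean (struct
  (field $type_id i32)
  (field $hash (mut i32))
  (field $value i32)  ;; 0 for false, 1 for true
))
```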

Similarly the null value carries a nice 0 numeric value in some operations, so it's using what's really the same struct type:

BoxNull(JSValue):
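This has the same layout as BoxBoolean, so only the type_id distinguishes the two (illustrative sketch):

```wat
(type $BoxNull (struct
  (field $type_id i32)
  (field $hash (mut i32))
  (field $value i32)  ;; always 0
))
```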

And again it will usually be a globally unique instance.

A sigil value is also needed for deleted entries in arrays, as distinct from present undefined entries:

BoxDeleted(JSValue):
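A minimal sketch; it needs no payload beyond the JSValue prefix:

```wat
(type $BoxDeleted (struct
  (field $type_id i32)
  (field $hash (mut i32))
))
```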

This is changed out for an undefined value when read, but needs to be skipped over in iteration etc. To make array lookups efficient this comparison should be done against a unique reference value if possible.

These will all have hash keys that should be distinct from their integer values, but need to be stably based on their types & their contents if new instances ever get created, as all instances of these types must act identically including as Map keys.

Numbers

Integers known to be in the i31 range can be packed in an i31ref, however this must be either statically verified during compilation or dynamically checked at runtime. Larger values can be boxed:

Boxi32(JSValue):
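A sketch of the boxed 32-bit integer:

```wat
(type $Boxi32 (struct
  (field $type_id i32)
  (field $hash (mut i32))
  (field $value i32)
))
```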

and floating-point values, or overflow after 32-bit arithmetic, can be boxed as well:

Boxf64(JSValue):
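And the double-precision box, sketched the same way:

```wat
(type $Boxf64 (struct
  (field $type_id i32)
  (field $hash (mut i32))
  (field $value f64)
))
```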

Heap boxing is not free, and can have a significant cost when heavy bit fiddling or floating-point math churns through boxed values.

Both numeric box types must use a content-based hash value, as values of different provenance must produce stable hashmap keys.

Calculating this hash value on every boxing operation would be a significant expense, especially when most numbers will never be used as hash keys, so it should be deferred and calculated on demand.

BigInts

BigInts could be stored something like this:

BigInt(JSValue):
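One possible sketch (sign and width bookkeeping omitted; field names illustrative):

```wat
(type $i64Array (array (mut i64)))
(type $BigInt (struct
  (field $type_id i32)
  (field $hash (mut i32))
  (field $low i64)                    ;; low 64 bits; enough on its own for small values
  (field $rest (ref null $i64Array))  ;; further 64-bit chunks, or null
))
```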

Values that fit in 64 bits would not require a separate allocation, while larger values would store additional bits in 64-bit chunks in an array.

There may be better storage arrangements than this for arbitrary-width integers, but this should do for a start.

The JSValue hash key must be content-based, and as with strings and boxed numbers should be lazy-initialized.

Strings

String contents can be represented as an array<i16> containing UTF-16 code units. We get dynamic allocation, GC, and a queryable length for free! But it is a second allocation on top of the struct with the type code and hash value.

It's also important to note that there's no restriction against modifying a string stored this way, even though JavaScript strings are immutable. Bugs in the compiler, or hostile code on the other side of a module boundary, could modify a string and invalidate assumptions held about past string equality.

String(JSValue):
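A sketch, with the code units in a second allocation:

```wat
(type $CodeUnits (array (mut i16)))
(type $String (struct
  (field $type_id i32)
  (field $hash (mut i32))
  (field $chars (ref $CodeUnits))  ;; UTF-16 code units
))
```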

The JSValue hash key field must be stable between multiple instances of identical strings, so it must be calculated from the string contents. This will be lazy-initialized when needed in a hashmap operation.

Symbols

A symbol has a name which is a string, but the name isn't unique while the symbol instance is. Fancy! Simplest thing is to have a JSValue struct that stores a reference to another string with the name:

Symbol(JSValue):
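A sketch:

```wat
(type $Symbol (struct
  (field $type_id i32)
  (field $hash (mut i32))
  (field $name (ref $String))  ;; descriptive name; not unique, unlike the symbol itself
))
```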

Symbols must use an identity-based hash key, as they are unique unlike strings.

Objects

The Object struct itself holds some JS internal fields like the prototype, a reference to a Shape describing the property names and configuration, and an array of anyrefs that carry the actual property values:

Object(JSValue):
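A sketch, with the Shape record described further below (names illustrative):

```wat
(type $AnyArray (array (mut anyref)))
(type $Object (struct
  (field $type_id i32)
  (field $hash (mut i32))
  (field $prototype (ref null $Object))  ;; [[Prototype]] internal slot
  (field $shape (mut (ref $Shape)))      ;; property names & configuration
  (field $fields (mut (ref $AnyArray)))  ;; property values, indexed via the Shape
))
```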

(Don't forget the JSValue struct's type_id discriminator can tell us if we're one of the weird Object-based subtypes like Array.)

The fields array can be larger than the number of fields actually specified in the Shape, to reduce the need to copy existing data when adding new fields.

What we'd like to do is optimize most property gets and sets into array loads and stores, using a cached lookup for the field index. If a function is regularly given the same type of objects, we need only confirm that the Shape matches the last-used value and we can use a previously-looked up index safely. If not, look it up through the Shape records and save it alongside the new shape reference for next time.
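A sketch of the fast path for one call site, with the cache held in a pair of mutable globals; $get_prop_slow is a hypothetical fallback that walks the Shape records and refills the cache:

```wat
(global $cache_shape (mut (ref null $Shape)) (ref.null $Shape))
(global $cache_offset (mut i32) (i32.const 0))

(func $get_prop_cached (param $obj (ref $Object)) (result anyref)
  ;; fast path: same Shape as last time, so reuse the cached field index
  (if (ref.eq (struct.get $Object $shape (local.get $obj))
              (global.get $cache_shape))
    (then
      (return (array.get $AnyArray
        (struct.get $Object $fields (local.get $obj))
        (global.get $cache_offset)))))
  ;; slow path: look the field up through the Shape records,
  ;; then store the shape/offset pair back into the cache
  (call $get_prop_slow (local.get $obj)))
```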

This means when we add a field in a given way at runtime, we must guarantee we will get the same new shape record every time. This means Shapes have both backward- and forward-facing references:

Shape:
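A sketch of the Shape record with both directions of references (names illustrative):

```wat
(type $FieldArray (array (mut (ref $Field))))
(type $ShapeArray (array (mut (ref null $Shape))))
(type $Shape (struct
  (field $parent (ref null $Shape))            ;; shape this one was derived from
  (field $successors (mut (ref $ShapeArray)))  ;; shapes derived from this one
  (field $field_count i32)                     ;; total fields, including inherited
  (field $descriptors (ref $FieldArray))       ;; only the fields changed at this step
))
```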

field_count lists the total number of fields, which isn't necessarily one plus the offset of the last descriptor.

The descriptors contain Field struct refs, listed in more detail later, with the name/getter/setter/etc. These only list the fields changed in this version of the Shape. Later descriptors override earlier descriptors.

The parent chain can be followed to fill out all the fields back to the empty object or whatever the root of this shape chain is.

The successors list on the other hand lists all successor shapes to a given shape that have been made, eg by adding a field or changing a property descriptor. We can iterate through these looking for a shape that makes the change we're looking at to look up an existing shape. Again, for lookups we can cache the last-used parent & successor shape pair to avoid iterating through the entire list on subsequent code runs: having made the check once is enough to let us cache the new shape.

(That can be a fancier data structure than a simple array, if that helps to optimize lookups.)

Note that because weak refs don't exist in the GC MVP proposal, the current plan is to accept that there can be memory leaks in the shape successors graph and the inline caches. If an object is allocated in a certain way and then every instance of it is freed, the objects themselves will be freed, but the shape will not be unless the root of the shape graph is freed and all cache sites are fed with different objects. For objects built up from {} this will never happen.

The field info includes the name, which can be either a String or Symbol, getter/setter refs, the actual field array offset, and some boolean fields for property configuration. Changing properties is done by replacing the entire Field structure, which should be considered immutable.

Field:
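A sketch, with the boolean configuration bits as i32 fields:

```wat
(type $Field (struct
  (field $name anyref)                  ;; a String or Symbol
  (field $getter (ref null $Function))
  (field $setter (ref null $Function))
  (field $offset i32)                   ;; index into the Object's fields array
  (field $writable i32)
  (field $enumerable i32)
  (field $configurable i32)
))
```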

Arrays

JavaScript Arrays are a special case of Object that can implement fast "dense" arrays of properties using 32-bit integer indexes or their string equivalents, but can also be extended, used, or abused, in various interesting ways that may be surprising at runtime.

Plan here is to implement Array as a subtype of Object which includes a std::vector-like backing array ref, length & capacity, and runs anything else through the Object properties.

Array:
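A sketch, repeating the Object prefix and adding the dense storage ($AnyArray here means an array of mutable anyref, as used for Object's fields):

```wat
(type $Array (struct
  ;; Object prefix
  (field $type_id i32)
  (field $hash (mut i32))
  (field $prototype (ref null $Object))
  (field $shape (mut (ref $Shape)))
  (field $fields (mut (ref $AnyArray)))
  ;; dense-array extension
  (field $dense (mut (ref $AnyArray)))  ;; backing store; capacity is its own length
  (field $length (mut i32))             ;; JS-visible length
))
```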

The usual vector implementation applies -- when adding items overflows the capacity, allocate a new backing buffer and copy data in. Capacity doesn't have to be stored here, as it's queryable from the array.

Deleted items are stored in-place as a singleton reference value.

TypedArrays

JS TypedArrays are tricky in that views of different types can be overlaid on the same backing ArrayBuffer. However GC MVP's array types provide for no such aliasing; an array<i8> can only be read/written a byte at a time, and an array<i32> can only be read/written at 32-bit granularity, etc.

Thus while ArrayBuffer is straightforward to implement, you'd most likely end up with fast Int8Array and Uint8Array implementations but very slow ones for Int16Array, Float32Array, and anything wider, because they'd have to access the buffer a byte at a time and combine the results.

ArrayBuffer(Object):
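A sketch, with the contents held behind a nullable reference:

```wat
(type $Bytes (array (mut i8)))
(type $ArrayBuffer (struct
  ;; Object prefix
  (field $type_id i32)
  (field $hash (mut i32))
  (field $prototype (ref null $Object))
  (field $shape (mut (ref $Shape)))
  (field $fields (mut (ref $AnyArray)))
  ;; buffer contents; null once the buffer has been transferred away
  (field $bytes (mut (ref null $Bytes)))
))
```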

The nullable reference is required because array buffers can be 'neutered' via transfer to another context, as seen on the web with postMessage. This could be used to implement data transfer to an outside host with minimal copying.

TypedArray(Object):
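One possible sketch, assuming (my assumption, not the source's) that the element type is implied by the type_id rather than stored separately:

```wat
(type $TypedArray (struct
  ;; Object prefix
  (field $type_id i32)  ;; also implies the element type (Int8, Float32, ...)
  (field $hash (mut i32))
  (field $prototype (ref null $Object))
  (field $shape (mut (ref $Shape)))
  (field $fields (mut (ref $AnyArray)))
  ;; view state
  (field $buffer (ref $ArrayBuffer))
  (field $byte_offset i32)
  (field $length i32)   ;; in elements
))
```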

Functions and closures

JS Function objects are Objects but also must wrap some closure state (captured variables) and an internal function reference. This should be pretty straightforward:

Function:
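A sketch; $jsfunc stands for the common-ABI function signature discussed next:

```wat
(type $Function (struct
  ;; Object prefix
  (field $type_id i32)
  (field $hash (mut i32))
  (field $prototype (ref null $Object))
  (field $shape (mut (ref $Shape)))
  (field $fields (mut (ref $AnyArray)))
  ;; closure state and code
  (field $closure anyref)      ;; function-specific closure struct, or null
  (field $code (ref $jsfunc))  ;; typed function reference
))
```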

The main question is what this implies for the signature of the target function. Unless we have multiple variants of Function that store different numbers of args, we're probably going to have to do something simple like creating a common ABI signature for all JS functions:

(func
    (param $closure anyref)
    (param $thisarg anyref)
    (param $argc i32)
    (param $arg1 anyref)
    (param $arg2 anyref)
    (param $arg3 anyref)
    (param $arg4 anyref)
    (param $rest ref null array<anyref>)
    (result anyref)
)

The $argc parameter passes the number of actual args passed, and any arg not used by the caller is initialized to undefined per the ABI so it can be used directly by compiled code. Any additional arguments are passed via an optional array. The exact number of optimized arguments may need to be tuned.

The additional args can then be copied or read directly for use of arguments and friends, which may require special compiler support when used.

A closure can use a function-specific struct type, so if values are guaranteed by static analysis to be a certain type (eg, i32 or f64) they can be stored that way, avoiding a heap boxing for floating point. However anything that can't be statically determined must be an anyref.

If doing one-pass AOT compilation, local variables can live in Wasm locals, with any declared arguments beyond the ones provided copied into place from the rest array. The full JS arguments object behavior can either be forbidden for simplicity, or special-cased in slow-path code for functions using it.

Note that if doing an optimizing compiler that has points where it can escape from one tier of compilation to another, suitable interfaces would be necessary to transfer arguments and variable state to the other version of the function. This is complex, requiring functions compiled specifically to exit or enter at certain points, and is outside the scope of this current proposal.

Identity-related features (Maps, WeakMaps)

JS Map allows any JS value as a key, including both object references and primitives such as strings, numbers, null...

I see two implementation strategies:

Both seem plausible; note you can't use the object reference itself as a low-level hash because it can't be converted to a number.

As far as I can tell there is no way to implement JS WeakMap using the GC primitives, leading us back to:

Without using host imports, I don't think a WeakMap can be implemented currently.

Proxies

JS Proxy objects are a special type of object that forwards a bunch of operations to another object, or (at the Proxy creator's option) performs alternate code. The slow-path operations are relatively straightforward, they're just an extra layer of indirection, and should not create any huge surprises I think.

Security and module isolation

Since JS values are stored in anyref they can be transmitted to and from other modules over import/export boundaries or function references that have been sent in.

As far as I can tell there is nothing to prevent the other side of the module boundary from modifying the internal state of the struct and array objects, such as strings and JS objects and their internal data structures, such as object shape definitions.

Thus any form of security encapsulation inside the JS virtual machine implemented in this module must be considered NULL AND VOID in the face of transfer of references to an untrusted module.

Examples

Consider the following program:

class Complex {
    constructor(x, y) {
        this.x = x;
        this.y = y;
    }

    abs() {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    }

    add(other) {
        this.x += other.x;
        this.y += other.y;
    }

    mul(other) {
        let newX = this.x * other.x - this.y * other.y;
        let newY = this.x * other.y + this.y * other.x;
        this.x = newX;
        this.y = newY;
    }
}

// z(n+1) = z(n)^2 + c
function iterate_mandelbrot(c, maxIters) {
    let z = new Complex(0, 0);
    for (let i = 0; i < maxIters; i++) {
        if (z.abs() >= 2) {
            return i;
        }
        z.mul(z);
        z.add(c);
    }
    return maxIters;
}

This code, when called over a suitable iteration area of the Mandelbrot set, performs reasonably well in V8, comparably to code that uses pairs of variables and inlines all the computation manually.

Do not expect V8-level performance for this program in our hypothetical VM. :(

Several notes here. First, remember that numbers outside the i31 range must be boxed into heap objects. This will be true of the x and y properties of the Complex c and z objects, so accessing them requires (in the best case) another indirection beyond just looking into the field list. And updating the value similarly requires allocating a new boxed struct, as the old one could have been copied elsewhere and be in use. (We can't easily create new types at runtime, so it's not possible to store doubles directly without duplicate storage arrays.)

Second, no static annotations indicate the types of any of the arguments, so c and maxIters must be checked... additionally, because JavaScript allows calling functions with arbitrary this arguments there's no guarantee that this will be a specific type in method bodies! Thus every method must have a guard check for the this type in front of any attempts to use the Complex-specific object shape, with a suitable slow-path backup looking up properties or checking for expected types.

It should though be possible to inline the Complex class's abs, mul, and add methods by using static analysis with a check that the Complex type had not changed unexpectedly at runtime. This might involve a runtime check when unwrapping the Function object of a method property. Fast path on confirming the method pointed to the expected function would run the inlined code (which now can in fact assume that z is a Complex and that it has a particular shape) with a slow-path that calls the actual function with the actual values.

The field indexes on c (which cannot be statically determined to be a Complex) could be cached inline, but the check for objectness must be performed first and the shape confirmed against the cache before use.

Consider the following code:

function create_packet(ts, track, data, is_keyframe) {
    let packet = {
        ts,
        track,
        data
    };
    if (is_keyframe) {
        packet.keyframe = true;
    }
    return packet;
}

This produces different object shapes depending on whether an argument is set, eek! A loop handling many packets could end up having to re-perform field index lookups when changing shapes during the stream processing. If shapes change a lot, this could lead to some thrashing.

We could be slightly cleverer about the shape comparisons, where if there's not an exact match we look up the parent chain for the one we were looking for, but this might have problems with overwritten definitions with the current Shape schema.

This may bear further thought and comparison with major JS engine implementations.

Performance considerations

Exception handling

I have not yet covered exceptions or exception handling, hoping they will be provided by another Wasm extension. Currently there's no way to "catch" traps, or create exceptions carrying a data payload in Wasm.

Thus propagation and handling of exceptions would require either waiting for that to be available, or importing support functions from the host (JS-only like in emscripten?), or using a global exception-state variable that's checked after every potential exception site, with manual unwinding. This will have some overhead.

Property loads

Thinking more about the property lookup process in eg let val = obj.prop. We can bias our checks to assume it's probably an object of matching shape.

This is all pretty verbose, so either it has to be compiled inline (potentially with slight tweaks depending on which types are most expected at compile time) or it has to be done with a call into a function (which may potentially be inlined in the native codegen).

Doing caches with globals defined at compile time would probably perform the best, but means you either need to inline the cache-manipulating code, or you have to store them in a GC obj or linear memory somewhere to pass them into a non-inline function. (Eg, if you call the slow-path code, you want to cache the shape, and the offset, and the getter/setter, so you either need multivalue returns or a struct you can pass to the slow path containing the cache fields.)

bvibber commented 4 years ago

Updated with a quick look at what a JS property-lookup looks like in this model. It's not pretty, but not super horrible I think. There are probably some tunings that can be made to minimize rtt comparisons and field lookups and simplify cached code.

bvibber commented 4 years ago

The thing I'm most unsure about is the efficiency of type checks and branching out to various type-specific behavior...

From what I can tell, an attempt to cast with ref.cast will always require an rtt comparison, even if we already know which type it is by reading a discriminator field from the base JSValue struct. So ref.cast is just as expensive as br_on_cast, and can mean we're essentially doing two subtype comparisons under the hood.

If there's an "expected" type for a value at compile time, we may be able to optimize the checks by having a fast-path br_on_cast straight to the expected subtype, then ref.cast to JSValue and do the br_table if it failed.

For instance, let val = obj.prop would optimize for obj being an Object of any subtype and "prop" being a non-numeric string (no Array numeric index paths invoked, and even no need to cast to String because we can use a global holding the run-time copy of that compile-time string)...

On the other hand let val = obj[index] would assume index is variable: either an i31ref (probably), a String (common), a Symbol (less common), or something much rarer. If it's a numeric type or a string containing a positive 32-bit integer representation, then we have to check type_id for obj being an array type and cast to the suitable Array or TypedArray subtype.

(It's probably best to defer the numeric checks on strings until after checking for arrayness. Potentially it might be useful to memoize the numeric checks in the String struct too, though this might not be a big deal.)

Also: does br_table work efficiently in terms of branch prediction in a hot loop? (I'm guessing it should work reasonably well when fed consistent input but could be wrong.)

Note again the rtt checks only help when the types are structurally distinguishable in the MVP proposal: for instance you could not distinguish between Boxi32 and BoxBoolean from the rtt, and would have to check the type_id field if the difference matters in context. But in some contexts it doesn't matter, and you'd read the 0 from a BoxBoolean directly to use in math for instance just as you would a Boxi32...

dcodeIO commented 4 years ago

Since this issue is exploring GC from a JS perspective, with AssemblyScript being similar, perhaps a view on a hypothetical ideal GC from an AssemblyScript perspective plus a couple thoughts of the implications might be interesting. The following is oversimplified to keep it short, and likely unrealistic, but maybe there's something good to extract from it in context :)

Obviously, the closer the GC spec is to JS's object model, the more suitable it is for a JavaScript-like language to compile to it, and likewise the easier it is for any language to get ideal interop with JS. A nice-to-have-in-an-MVP list of high-level features may include:

Most (hopefully more) of this should be possible to validate at link time, browsers (hopefully) already have most of the components necessary and it's not that different from the current state of the proposed MVP. Yet, non-web engines would need some sort of a Wasm standard library providing the respective abstractions, perhaps created and provided as a joint effort by the WebAssembly community.

One interesting aspect is how much such a scheme would already benefit languages that can be transpiled to JS today, but not Wasm, and as we all know there are a lot of these. Given that some of them need dynamic typing, having an any type and the ability to declare a TypedObject of any members or even a Map<any,any> as an escape hatch (even though inefficient due to redundant type checks and potential boxing) would enable them to target Wasm in the first place; later on, a Wasm JIT spec on top of Wasm GC could get rid of the redundant type checks and boxing at runtime. That might be a strategy not yet explored. This is based on the (I think very interesting) observation made in another thread that for some languages it might just be better to keep compiling to JS, yet on top of that we'd keep the door open to go full circle eventually.

Also worth mentioning is that there's no good C++ story for existing code in there, but it probably won't need much of the above anyway, except good interop. However, an interface to the Wasm standard library mentioned above can then be used from C++ or similar, i.e. to convert GC strings to linear memory or vice-versa, or to use Wasm GC-specific functionality directly where it makes sense, or to magically replace C++ standard library usage with Wasm standard library usage (next-level Emscripten), or even to compile Wasm standard library code to native by shipping a dynamic library able to provide this or other smart ways of downleveling. Regarding the web, the simplest case would be a GC-aware wrapper around a C++ library, while in other cases a developer might consider switching out parts of the C++ standard library with the Wasm standard library where it makes sense, or start a new project right on top of it.

That's all :)

bvibber commented 4 years ago

A note on the notion of private struct/object fields -- currently the only way to add new functions at runtime (useful for both optimization and dynamic code loading) is to import host functions that compile a new module and link it to share internal state.

If functions compiled in the trusted second module can't access private fields from types defined in the first module, you're gonna have a bad time.

So either there would need to be a way to add functions at runtime that kept module identity, or the privacy restriction would need to be enforced through an import... having access to the type definition would work if subtyping is not structural but based on declarations and inheritance patterns?

dcodeIO commented 4 years ago

Hmm, yeah, adding new functions at runtime is something this doesn't cover. Perhaps dynamic code generation or loading might become an independent Wasm Reflection spec providing what's necessary, i.e. looks like this needs some sort of GC or JIT first anyway?