WebAssembly / gc

Branch of the spec repo scoped to discussion of GC integration in WebAssembly
https://webassembly.github.io/gc/

Performance of type equivalence and subtype checking #68

Closed: dead-claudia closed this issue 2 years ago

dead-claudia commented 5 years ago

I just wanted to chime in with some comments about the performance of those two checks in the current proposal. I do see performance overhead, but I don't see it as dire as the relevant sections make it sound. Specifically, this details a potential way to implement both efficiently.

No action is necessarily required here - it's purely informative.

Internal structure

The proposal overview details the type grammar as this:

num_type       ::=  i32 | i64 | f32 | f64
ref_type       ::=  (ref <def_type>) | i31ref | anyref | anyfunc
value_type     ::=  <num_type> | <ref_type>

packed_type    ::=  i8 | i16
storage_type   ::=  <value_type> | <packed_type>
field_type     ::=  <storage_type> | (mut <storage_type>)

data_type      ::=  (struct <field_type>*) | (array <field_type>)
func_type      ::=  (func <value_type>* <value_type>*)
def_type       ::=  <data_type> | <func_type>

For type equality and subtype checking, all I care about are value_type and def_type, nothing else. The first is the type used for parameters, and the second is used for type definitions. So I can transform the grammar into the following to make it more regular and easier to process:

value_type   ::= i32 | i64 | f32 | f64 | i31ref | anyref | anyfunc | <ref_type>

storage_type ::= i8 | i16 | <value_type>
field_type   ::= (imm <storage_type>)
field_type   ::= (mut <storage_type>)

ref_type     ::= (ref (struct <field_type>*))
ref_type     ::= (ref (array <field_type>))
ref_type     ::= (ref (func <value_type>* <value_type>*))

(I can freely convert the def_type types into (ref <def_type>), since a (type ...) body can only ever contain a single <def_type>. As that's the only difference, I can ignore it and tie (type ...) directly to ref_type instances instead.)

Non-reference types would be represented via sentinels referring to each type directly. These would be tagged to denote that they're primitives and not pointers, to save memory. Reference types would be represented via pointers to nodes in a persistent tree, where "array" nodes carry a child type and "shape" nodes carry a child count followed by that many child types. "Shape" nodes can also carry a bit denoting whether they're callable, but this isn't required, as function references' nullptr sentinel is enough to tell them apart from struct references. Each of these exists in mutable and immutable variants, rather than as separate type and (mut type) nodes. This simplifies comparisons and saves a bit of memory in the process.
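
For illustration, the primitive sentinels could be encoded like so (a sketch; the exact tag layout and sentinel values are arbitrary choices here, not part of the proposal):

#include <cstdint>

struct Type; // opaque; only ever handled by pointer

// The low pointer bit marks a tagged primitive sentinel rather than a heap
// node. Any distinct odd values work, since real allocations are aligned.
constexpr uintptr_t kPrimitiveTag = 1;

const Type* const kI32 = reinterpret_cast<const Type*>((0u << 1) | kPrimitiveTag);
const Type* const kI64 = reinterpret_cast<const Type*>((1u << 1) | kPrimitiveTag);
const Type* const kF32 = reinterpret_cast<const Type*>((2u << 1) | kPrimitiveTag);
const Type* const kF64 = reinterpret_cast<const Type*>((3u << 1) | kPrimitiveTag);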

This of course doesn't strictly mirror the spec, but acts as a form of desugaring. Here's how the spec types would desugar:
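
Roughly speaking (the exact encoding is an implementation detail):

i32 / i64 / f32 / f64 / i31ref / anyref / anyfunc  => tagged primitive sentinels
(imm <storage_type>)   => the immutable variant of that type's sentinel or node
(mut <storage_type>)   => the mutable variant of the same sentinel or node
(ref (struct <field_type>*))              => a "shape" node with one child per field
(ref (array <field_type>))                => an "array" node with a single child
(ref (func <value_type>* <value_type>*))  => a callable "shape" node
(type $t <def_type>)   => the node for its body, shared by identity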

And of course, type aliases are resolved to the relevant sentinel or node, not compared by name. This simplifies comparisons and is critical to the speed benefit.

This can be built up in memory and persisted using two structures: an interning map keyed by each node's structural path, and a type index map from forward-referenced type indices to lists of dependent parent slots.

When parsing nodes, the path is used to identify the node to access, and if a node doesn't exist, it's created before returning or recursing. These nodes do need to be collected as normal heap objects, but they don't require a sophisticated algorithm; absent recursive references, you could even use reference counting. They're meant to be globally shared and interned, much like strings and well-known symbols, but they only have to be global to a given execution context, not necessarily to the entire world.

After parsing a type, the type index map is searched with the current type index for the list of dependent nodes matching that key, and the entry is then removed from the map. After that, each pointer in the list is directly assigned the new child.
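
A minimal sketch of that patch-up step (the names here, like resolveTypeIndex, are just for illustration):

#include <cstdint>
#include <unordered_map>
#include <vector>

struct Type;

// Parent slots that referenced type index `i` before its (type ...) entry
// was parsed, waiting to be patched in place.
using TypeIndexMap = std::unordered_map<uint32_t, std::vector<const Type**>>;

void resolveTypeIndex(TypeIndexMap& pending, uint32_t index, const Type* node) {
    auto it = pending.find(index);
    if (it == pending.end()) return; // no forward references to this index
    for (const Type** slot : it->second) *slot = node; // assign the new child
    pending.erase(it); // the entry is no longer needed
}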

This keeps the memory overhead worst-case O(b) space, where b is the number of bytes used to define the backing (type ...). Retrieving and potentially allocating is itself done in O(b) worst-case (with b defined above), assuming constant-time allocation, memory loads, and memory stores. The constant factor itself isn't particularly high, and the memoization would likely mitigate most of it.

Performing the checks

Type equivalence is simple pointer equality; part of the reason to memoize the structs and function types is to allow this. Subtype checking is a bit more complicated: it requires some memory traversal and iteration. Assuming constant-time memory loads, it can be kept down to O(nd) time worst-case, where n is the number of child nodes and d is the max depth of the smaller tree, or equivalently O(b) time worst-case, where b is the number of bytes used to define the backing (type ...). The check also adapts: it skips entire structs that are pointer-equal, and it bails out early on structs whose would-be supertype carries more fields than the subtype. It adapts recursively, too, so failing to shortcut one child doesn't prevent shortcutting the rest. So it should not normally hit this worst-case scenario.

Subtype checking would generally use the following algorithm:

#include <cstdint>
#include <cstddef>
#include <cstring> // memcmp, for the fast path discussed below
#define if_likely(T) if (__builtin_expect(!!(T), 1))
#define if_unlikely(T) if (__builtin_expect(!!(T), 0))
struct Type;
// Implementation omitted for brevity
extern bool isInequalPrimitiveAssignableTo(const Type* a, const Type* b);

enum class TypeInfo: uintptr_t {
    PrimitiveTag = 1 << 0, // set in the pointer itself for primitive sentinels
    MutableTag = 1 << 1,   // this type is the mutable variant
    Shape = 1 << 2,        // node carries a child count plus children (struct/func)
    Callable = 1 << 3,     // shape is a function type
};

struct TypeAlloc {
    const TypeInfo info;
    union {
        // Shape nodes: child count followed by that many child types.
        struct {
            size_t total;
            const Type* children[];
        };
        // Array nodes: a single child type.
        const Type* child;
    };
};

static inline bool isPrimitive(const Type *type) {
    return (uintptr_t) type & (uintptr_t) TypeInfo::PrimitiveTag;
}

bool isAssignableTo(const Type *self, const Type *other) {
    while (true) {
        if_likely (self == other) return true;
        if_unlikely (isPrimitive(self) != isPrimitive(other)) return false;
        if_unlikely (isPrimitive(self)) {
            return isInequalPrimitiveAssignableTo(self, other);
        }
        const TypeAlloc* a = reinterpret_cast<const TypeAlloc*>(self);
        const TypeAlloc* b = reinterpret_cast<const TypeAlloc*>(other);
        const TypeInfo info = a->info;
        if_unlikely (info != b->info) return false;

        // Mutable types are invariant. Since types are interned, equal
        // mutable types are pointer-equal, and that check already failed.
        if_unlikely ((uintptr_t) info & (uintptr_t) TypeInfo::MutableTag) {
            return false;
        }

        if_likely ((uintptr_t) info & (uintptr_t) TypeInfo::Shape) {
            // Width subtyping: the subtype may carry extra trailing fields,
            // but must cover every field the supertype has, so the loop is
            // bounded by b's field count, not a's.
            size_t total = b->total;
            if (a->total < total) return false;
            for (size_t i = 0; i < total; i++) {
                if_unlikely (!isAssignableTo(a->children[i], b->children[i])) {
                    return false;
                }
            }
            return true;
        }

        // Array nodes: continue the loop on the single child.
        self = a->child;
        other = b->child;
    }
}

Of course, this is itself somewhat optimized, but there's still room. For one example, the struct branch could check the fast path first - that of all children being identical - and return early if so. (This could be a simple memcmp over the child pointers after checking the totals.)
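
For instance, something like this could slot into the shape branch right before the per-child loop (a, b, and total being the locals from isAssignableTo above):

            // Fast path: types are interned, so pointer-identical children
            // are equal types. If the first `total` child pointers match
            // bytewise, there's nothing left to recurse into.
            if (memcmp(a->children, b->children, total * sizeof(const Type*)) == 0) {
                return true;
            }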

Other information

I don't have benchmarks. I just wanted to show academically where the baseline is, to demonstrate that this is efficiently implementable.

This is a pretty long issue comment, so it's possible I missed something. If I did, feel free to let me know and I'll correct it in an edit.

rossberg commented 5 years ago

Fortunately, there are well-established results from computer science we can turn to.

Now, practical programs do not exercise these worst-case complexities, since recursion among types is usually used in very stylised and limited ways that remain linear. However, the concern is that an attacker might use the worst case as a DoS attack on an engine. One obvious mitigation might be to impose a linear implementation limit on these checks in sensitive environments like the web.
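
For concreteness, one shape such a limit could take is a step budget threaded through the structural check (a sketch over a simplified node type, not a concrete proposal):

#include <cstddef>

struct Node {
    size_t childCount;
    const Node* const* children;
};

// Every step decrements a budget fixed up front (say, proportional to the
// module's type-section size); the check conservatively rejects once the
// budget is exhausted, keeping total work linear.
bool checkWithinBudget(const Node* a, const Node* b, size_t& fuel) {
    if (fuel == 0) return false; // limit reached: reject
    fuel--;
    if (a == b) return true;
    if (a->childCount < b->childCount) return false;
    for (size_t i = 0; i < b->childCount; i++) {
        if (!checkWithinBudget(a->children[i], b->children[i], fuel))
            return false;
    }
    return true;
}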

tlively commented 2 years ago

The cost of equirecursive type canonicalization has turned out to be too high in practice, even using state-of-the-art DFA minimization algorithms, so we've settled on the isorecursive system defined in #243, which allows for a linear-time canonicalization algorithm.