WebAssembly / component-model

Repository for design and specification of the Component Model

Working with GC types without copy #305

Open oovm opened 4 months ago

oovm commented 4 months ago

I'm having some trouble switching to WASI Preview 2.

For example, the following interface:

package wasi:random@0.2.0;
interface random {
    get-random-bytes: func(len: u64) -> list<u8>;
}

The function signature is func (u64) -> (list<u8>)

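But its lowered type is core func (i64, i32) -> (), which is very difficult to use: the i32 is a pointer to a return area in linear memory, where the list's pointer and length are written.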

Converting the result to the core type (array (mut u8)) requires a lot of glue code.
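As a rough illustration of that glue, here is a minimal sketch of a core module that calls the lowered import and copies the returned bytes into a GC array. The "libc" memory import, the fixed return-area address 16, and the function name $random_array are assumptions made for the example, not part of any real toolchain output; the packed element type is written i8 because core Wasm has no u8 storage type.

```wat
(module
  (type $bytes (array (mut i8)))

  ;; Lowered import: (len, pointer to an 8-byte return area) -> ()
  (import "wasi:random/random@0.2.0" "get-random-bytes"
    (func $get_random_bytes (param i64 i32)))
  ;; Assumed: the same memory that was passed to `canon lower`.
  (import "libc" "memory" (memory 1))

  ;; Assumed scratch address for the return area (hypothetical).
  (global $ret_area i32 (i32.const 16))

  (func $random_array (export "random-array")
        (param $len i64) (result (ref $bytes))
    (local $ptr i32) (local $n i32) (local $i i32)
    (local $out (ref null $bytes))
    ;; The callee writes (data pointer, length) into the return area.
    (call $get_random_bytes (local.get $len) (global.get $ret_area))
    (local.set $ptr (i32.load (global.get $ret_area)))
    (local.set $n (i32.load offset=4 (global.get $ret_area)))
    ;; Allocate a zero-filled GC array of the reported length.
    (local.set $out (array.new_default $bytes (local.get $n)))
    ;; Copy byte by byte from linear memory into the GC array.
    (block $done
      (loop $next
        (br_if $done (i32.ge_u (local.get $i) (local.get $n)))
        (array.set $bytes
          (local.get $out)
          (local.get $i)
          (i32.load8_u (i32.add (local.get $ptr) (local.get $i))))
        (local.set $i (i32.add (local.get $i) (i32.const 1)))
        (br $next)))
    (ref.as_non_null (local.get $out))))
```

Note that the sketch never frees the buffer that realloc handed out, so a real version would also need an allocator with a working free path.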


I would like a GC-mode canon option that makes the lowered type look like core func (i64) -> (array u8).

For complex nested types, reaching a given piece of data requires very complex pointer arithmetic, whereas with arrays it only takes a few array.get instructions.
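To make the comparison concrete, the sketch below reads the length of the i-th string in a list<tuple<descriptor, string>> both ways. It assumes the current Canonical ABI layout for the linear-memory case (a 4-byte handle followed by a 4-byte string pointer and a 4-byte string length, so 12 bytes per element) and a hypothetical GC mapping for the other case; the type names and the choice of i32 for the descriptor handle are illustrative only.

```wat
(module
  ;; Hypothetical GC shapes for list<tuple<descriptor, string>>.
  (type $string  (array i8))
  (type $entry   (struct (field i32) (field (ref $string))))
  (type $entries (array (ref $entry)))

  ;; Assumed: the memory that holds the lowered list.
  (import "libc" "memory" (memory 1))

  ;; Linear memory: the string length of element i lives at
  ;; base + i * 12 + 8.
  (func $path_len_linear (param $base i32) (param $i i32) (result i32)
    (i32.load offset=8
      (i32.add (local.get $base)
               (i32.mul (local.get $i) (i32.const 12)))))

  ;; GC: index the outer array, read field 1, ask for its length.
  (func $path_len_gc (param $list (ref $entries)) (param $i i32) (result i32)
    (array.len
      (struct.get $entry 1
        (array.get $entries (local.get $list) (local.get $i))))))
```

The linear-memory version bakes the element size and field offsets into the code, so any change to the WIT type changes the arithmetic; the GC version only names types and field indices.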

I think this helps simplify the use of some external interfaces, such as:

package wasi:filesystem@0.2.0;
interface preopens {
    get-directories: func() -> list<tuple<descriptor, string>>;
}

lukewagner commented 4 months ago

Yes, agreed. It's definitely the plan of record to add a gc canonical ABI option, just like you're describing. (It's one of the original motivations for having an IDL that abstracts low-level memory representation, even.) We've mostly been waiting for (1) wasm-gc to be finalized, which it now is, and (2) an implementation of wasm-gc to show up in a runtime that also implements components (e.g., one is in progress in Wasmtime). But if you or anyone else wants to run ahead and create a PR adding the gc option to the proposal (Explainer.md, Binary.md and, most significantly, CanonicalABI.md), that would be welcome too.

oovm commented 4 months ago

Until ref-types, gc-types, stringref and other features are stable, we have enough time to discuss how GC languages should obtain WASI data.

In fact, once GC types are taken into account, there is a more direct correspondence between WASI types and Wasm types.

With no options, lowering uses pointer mode. Adding reference-type (tentative name) lowers values to immutable references; adding mutable-reference (tentative name) lowers them to references whose contents are mutable.

| Upper Type | Lower Type | Canonical Options | Requisite |
| --- | --- | --- | --- |
| u32 | i32 | | |
| tuple<u32, u32> | (i32, i32) | | |
| tuple<u32, u32> | (struct (field i32) (field i32)) | reference-type | gc |
| tuple<u32, u32> | (struct (field mut i32) (field mut i32)) | mutable-reference | gc |
| record {a: u32, b: u32} (flatten layout) | (i32, i32) | | |
| record {a: u32, b: u32} | (struct (field $a i32) (field $b i32)) | reference-type | gc |
| list<u8> | (i32, i32) | | |
| list<u8> | (array u8) | reference-type | gc |
| list<u8> | (array mut u8) | mutable-reference | gc |
| string | (i32, i32) | | |
| string | stringref | reference-type | gc, stringref |
| string | (string.encode_utf8 stringref) | reference-type + string-encoding=utf8 | gc, stringref |
| borrow<string> | string_view | reference-type | gc, stringref |
| resource | i32 | | |
| resource | externref | reference-type | ref-types |
| flags (flatten layout) | (i32 × ⌈flags / 32⌉) | | |
| enum | i32 | | |
| option<u32> | (ref null i32) / i31ref | reference-type | gc |
| option<t> | (ref null T) | reference-type | gc |
| result<t, e> | ? | ? | ? |
| variant | ? | ? | ? |

In a GC context, variant could be handled much like subtyping with downcasts.
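One way to picture the subtype-plus-downcast idea is the sketch below, which models a result<u32, u32> as an empty supertype with one subtype per case. This is only an assumed encoding for illustration; nothing in the thread fixes the actual mapping, and the names $result, $ok, $err and $unwrap_or are hypothetical. The cases sit in one rec group because Wasm GC types are compared structurally, and two identical struct definitions in separate groups would otherwise be the same type.

```wat
(module
  (rec
    (type $result (sub (struct)))                        ;; common supertype
    (type $ok  (sub final $result (struct (field i32)))) ;; ok(u32) case
    (type $err (sub final $result (struct (field i32))))) ;; err(u32) case

  ;; Return the ok payload, or $default when the value is the err case.
  (func $unwrap_or (export "unwrap-or")
        (param $r (ref $result)) (param $default i32) (result i32)
    (if (result i32) (ref.test (ref $ok) (local.get $r))
      (then (struct.get $ok 0 (ref.cast (ref $ok) (local.get $r))))
      (else (local.get $default)))))
```

Here ref.test plays the role of the discriminant check and ref.cast the role of the payload projection, so no separate tag field is needed.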

oovm commented 4 months ago

Another benefit is that if only GC types are used, there is no need to bring in a memory allocator, which reduces binary size and speeds up warm-up.

rustc's cabi_export_realloc compiles to about 27000 lines of wasm instructions (release mode); libc is even larger.

Other smaller allocators sacrifice either speed or security.

(component
    ;; Define a memory allocator
    (core module $MockMemory ;; Replace here by an actual allocator module, such as libc
        (func $realloc (export "realloc") (param i32 i32 i32 i32) (result i32)
            (i32.const 0)
        )
        (memory $memory (export "memory") 255)
    )
    (core instance $mock_memory (instantiate $MockMemory))
    ;; import wasi function
    (import "wasi:random/random@0.2.0" (instance $wasi:random/random@0.2.0
        (export "get-random-bytes" (func (param "len" u64) (result (list u8))))
    ))
    ;; wasi function to wasm function
    (core func $wasi:random/random@0.2.0/get-random-bytes (canon lower
        (func $wasi:random/random@0.2.0 "get-random-bytes")
        (memory $mock_memory "memory")
        (realloc (func $mock_memory "realloc"))
    ))
    ;; import wasm function
    (core module $TestRandom
        (type (func (param i64 i32)))
        (import "wasi:random/random@0.2.0" "get-random-bytes" (func $wasi:random/random@0.2.0/get-random-bytes (type 0)))
    )
    ;; instantiate wasm module with wasi instance
    (core instance $test_random (instantiate $TestRandom
        (with "wasi:random/random@0.2.0" (instance (export "get-random-bytes" (func $wasi:random/random@0.2.0/get-random-bytes))))
    ))
)

If using the gc type, this can be simplified to:

(component
    ;; import wasi function
    (import "wasi:random/random@0.2.0" (instance $wasi:random/random@0.2.0
        (export "get-random-bytes" (func (param "len" u64) (result (list u8))))
    ))
    ;; wasi function to wasm function
    (core func $wasi:random/random@0.2.0/get-random-bytes (canon lower
        (func $wasi:random/random@0.2.0 "get-random-bytes")
        reference-type
    ))
    ;; import wasm function
    (core module $TestRandom
        (type (func (param i64) (result (array u8))))
        (import "wasi:random/random@0.2.0" "get-random-bytes" (func $wasi:random/random@0.2.0/get-random-bytes (type 0)))
    )
    ;; instantiate wasm module with wasi instance
    (core instance $test_random (instantiate $TestRandom
        (with "wasi:random/random@0.2.0" (instance (export "get-random-bytes" (func $wasi:random/random@0.2.0/get-random-bytes))))
    ))
)

Reading a field of a GC type takes a single instruction and needs no pointer arithmetic (which takes at least three instructions), further reducing the binary size.

lukewagner commented 4 months ago

Yes, really good point regarding mutability vs. immutability; we probably do want both as ABI options. A really nice benefit of immutability is that if both sides of a component-to-component call use immutable GC references, no copy needs to be made when passing a reference across the boundary. OTOH, if your language ultimately does need a mutable array of bytes, then the immutable GC option may impose an extra unnecessary copy; thus having both options makes sense.
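The extra copy mentioned above might look like the following sketch, in which a guest that received an immutable byte array builds its own mutable one. The type and function names are hypothetical, and i8 stands in for u8 because that is the packed storage type core Wasm provides.

```wat
(module
  (type $bytes     (array i8))        ;; immutable, as received
  (type $bytes_mut (array (mut i8)))  ;; mutable, as the guest wants it

  (func $to_mutable (export "to-mutable")
        (param $src (ref $bytes)) (result (ref $bytes_mut))
    (local $dst (ref null $bytes_mut))
    ;; Allocate a zero-filled destination of the same length.
    (local.set $dst
      (array.new_default $bytes_mut (array.len (local.get $src))))
    ;; array.copy operands: dst, dst offset, src, src offset, length.
    (array.copy $bytes_mut $bytes
      (local.get $dst) (i32.const 0)
      (local.get $src) (i32.const 0)
      (array.len (local.get $src)))
    (ref.as_non_null (local.get $dst))))
```

With a mutable ABI option the guest could ask for $bytes_mut directly and skip this step, at the cost of losing the zero-copy path between two components that both use immutable references.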

String is its own story, but definitely a Unicode-encoded (array u8) makes sense (if we treat string-encoding as orthogonal, then all three of utf8, utf16 and latin1+utf16 could be encoded into this array of u8/u16). Based on the last CG meeting, stringref is either not going to happen or not any time soon. However, we could add something stringref-y at the Component Model level in which we lower string values to a reference type (externref initially; later we could eliminate dynamic type checks with type imports) and supply canonical built-ins for operating on these strings (being quite careful to support only basic operations that have the same O(1)/O(n) cost on all host string representations, such as sequential code-point iteration or bulk-copy-into-linear-memory, and are trivial to implement w/o giant Unicode tables). But (array u8) is probably the right place to start.
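As a small example of what a guest can do with a UTF-8 encoded byte array and no Unicode tables, the sketch below counts code points by skipping continuation bytes. It assumes the string arrives as an immutable (array i8) holding well-formed UTF-8; the names $utf8 and $count_code_points are hypothetical.

```wat
(module
  (type $utf8 (array i8))

  ;; Count code points: every byte that is not a UTF-8 continuation
  ;; byte (bit pattern 10xxxxxx) starts a new code point.
  (func $count_code_points (export "count-code-points")
        (param $s (ref $utf8)) (result i32)
    (local $i i32) (local $n i32) (local $count i32)
    (local.set $n (array.len (local.get $s)))
    (block $done
      (loop $next
        (br_if $done (i32.ge_u (local.get $i) (local.get $n)))
        (if (i32.ne
              (i32.and (array.get_u $utf8 (local.get $s) (local.get $i))
                       (i32.const 0xC0))
              (i32.const 0x80))
          (then
            (local.set $count (i32.add (local.get $count) (i32.const 1)))))
        (local.set $i (i32.add (local.get $i) (i32.const 1)))
        (br $next)))
    (local.get $count)))
```

The loop is linear in the byte length, which is the kind of cost profile the comment above asks basic string operations to have.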

oovm commented 4 months ago

Given the complexity of mutability and upcoming features such as partial mutability, readonly and freeze, mutability may need to be expressed as a parameter of reference-type.

Taking into account proposals such as threads and shared-everything-threads, this feature could be implemented in stages.

The initial version would provide only immutable types, which require no copying.

Mutability would be post-MVP; until then, users would have to sacrifice some performance and write glue code by hand to copy into the types they need.