Open oovm opened 4 months ago
Yes, agreed. It's definitely the plan of record to add a `gc` canonical ABI option, just like you're describing. (It's one of the original motivations for having an IDL that abstracts low-level memory representation, even.) We've mostly been waiting for (1) wasm-gc to be finalized, which it now is, and (2) an implementation of wasm-gc to show up in a runtime that also implements components (e.g., one is in progress in Wasmtime). But if you or anyone else wants to run ahead and create a PR adding the `gc` option to the proposal (Explainer.md, Binary.md and, most significantly, CanonicalABI.md), that would be welcome too.
Until ref-types, gc-types, stringref and other features are stable, we have enough time to discuss how GC languages should obtain WASI data.
In fact, once GC types are taken into account, WASI types correspond more closely to Wasm types.
With no options, values are lowered in pointer mode (linear memory). Adding `reference-type` (tentative) lowers to an immutable reference, and adding `mutable-reference` (tentative) lowers to an internally mutable reference.
Upper Type | Lower Type | Canonical Options | Requisite |
---|---|---|---|
`u32` | `i32` | | |
`tuple<u32, u32>` | `(i32, i32)` | | |
`tuple<u32, u32>` | `(struct (field i32) (field i32))` | `reference-type` | gc |
`tuple<u32, u32>` | `(struct (field mut i32) (field mut i32))` | `mutable-reference` | gc |
`record {a: u32, b: u32}` | (flatten layout) `(i32, i32)` | | |
`record {a: u32, b: u32}` | `(struct (field $a i32) (field $b i32))` | `reference-type` | gc |
`list<u8>` | `(i32, i32)` | | |
`list<u8>` | `(array u8)` | `reference-type` | gc |
`list<u8>` | `(array mut u8)` | `mutable-reference` | gc |
`string` | `(i32, i32)` | | |
`string` | `stringref` | `reference-type` | gc, stringref |
`string` | `(string.encode_utf8 stringref)` | `reference-type` + `string-encoding=utf8` | gc, stringref |
`borrow<string>` | `string_view` | `reference-type` | gc, stringref |
`resource` | `i32` | | |
`resource` | `externref` | `reference-type` | ref-types |
`flags` | (flatten layout) `(i32 × ⌈flags / 32⌉)` | | |
`enum` | `i32` | | |
`option<u32>` | `(ref null i32)` / `i31ref` | `reference-type` | gc |
`option<t>` | `(ref null T)` | `reference-type` | gc |
`result<t, e>` | ? | ? | ? |
`variant` | ? | ? | ? |
`variant` may be similar to a subtype with downcast in a GC context.
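For the rows without canonical options, the lowered types come from the Canonical ABI's existing flattening rules. The following is a minimal Python sketch of those rules for some of the type kinds in the table (Python is used because CanonicalABI.md writes its definitions in Python). The type encoding (strings for primitives, `(kind, arg)` pairs for compound types) and the names `flatten` and `lower_func_type` are assumptions made for this illustration, not the spec's own definitions; `flags`, `option`, `result` and `variant` are left out.

```python
# Illustrative model of linear-memory flattening; not the spec's own code.
MAX_FLAT_PARAMS = 16   # Canonical ABI limit on flattened parameters
MAX_FLAT_RESULTS = 1   # Canonical ABI limit on flattened results

I32_PRIMITIVES = {"bool", "u8", "s8", "u16", "s16", "u32", "s32",
                  "char", "enum", "resource"}


def flatten(ty):
    """Return the core value types that a component-level type lowers to."""
    if isinstance(ty, str):
        if ty in I32_PRIMITIVES:
            return ["i32"]
        if ty in ("u64", "s64"):
            return ["i64"]
        if ty == "string":
            return ["i32", "i32"]  # pointer + length
        raise ValueError(f"type not covered by this sketch: {ty}")
    kind, arg = ty
    if kind == "list":
        return ["i32", "i32"]  # pointer + length
    if kind == "tuple":
        return [core for field in arg for core in flatten(field)]
    if kind == "record":
        return [core for field in arg.values() for core in flatten(field)]
    raise ValueError(f"type not covered by this sketch: {kind}")


def lower_func_type(params, results):
    """Flatten a function signature as seen by the caller of `canon lower`."""
    flat_params = [core for p in params for core in flatten(p)]
    flat_results = [core for r in results for core in flatten(r)]
    if len(flat_params) > MAX_FLAT_PARAMS:
        flat_params = ["i32"]  # parameters are passed through linear memory
    if len(flat_results) > MAX_FLAT_RESULTS:
        flat_params.append("i32")  # caller passes a return pointer instead
        flat_results = []
    return flat_params, flat_results


print(flatten(("tuple", ["u32", "u32"])))             # ['i32', 'i32']
print(flatten(("record", {"a": "u32", "b": "u32"})))  # ['i32', 'i32']
print(lower_func_type(["u64"], [("list", "u8")]))     # (['i64', 'i32'], [])
```

The last line shows why a function returning `list<u8>` ends up with an extra `i32` parameter and no results: the list flattens to two values, which exceeds the single flattened result allowed, so the result is written through a return pointer.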
Another benefit is that if GC types are used throughout, there is no need to bring in a memory allocator, which reduces binary size and speeds up warm-up.
rustc's `cabi_export_realloc` takes about 27,000 lines of Wasm instructions (release mode), and libc is even larger.
Other, smaller allocators sacrifice either speed or security.
```wat
(component
  ;; Define a memory allocator
  (core module $MockMemory ;; Replace this with an actual allocator module, such as libc
    (func $realloc (export "realloc") (param i32 i32 i32 i32) (result i32)
      (i32.const 0)
    )
    (memory $memory (export "memory") 255)
  )
  (core instance $mock_memory (instantiate $MockMemory))
  ;; Import the WASI function
  (import "wasi:random/random@0.2.0" (instance $wasi:random/random@0.2.0
    (export "get-random-bytes" (func (param "length" u64) (result (list u8))))
  ))
  ;; Lower the WASI function to a core Wasm function
  (core func $wasi:random/random@0.2.0/get-random-bytes (canon lower
    (func $wasi:random/random@0.2.0 "get-random-bytes")
    (memory $mock_memory "memory")
    (realloc (func $mock_memory "realloc"))
  ))
  ;; Core module that imports the lowered function
  (core module $TestRandom
    (type (func (param i64 i32)))
    (import "wasi:random/random@0.2.0" "get-random-bytes" (func $wasi:random/random@0.2.0/get-random-bytes (type 0)))
  )
  ;; Instantiate the core module with the lowered function
  (core instance $test_random (instantiate $TestRandom
    (with "wasi:random/random@0.2.0" (instance (export "get-random-bytes" (func $wasi:random/random@0.2.0/get-random-bytes))))
  ))
)
```
If using the gc type, this can be simplified to:
```wat
(component
  ;; Import the WASI function
  (import "wasi:random/random@0.2.0" (instance $wasi:random/random@0.2.0
    (export "get-random-bytes" (func (param "length" u64) (result (list u8))))
  ))
  ;; Lower the WASI function to a core Wasm function (reference-type is the tentative option proposed above)
  (core func $wasi:random/random@0.2.0/get-random-bytes (canon lower
    (func $wasi:random/random@0.2.0 "get-random-bytes")
    reference-type
  ))
  ;; Core module that imports the lowered function
  (core module $TestRandom
    (type (func (param i64) (result (array u8))))
    (import "wasi:random/random@0.2.0" "get-random-bytes" (func $wasi:random/random@0.2.0/get-random-bytes (type 0)))
  )
  ;; Instantiate the core module with the lowered function
  (core instance $test_random (instantiate $TestRandom
    (with "wasi:random/random@0.2.0" (instance (export "get-random-bytes" (func $wasi:random/random@0.2.0/get-random-bytes))))
  ))
)
```
Reading a field of a GC type takes a single instruction and needs no pointer arithmetic (which takes at least three instructions), further reducing the binary size.
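To make the pointer arithmetic concrete, here is a small Python simulation of what a guest has to do after calling the linear-memory lowering `(i64, i32) -> ()` of `get-random-bytes`: the `i32` parameter is a return pointer at which the callee stores a `(ptr, len)` pair of little-endian `i32` values, and the bytes themselves live at `ptr`. The `bytearray` stands in for linear memory; the helper name `read_list_u8` and the offsets in the usage lines are hypothetical.

```python
import struct


def read_list_u8(memory, retptr):
    """Read a list<u8> result that the callee stored through a return pointer."""
    # Step 1: load the (ptr, len) pair stored at the return pointer.
    ptr, length = struct.unpack_from("<II", memory, retptr)
    # Step 2: compute the byte range [ptr, ptr + len) and copy it out.
    return bytes(memory[ptr:ptr + length])


# Usage: fake a 64-byte linear memory in which the callee has placed
# four payload bytes at offset 16 and the (ptr, len) pair at offset 0.
memory = bytearray(64)
payload = b"\x01\x02\x03\x04"
memory[16:16 + len(payload)] = payload
struct.pack_into("<II", memory, 0, 16, len(payload))

print(read_list_u8(memory, 0))  # b'\x01\x02\x03\x04'
```

With a GC lowering such as the proposed `(i64) -> (array u8)`, the result would already be the array, and none of this address computation would be needed.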
Yes, really good point regarding mutability vs. immutability; we probably do want both as ABI options. A really nice benefit of immutability is that if both sides of a component-to-component call use immutable GC references, no copy needs to be made when passing a reference across the boundary. OTOH, if your language ultimately does need a mutable array of bytes, then the immutable GC option may impose an extra unnecessary copy; thus having both options makes sense.
String is its own story, but definitely a Unicode-encoded `(array u8)` makes sense (if we treat `string-encoding` as orthogonal, then all three of `utf8`, `utf16` and `latin1+utf16` could be encoded into this array of `u8`/`u16`). Based on the last CG meeting, `stringref` is either not going to happen or not any time soon. However, we could add something `stringref`-y at the Component Model level in which we lower `string` values to a reference type (`externref` initially; later we could eliminate dynamic type checks with type imports) and supply canonical built-ins for operating on these strings (being quite careful to support only basic operations that have the same O(1)/O(n) cost on all host string representations, such as sequential code-point iteration or bulk-copy-into-linear-memory, and are trivial to implement w/o giant Unicode tables). But `(array u8)` is probably the right place to start.
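As a rough illustration of treating `string-encoding` as orthogonal, the Python sketch below turns a string into an array of code units under each of the three encodings. The function name `encode_string` and the `(bits, code_units)` return shape are assumptions for this example. In particular, how a `latin1+utf16` value would signal which of its two forms it holds in a GC array is not settled in this thread, so the sketch simply returns the element width alongside the data.

```python
def encode_string(s, encoding):
    """Return (element width in bits, list of code units) for a string."""
    if encoding == "utf8":
        return 8, list(s.encode("utf-8"))
    if encoding == "utf16":
        raw = s.encode("utf-16-le")
        # Group the little-endian bytes into 16-bit code units.
        units = [int.from_bytes(raw[i:i + 2], "little")
                 for i in range(0, len(raw), 2)]
        return 16, units
    if encoding == "latin1+utf16":
        try:
            # Use one byte per character when every character fits in Latin-1.
            return 8, list(s.encode("latin-1"))
        except UnicodeEncodeError:
            return encode_string(s, "utf16")
    raise ValueError(f"unknown string-encoding: {encoding}")


print(encode_string("héllo", "utf8"))          # (8, [104, 195, 169, 108, 108, 111])
print(encode_string("héllo", "utf16"))         # (16, [104, 233, 108, 108, 111])
print(encode_string("héllo", "latin1+utf16"))  # (8, [104, 233, 108, 108, 111])
print(encode_string("日", "latin1+utf16"))      # (16, [26085])
```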
Considering the complexity of mutability and some incoming features such as partially mutable, `readonly` and `freeze`, it may need to exist as a `reference-type` parameter.
Taking into account proposals such as threads and shared-everything-threads, this feature could be implemented in stages.
The initial version would provide only immutable types, which do not require copying.
Mutability would be post-MVP; until then, users would need to sacrifice some performance and manually write glue code to copy into the required types.
I'm having some trouble switching to WASI preview 2.
For example, the following interface:
The function signature is `func (u64) -> (list<u8>)`, but its lower type is `core func (i64, i32) -> ()`, which is very difficult to use.
If I want to convert it to `core type (array (mut u8))`, very long glue code is required.
I hope to add a GC mode canon option that can make the lower type similar to `core func (i64) -> (array u8)`.
For complex nested types, getting the specified data requires very complex pointer arithmetic, whereas with arrays it only requires multiple `array.get` instructions.
I think this helps simplify the use of some external interfaces, such as: