Importing type customization information

eqrion commented 8 months ago

Here's a sketch proposal that is very similar to type imports, but instead of importing the whole type we are only importing 'host-info' that only defines how the type is reflected in the host.

Goals

Be able to set the prototype of structs and arrays
Be able to attach read or write accessors to structs and arrays

Sketch

Add a new module-field that represents how a type should be reflected when accessed by host code. I don't have a good name yet, so I'll offer a bad one: host-info. It can only be imported and exported; there is no way to define one locally. The definition is basically opaque to wasm.
Allow a type definition to specify an (imported) host-info.
Extend JS-API with StructHostInfo and ArrayHostInfo interfaces that allow specifying prototypes, accessors, etc.
Instantiating a module will check that the imported host-info is consistent with the type that imports it.
Type equality remains structural, but the final imported host-info is taken into account. Two instantiated types are equivalent iff they are structurally equivalent and their host-infos are equal. Hosts are allowed to define equality of their host-infos however makes most sense for them. [1]

[1] This is important for web engines where shapes/maps are used for casting and also storing prototype/accessor information. Different prototypes means different shapes/maps, and therefore it's very useful for the types to no longer be equivalent for casting.
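
The instantiation-time consistency check from the sketch could look roughly like the following. This is plain JavaScript that models the idea, not engine code. The function `checkStructHostInfo`, the `fieldCount` type description, and the specific rules (one accessor name per field, no duplicate names, an object-or-null prototype) are assumptions made for illustration, since the proposal does not spell out what "consistent" means.

```javascript
// Hypothetical model: a struct type is described only by its field count,
// and a host-info by the option names used in the example below
// (accessorNames, prototype).
function checkStructHostInfo(structType, hostInfo) {
  const names = hostInfo.accessorNames ?? [];
  // Assumed rule: if accessors are given, there is exactly one per field.
  if (names.length !== 0 && names.length !== structType.fieldCount) {
    throw new TypeError(
      `host-info names ${names.length} accessors, ` +
      `type has ${structType.fieldCount} fields`);
  }
  // Assumed rule: accessor names must be unique.
  if (new Set(names).size !== names.length) {
    throw new TypeError("duplicate accessor name in host-info");
  }
  // Assumed rule: the prototype is an object, a function, or null.
  const proto = hostInfo.prototype ?? null;
  const kind = typeof proto;
  if (proto !== null && kind !== "object" && kind !== "function") {
    throw new TypeError("host-info prototype must be an object or null");
  }
  return true;
}
```

Under these assumed rules, a two-field struct paired with `accessorNames: ["first", "second"]` passes, while the same struct paired with a single accessor name would fail instantiation with a `TypeError`.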

Example

(module
  (import "types" "Pair" (host-info $pair-info))
  (import "types" "Bytes" (host-info $bytes-info))

  (type (host-info $pair-info)
    (struct (field i32) (field f32))
  )
  (type (host-info $bytes-info)
    (array i8)
  )
)

let imports = {
  types: {
    Pair: new WebAssembly.StructHostInfo({
      accessorNames: ["first", "second"],
      prototype: PairProto,
    }),
    Bytes: new WebAssembly.ArrayHostInfo({
      allowIndexing: true,
      prototype: BytesProto,
    }),
  }
};
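// WebAssembly.instantiate returns a promise; assuming `source` holds the
// module bytes, it resolves to { module, instance }.
let { instance } = await WebAssembly.instantiate(source, imports);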
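
To make the intended effect concrete, the following plain-JavaScript sketch emulates how a struct reflected with the Pair host-info above might behave from host code. Nothing here is engine API: `reflectStruct`, the backing `fields` array, and the `sum` method on `PairProto` are hypothetical stand-ins, and read-only accessors are assumed for simplicity.

```javascript
// Hypothetical prototype the embedder would pass as `prototype`.
const PairProto = {
  sum() { return this.first + this.second; },
};

// Emulates reflecting a wasm struct: `fields` stands in for the struct's
// storage, `info` mirrors the options passed to StructHostInfo above.
function reflectStruct(fields, info) {
  const obj = Object.create(info.prototype);
  info.accessorNames.forEach((name, index) => {
    // Assumption: each accessor is a read-only getter over field `index`.
    Object.defineProperty(obj, name, {
      get() { return fields[index]; },
      enumerable: true,
    });
  });
  return obj;
}

const pair = reflectStruct([1, 2.5], {
  accessorNames: ["first", "second"],
  prototype: PairProto,
});
console.log(pair.first, pair.second, pair.sum()); // 1 2.5 3.5
```

The point of the emulation is that host code sees named fields and inherited methods, while the wasm side still only sees an anonymous `(struct (field i32) (field f32))`.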

tlively commented 8 months ago

I like this idea. Making the host info a syntactic element of the type definition nicely solves the problem of having different reflections of the same type.

One issue is that if host info imports work the same as other imports, then it will be impossible to determine until instantiation time whether two imported host infos are the same and therefore whether two types that differ only in their host info should be the same. I guess we would have to solve that by treating different host info indices as different at validation and compile time even if they end up the same. But then casts would have to respect that difference as well, so casts couldn't just use the imported host info as the RTT 🤔

eqrion commented 8 months ago

One issue is that if host info imports work the same as other imports, then it will be impossible to determine until instantiation time whether two imported host infos are the same and therefore whether two types that differ only in their host info should be the same. I guess we would have to solve that by treating different host info indices as different at validation and compile time even if they end up the same. But then casts would have to respect that difference as well, so casts couldn't just use the imported host info as the RTT 🤔

Just to make sure we're on the same page, we're thinking of a situation like:

(module
  (import "" "a" (host-info $ai))
  (import "" "b" (host-info $bi))
  (type $a (host-info $ai) (struct))
  (type $b (host-info $bi) (struct))
)

$a and $b are identical except for referring to different imported host-info's.

I believe the situation is very similar to how I understand full type imports would need to work.

Validation will have to assume that different host-info imports are different, because they may be instantiated differently. But instantiation may provide the same host-info for both types, in which case the types will be equivalent. Any runtime types/casting will be a 1:1 reflection of that.

So in the example above, a value created by struct.new $a will pass ref.test $b (i.e. return 1) iff the host-info $ai == $bi at instantiation time.
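
One way to picture this rule is as canonicalization at instantiation time: a runtime type is looked up by its structure together with the identity of the host-info it was instantiated with. The sketch below models that in plain JavaScript. The names `canonRuntimeType`, `refTest`, and `instantiateExample`, the string encoding of structure, and treating ref.test as an exact-type comparison (ignoring subtyping) are all simplifying assumptions.

```javascript
// structure string -> (host-info object -> canonical runtime type)
const runtimeTypes = new Map();

function canonRuntimeType(structure, hostInfo) {
  let byInfo = runtimeTypes.get(structure);
  if (!byInfo) {
    byInfo = new Map();
    runtimeTypes.set(structure, byInfo);
  }
  let runtimeType = byInfo.get(hostInfo);
  if (!runtimeType) {
    runtimeType = { structure, hostInfo };
    byInfo.set(hostInfo, runtimeType);
  }
  return runtimeType;
}

// Simplified ref.test: exact runtime-type identity, no subtyping.
function refTest(value, runtimeType) {
  return value.runtimeType === runtimeType ? 1 : 0;
}

// Models instantiating the two-import module above and running
// (ref.test $b (struct.new $a)).
function instantiateExample(imports) {
  const a = canonRuntimeType("(struct)", imports.a);
  const b = canonRuntimeType("(struct)", imports.b);
  const value = { runtimeType: a }; // struct.new $a
  return refTest(value, b);
}

const info1 = { name: "info1" };
const info2 = { name: "info2" };
console.log(instantiateExample({ a: info1, b: info1 })); // 1
console.log(instantiateExample({ a: info1, b: info2 })); // 0
```

Validation never sees the first outcome: it treats `$a` and `$b` as distinct, which is the conservative answer for both instantiations.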

tlively commented 8 months ago

It seems odd that the relation between types at runtime could be different than the relation between them at validation time, but I guess it's probably ok because the validation relation is more conservative. And yeah, I guess full type imports would have to work similarly.

@rossberg, I'm curious to hear what you think of this idea.

rossberg commented 8 months ago

This sketch sounds like the notion of "descriptors" that we discussed two years ago, as a way for giving access to what engines do with shapes/maps, including the ability to store "static" fields in them. Shame on me for having had a sketch of that lying around for a long time, but never getting round to thinking through it enough to put it in the Post-MVP doc. The idea roughly is that you can declare

(type $t (struct (desc $d) (field ...)))

where $d is another type you can either define locally or import. That type has to be declared to be a descriptor, e.g.,

(type $d (struct descriptor (field ...)))

If defined within Wasm, this gives a way to store Wasm fields within the descriptor, like JS engines do to store some constant fields.

Where I got a bit stuck with this idea is that as just sketched naively it would be unsound – like I believe your sketch would be as well. Since we want to use descriptors as RTTs for casts, we also must ensure that they are used consistently, each for one specific type only, across modules! Otherwise nothing is preventing anybody from using one descriptor for two totally unrelated types (possibly in different modules), and boom. Preventing that would seem to require that every descriptor also declares which exact type it is a descriptor for(*), creating a cyclic dependency between the two. Maybe that's okay, but it seems a bit unwieldy. And a follow-up complexity then is that descriptor imports also need to specify this, i.e., imports would need non-trivial bounds.

(*) Fixing its entire supertype hierarchy as well, since unlike JS, Wasm also has subtyping, which is encoded in the descriptor.
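
The soundness concern can be illustrated with a small model in which each descriptor records the one type it describes, and any attempt to reuse it for a different type is rejected. This is plain JavaScript for illustration only: `bindDescriptor` and the string type names are hypothetical, and the supertype hierarchy mentioned in the footnote is not modelled.

```javascript
// descriptor object -> name of the single type it is allowed to describe
const descriptorOwner = new WeakMap();

function bindDescriptor(descriptor, typeName) {
  const owner = descriptorOwner.get(descriptor);
  if (owner !== undefined && owner !== typeName) {
    // Without this check, one descriptor could act as the RTT for two
    // unrelated types, and a cast through it would be unsound.
    throw new TypeError(
      `descriptor already describes ${owner}, cannot also describe ${typeName}`);
  }
  descriptorOwner.set(descriptor, typeName);
  return descriptor;
}

const sharedDescriptor = {};
bindDescriptor(sharedDescriptor, "$t");       // first use fixes the type
bindDescriptor(sharedDescriptor, "$t");       // same type again is fine
try {
  bindDescriptor(sharedDescriptor, "$other"); // unrelated type is rejected
} catch (e) {
  console.log(e instanceof TypeError);        // true
}
```

A dynamic check like this is only a model of the invariant. The comment above suggests enforcing it statically, by having the descriptor type declare which type it describes.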

@tlively, re your question: type imports act like parameters. As @eqrion says, validation has to assume conservatively that they are different, even though they could be instantiated the same. That is perfectly fine and unsurprising. The same happens in a Java-like language with generic functions containing casts. (Edit: I should have said C#-like instead, since generic casts are of course broken in Java.)

eqrion commented 8 months ago

@rossberg I think the difference is that the imported 'host-info' here is an addition to the type definition and runtime-type, not a replacement for it.

So if you just had a 'host-info', you can't do any casting of objects with it. You'd need to have a 'type' and 'host-info' pair to get the runtime type to do any cast. So two different types with the same host-info would still be two different types. It's just extra metadata on the type for things like field names or prototypes, and is largely opaque to wasm.

But on the (off-)topic of using the RTT for extra storage for static fields, I also have a sketch for that where:

(1) A type could declare a single field that would exist on their RTT:

(type $class (struct ...))
(type $object (type-field (ref $class)) (struct ...))

(2) The rtt type would be parameterized based off of an optional field type (e.g. rtt (field $class))

(3) The instruction rtt.canon $type would take the field as an input if $type has one

We would restrict all rtt field types to be subtypes of eqref and return equivalent rtt's for equivalent field values (based on object identity).
VM's would keep a weak hash map of (type, field) -> rtt that is used for this (Shapes in SM have this already)

(4) The new rtt.get_field instruction would load the field from a given rtt

(5) The new rtt.from_type instruction would load an rtt from a given struct/array

This would let programs store their class objects (or whatever) in the engine's rtts and access them from a given object by getting the rtt, then reading the field off of it.

I'm not sure if the bit in rtt.canon about equivalent field values is necessary, but it seems like it could be useful for module rtt coordination.
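
The five steps above can be modelled in plain JavaScript to show how the pieces would fit together. This is only a sketch of the idea: the function names mirror the proposed instructions but are hypothetical, types are represented by their names, and a `Map` of `WeakMap`s stands in for the engine's weak (type, field) -> rtt table.

```javascript
// type name -> WeakMap(field object -> canonical rtt)
const rttTable = new Map();

// Models rtt.canon $type: equal field identity yields the same rtt.
function rttCanon(typeName, field) {
  let byField = rttTable.get(typeName);
  if (!byField) {
    byField = new WeakMap();
    rttTable.set(typeName, byField);
  }
  let rtt = byField.get(field);
  if (!rtt) {
    rtt = { typeName, field };
    byField.set(field, rtt);
  }
  return rtt;
}

// Models rtt.get_field: load the field stored on an rtt.
function rttGetField(rtt) { return rtt.field; }

// Models allocating a struct with a given rtt.
function structNew(rtt, fields) { return { rtt, fields }; }

// Models rtt.from_type: load the rtt from a struct/array.
function rttFromType(object) { return object.rtt; }

// A program stores its class object on the rtt and recovers it
// from an instance.
const pointClass = { name: "Point" };
const point = structNew(rttCanon("$object", pointClass), [1, 2]);
console.log(rttGetField(rttFromType(point)) === pointClass); // true
```

Because the inner table is keyed weakly on the field object, an rtt entry does not by itself keep a class object alive, which matches the "weak hash map" wording above.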

WebAssembly / gc-js-customization