`extern.internalize` differences a compat hazard?

WebAssembly / gc

Branch of the spec repo scoped to discussion of GC integration in WebAssembly

https://webassembly.github.io/gc/

`extern.internalize` differences a compat hazard? #369

Closed: tlively closed 1 year ago

tlively commented 1 year ago

In the JS embedding, extern.internalize returns values dynamically typed as anyref or i31ref. This suggests that other embedders would be free to return values with any other subtype of anyref as well. For example, an embedder might choose to internalize values into concrete struct or array types rather than opaque values. Portable modules must not assume internalized references are anything more specific than anyref, although it would be possible to write non-portable modules that do make assumptions, for example by always casting the result of extern.internalize to i31ref or some other type.
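To make the hazard concrete, here is a minimal Python sketch rather than real Wasm. Everything in it is a hypothetical stand-in: `I31`, `Struct`, and `HostRef` model dynamic types below anyref, `js_like_internalize` and `struct_internalize` model two imagined embedders, and `ref_cast` models a trapping downcast. It only illustrates that a module which casts the internalized value works under one embedder and traps under another, while a module that assumes nothing beyond anyref runs under both.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for dynamic types below anyref (not a real Wasm API).
@dataclass(frozen=True)
class I31:
    value: int


@dataclass(frozen=True)
class Struct:
    fields: tuple


@dataclass(frozen=True)
class HostRef:
    address: int  # opaque reference, nothing observable from Wasm


class CastFailure(Exception):
    """Models the trap raised by a failed downcast."""


def js_like_internalize(host_value):
    # Assumed embedder A: small integers become i31, everything else is opaque.
    if isinstance(host_value, int) and -(2**30) <= host_value < 2**30:
        return I31(host_value)
    return HostRef(id(host_value))


def struct_internalize(host_value):
    # Assumed embedder B: every host value is wrapped in a concrete struct.
    return Struct((host_value,))


def ref_cast(ref, target):
    # Models a downcast from anyref to a more specific type.
    if not isinstance(ref, target):
        raise CastFailure(f"{type(ref).__name__} is not {target.__name__}")
    return ref


def nonportable_module(internalize, host_value):
    # Always casts the internalized value to i31: embedder-specific behaviour.
    return ref_cast(internalize(host_value), I31).value


def portable_module(internalize, host_value):
    # Assumes only anyref: tests the dynamic type and falls back otherwise.
    ref = internalize(host_value)
    return ref.value if isinstance(ref, I31) else None
```

Under embedder A, `nonportable_module(js_like_internalize, 7)` yields the integer; under embedder B the same call raises `CastFailure`, whereas `portable_module` completes under both.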

Are we concerned about different embeddings having different extern.internalize behavior being a compatibility hazard?

rossberg commented 1 year ago

It is already the case that extern.internalize can produce arbitrary subtypes of anyref, because the operand may be the result of extern.externalize applied to a Wasm value. Moreover, nothing in the core spec can prevent a host from producing equivalent values on its own, that did not originate from Wasm. Consequently, i31ref does not add anything new in that regard – when Wasm code is downcasting the result of internalize of a value that came from the host, then it is fundamentally depending on embedder-specific behaviour.

titzer commented 1 year ago

I agree that internalizing host values is inherently going to be host-dependent behavior. What matters is the internalization of externalized Wasm values. Perhaps we should specify that internalize(externalize(val)) == val?

rossberg commented 1 year ago

@titzer, that will follow from the semantics. (Both operations are merely representation changes between two isomorphic types; semantically, they are no-ops. Consequently, both internalize ∘ externalize and externalize ∘ internalize are the identity.)

titzer commented 1 year ago

Hmm, I don't quite see how the Wasm specification will achieve that if it specifies neither the representation of host values nor the mapping of Wasm values to host values. Probably splitting hairs here, but I don't see how the specification can say anything more precise than that internalize ∘ externalize is the identity, since it seems host value equivalence can't be specified by Wasm either.

tlively commented 1 year ago

Good point that we already get this situation from internalizing externalized values. It still seems possible for there to exist a compatibility hazard, but the solution would be to standardize the internalized types of values above the layer of the Wasm core spec, just like the JS embedding spec does. For example, if a WASI API returns an externref, it should probably also specify the precise internalized type of that externref. That doesn't play well with virtualization, though, so it would actually have to specify only an upper bound on the internalized type, and then the compat hazard becomes possible again.

rossberg commented 1 year ago

@titzer, as far as the core spec is concerned, the universe of references under type anyref is extended with an internal ref.extern a, where a is some abstract host address. So yes, it does not know anything about proper host references.

But the universe of references under type externref will be exactly the same. That is, it also includes abstract proper host references, as well as externalised Wasm references, which are distinguished and not abstract. That way, we can trivially specify the bijection between both types that is observed by in/externalize (which we need to be able to specify).
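The two universes described above can be sketched as a toy Python model. The class names (`RefI31`, `RefStruct`, `RefExtern`, `HostRef`, `Externalized`) are made up for illustration and are not the spec's notation, and null references are left out. The sketch only shows why the round trip is trivially the identity in both directions once each side carries the other side's values as a distinguished case.

```python
from dataclasses import dataclass
from typing import Union


# Values under anyref (hypothetical names, nulls omitted).
@dataclass(frozen=True)
class RefI31:
    value: int


@dataclass(frozen=True)
class RefStruct:
    address: int


@dataclass(frozen=True)
class RefExtern:
    address: int  # the internal "ref.extern a" with abstract host address a


AnyRef = Union[RefI31, RefStruct, RefExtern]


# Values under externref.
@dataclass(frozen=True)
class HostRef:
    address: int  # abstract proper host reference


@dataclass(frozen=True)
class Externalized:
    inner: Union[RefI31, RefStruct]  # a Wasm reference, kept distinguishable


ExternRef = Union[HostRef, Externalized]


def externalize(ref: AnyRef) -> ExternRef:
    # A host reference that was internalized goes back to being a host reference.
    if isinstance(ref, RefExtern):
        return HostRef(ref.address)
    return Externalized(ref)


def internalize(ext: ExternRef) -> AnyRef:
    # An externalized Wasm reference is unwrapped to exactly what it was.
    if isinstance(ext, Externalized):
        return ext.inner
    return RefExtern(ext.address)
```

In this model, `internalize(externalize(r)) == r` for every anyref value and `externalize(internalize(e)) == e` for every externref value, without the model knowing anything about what a host address denotes.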

The rest then is up to each API spec, which has to define which sort of Wasm reference its own values are mapped to at the boundary. For JS, some JS values are mapped to Wasm references (small ints, Wasm exotic objects, both of which Wasm can observe) while most others are mapped to proper extern references.
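As a rough illustration of such a boundary mapping, the Python sketch below imitates the JS case described above. It is not the JS API spec's algorithm: `WasmObject`, `to_wasm_anyref`, and the result classes are hypothetical names, Python values stand in for JS values, and the signed 31-bit range used for "small ints" is an assumption.

```python
from dataclasses import dataclass

# Assumed bounds for a signed 31-bit integer.
I31_MIN = -(2**30)
I31_MAX = 2**30 - 1


@dataclass(frozen=True)
class WasmObject:
    """Hypothetical stand-in for a Wasm exotic object as seen by the host."""
    wasm_ref: str


@dataclass(frozen=True)
class I31:
    value: int


@dataclass(frozen=True)
class WasmRef:
    name: str


@dataclass(frozen=True)
class ProperExtern:
    host_value: object  # opaque to Wasm


def to_wasm_anyref(host_value):
    # Wasm exotic objects map back to the Wasm reference they wrap.
    if isinstance(host_value, WasmObject):
        return WasmRef(host_value.wasm_ref)
    # Small integers map to i31 references, which Wasm can observe.
    if (
        isinstance(host_value, int)
        and not isinstance(host_value, bool)
        and I31_MIN <= host_value <= I31_MAX
    ):
        return I31(host_value)
    # Everything else stays a proper extern reference.
    return ProperExtern(host_value)
```

A different API spec could pick a different mapping for the same host values, which is why Wasm code that inspects the result depends on the embedder.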

tlively commented 1 year ago

Is there any more to say about the potential compat hazard here, or can this be closed?

tlively commented 1 year ago

Closing, feel free to reopen for further discussion.