WebAssembly / stringref


Stringref should probably be a subtype of anyref #40

Open wingo opened 2 years ago

wingo commented 2 years ago

For languages with a universal value representation, it would be nice to be able to have a list of "any", then pull out the individual values and do some type dispatch on those values. Strings are a fundamental kind of value, so they should support this idiom. The whole type hierarchy is in flux over at the GC proposal but at the minimum we should support the ref.as_string, br_on_string, ref.is_string set of instructions, in whatever form those instructions end up landing (https://github.com/WebAssembly/gc/issues/274).
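As a rough illustration of the idiom described above, here is a minimal JavaScript sketch of type dispatch over a heterogeneous list. The names `classify` and `describeAll` are hypothetical and not part of any proposal; in Wasm the `typeof` checks would correspond to instructions such as ref.is_string or br_on_string, in whatever form they land.

```javascript
// Hypothetical tags for a universal value representation.
function classify(value) {
  if (typeof value === "string") return "string"; // analogous to ref.is_string
  if (typeof value === "number") return "number";
  if (Array.isArray(value)) return "list";
  return "other";
}

// Walk a list of "any" and dispatch on each element's dynamic type.
function describeAll(values) {
  return values.map((v) => {
    switch (classify(v)) {
      case "string":
        return `string of length ${v.length}`;
      case "number":
        return `number ${v}`;
      case "list":
        return `list of ${v.length} items`;
      default:
        return "unclassified value";
    }
  });
}

const described = describeAll(["abc", 42, [1, 2], null]);
```

The point is only that strings need to be recognizable among the other values in such a list.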

On the other hand, we really want to avoid having to do this for stringview_wtf8, stringview_wtf16, and stringview_iter. We expect that in a run-time implementation that represents strings as WTF-8, a stringview_wtf8 will just be the string itself, and likewise for WTF-16 systems and stringview_wtf16. You wouldn't be able to dynamically dispatch on the view to know its type, because it shares a representation with stringref. I'm not sure exactly how to make this happen at the spec level, but the current spec has this property right now and we should preserve it :)
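To make the concern concrete, here is a small JavaScript sketch of a hypothetical engine that stores strings as WTF-16. The `engine` object and its methods are invented for illustration, and `TextEncoder` (which produces UTF-8, not WTF-8) stands in for a transcoding step.

```javascript
// Hypothetical engine whose internal string encoding is WTF-16.
const engine = {
  // Obtaining the WTF-16 view is free: the view *is* the string.
  asWtf16View(str) {
    return str;
  },
  // The other encoding needs a transcoded copy.
  // Assumption: UTF-8 via TextEncoder is used here as a stand-in for WTF-8.
  asWtf8View(str) {
    return new TextEncoder().encode(str);
  },
};

const s = "h\u00e9llo";
const view16 = engine.asWtf16View(s);
const view8 = engine.asWtf8View(s);

// A dynamic "is this a string?" check cannot tell view16 apart from s.
const isString = (v) => typeof v === "string";
const view16LooksLikeString = isString(view16);
const view8LooksLikeString = isString(view8);
```

In this sketch `view16` is indistinguishable from the string at run time, which is why a value of that view type could not be meaningfully classified if it were allowed to flow into an "any"-typed location.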

Related to #3.

rossberg commented 2 years ago

> the current spec has this property right now and we should preserve it

I may misunderstand what exact property you mean. But the spec in no way guarantees (or intends to guarantee) that every value under anyref can be classified. For starters, you can't recognise host references. There is nothing fundamentally wrong with introducing Wasm references that you cannot classify dynamically either, even though they are subtypes of anyref. In fact, that's highly preferable to introducing additional top types, because each of those would form its own hierarchy and would have to come with a new bottom type as well.

wingo commented 2 years ago

My imprecision stems from my ignorance :) Backing up a bit, I think you would want stringref values to be classifiable (if I understand your term correctly). But you also might want stringview_wtf16 to have the same representation as strings, in a browser. In my mind the solution here is that a stringref can be held in an anyref-typed location but that a stringview_wtf16 cannot. Therefore if you see a v8::String value, you know it's a stringref and not a stringview_wtf16. Is this a reasonable thing to want, @rossberg ?

rossberg commented 2 years ago

Okay, I see, that's a problem. Technically, I agree your suggestion would be a solution. But one that adds significant complexity to the type system. Can't say that I'd get excited about that.

An alternative would be not to treat views as reference types but as a new, third category of value type that's neither numeric nor reference. Not sure that's simpler, though, it's probably even worse.

Personally, I would rather avoid them sharing a representation. Not least because that results in an implementation-dependent cost model. But I fear that's the case for views already?

wingo commented 2 years ago

To a degree, I think the question of implementation-dependent cost models is just a thing we have to deal with, for better or for worse. In an implementation using encoding X internally, it will be cheaper (and indeed possibly free) to obtain a view on a string's contents for encoding X than encoding Y. I think it's a fundamental aspect of this particular local maximum in the design space.

I am sympathetic to the type system complexity question, of course.

gkdn commented 2 years ago

Just from our J2WASM experience: stringref being a subtype of anyref doesn't really help us, since in our type system we cannot use any as the top type; the top type needs to have properties like toString, equals and hashCode. As a result, we have wrapper types for things like Strings and Arrays.

This is in contrast to our modeling with J2CL, where we backed things with JS types without wrappers where applicable. This was critical for our jsinterop story, which was the main driver for the compiler. It resulted in having trampolines on these top-level methods to handle various mapped JS builtins; however, that is unlikely to be something we can adapt in the J2WASM case.

tlively commented 7 months ago

Some experimentation with V8 shows that in the current implementation, stringview types are their own top types rather than being subtypes of any, but that they are also all supertypes of none, so they're not quite in their own hierarchy.

For the time being I've changed Binaryen's implementation to match this (https://github.com/WebAssembly/binaryen/pull/6440), but if this proposal ever gets revived, it would be good to properly separate the stringviews from the any hierarchy by giving them their own bottom types.

The current V8 implementation also disallows casts to stringview types, but this is inconsistent with the final WasmGC spec, where any reference type can be the target of a cast. It would be better to be consistent and allow casts to stringview types; if they're properly separated into their own type hierarchies, they would still be implementable as unmodified strings.

sjrd commented 7 months ago

I'm discovering now that it seems there is no way to store a stringref in an anyref. If true, for our work-in-progress implementation of Scala-to-Wasm, that would be a total blocker. It doesn't have to be a proper subtype, but at least it should have O(1) conversion operations like any.convert_extern and extern.convert_any. And for us to be able to use stringref at all, we would need such values to be seen by the JS embedding as JS strings (like i31refs are guaranteed by spec to be seen as JS numbers in the appropriate range, and vice versa).

We indeed have a universal representation of types. Unlike what was said about J2WASM above, in Scala-to-Wasm we do not compromise on our JS interop story, even when compiling to Wasm. That means our universal representation must be able to store values in a way that, when crossing the JS embedding, map to the corresponding JS types. The externref/anyref equivalence guaranteed by the JS embedding for GC is a critical property for us. Can we get something similar for stringref/anyref?

(We are not yet attempting to use the stringref proposal; currently we use actual JS strings, but I very much hope to be able to use stringref in the future.)

tlively commented 7 months ago

@sjrd, in case you didn't know, this proposal is essentially on hold in favor of JS string builtins, so you should focus on that proposal instead.

sjrd commented 7 months ago

Oh thanks for the info. I'm also following the JS string builtins proposal, but I didn't know that it was likely to supersede stringref.

wingo commented 7 months ago

FWIW the workaround for stringref <-> anyref, such as it is, is to allocate a (struct (ref string)) wrapper. Not ideal!

sjrd commented 7 months ago

> FWIW the workaround for stringref <-> anyref, such as it is, is to allocate a (struct (ref string)) wrapper. Not ideal!

That would not work for us, because when that (struct (ref string)) is given to JS through the JS embedding, JS will see an opaque Wasm object, rather than a JS string.

That's why, for example, we do not wrap our f64s into (struct (f64)). Instead we go through a JS function

function boxDouble(x) {
  return x;
}

that we import in Wasm with the signature f64 -> anyref. This way, we can put into our universal anyref representation something that, if given to JS, is actually a number.
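A minimal, runnable sketch of this import trick follows. It makes several assumptions that are not from the comment above: the module is hand-assembled, the import module name "env" and the export name "box" are made up, and externref is used in place of anyref because it is more widely available in engines without WasmGC. The boundary behaviour being illustrated (the boxed value reaches JS as a plain number) is the same idea.

```javascript
// Identity function on the JS side, as in the comment above.
function boxDouble(x) {
  return x;
}

// Hand-assembled module, equivalent to this text format (names are hypothetical):
//   (module
//     (import "env" "boxDouble" (func $boxDouble (param f64) (result externref)))
//     (func (export "box") (param f64) (result externref)
//       local.get 0
//       call $boxDouble))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7c, 0x01, 0x6f, // type 0: (f64) -> externref
  0x02, 0x11, 0x01, 0x03, 0x65, 0x6e, 0x76, // import section, module "env"
  0x09, 0x62, 0x6f, 0x78, 0x44, 0x6f, 0x75, 0x62, 0x6c, 0x65, // field "boxDouble"
  0x00, 0x00, // imported func of type 0
  0x03, 0x02, 0x01, 0x00, // function section: one func of type 0
  0x07, 0x07, 0x01, 0x03, 0x62, 0x6f, 0x78, 0x00, 0x01, // export "box" = func 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x20, 0x00, 0x10, 0x00, 0x0b, // body: local.get 0; call 0; end
]);

const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes), {
  env: { boxDouble },
});

// The f64 goes out to JS, comes back as a reference, and is returned to JS again.
const boxed = instance.exports.box(1.5);
```

Inside the module the result is an opaque reference, yet when it crosses back into JS it is an ordinary number rather than a Wasm object, which is the property a (struct (f64)) wrapper would not give.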

Liedtke commented 7 months ago

The current V8 implementation assumes that stringref is a subtype of anyref. (It seems to have been introduced in this PR.)

tlively commented 7 months ago

Right, but not the stringview* types.