WebAssembly / gc

Branch of the spec repo scoped to discussion of GC integration in WebAssembly
https://webassembly.github.io/gc/
Other
1k stars 72 forks source link

Immutable arrays? #250

Closed kripken closed 2 years ago

kripken commented 3 years ago

The spec does not allow immutable arrays atm. When optimizing things for j2wasm we noticed that it seems like we could support them, if we have array.init which takes the values to initialize with, which is also being experimented with.

It seems like the rules for an immutable array could be like those of a struct, that is, subtypes would need to match it precisely on everything? Structs can at least add more fields, but not arrays, so this makes subtyping less useful in the array case, of course. Still, immutable arrays can be useful because of things like itables: An itable call does a read from an array that is in practice immutable, so if the toolchain and VM know it is immutable that can help.

Opening this issue for thoughts as we look into experimenting with it.

RossTate commented 3 years ago

Making itables immutable is also important for enabling better speculative optimization. (I'll give some more context in another post.)

array.init only handles the case where the array's size is known at generation time, but that's probably sufficient for supporting at least the immutable arrays used in runtimes.

Note, though, that i-tables will be a particularly inefficient implementation of interface-method dispatch in WebAssembly. Besides the i-table being immutable, normally the i-table is flattened directly into the object descriptor (or v-table), as are the method tables for each of the interfaces in the i-table. That locality has a big impact on performance. And then there's the fact that, in WebAssembly, you'll have to cast the method table for the interface once you find it. Personally, I'd recommend interface-method tables (using call tags) over i-tables, which we've recently observed to implement interface-method dispatch just as efficiently as class-method dispatch, and which can probably be much more easily implemented than providing generators a way to specify flattened data structures.

jakobkummerow commented 3 years ago

I support this.

I think you have a mixup in the subtyping description: for mutable arrays, subtypes have to match the field type exactly. For immutable arrays, subtypes of the array can specify a non-trivial (non-identical) subtype for the field type. The mutability attribute itself must be identical in a subtype and its supertype. The summary is what you said: arrays would then behave for subtyping like structs with a single field.

kripken commented 3 years ago

@jakobkummerow Oh right, yes, I had that part wrong.

rossberg commented 3 years ago

To clarify, subtyping on immutable arrays is already allowed. The following module validates just fine in the current MVP and the interpreter:

  (module
    (type $t (struct (field i32)))
    (type $u (struct (field i32 i32)))
    (type $at (array (ref $t)))
    (type $au (array (ref $u)))
    (func $f (param (ref $at)))
    (func $g (param (ref $au)) (call $f (local.get 0)))
  )

In fact, I just noticed that the interpreter even allows allocating immutable arrays, in contrast to what the MVP doc says. This runs fine:

      (call $f
        (array.new $au
          (struct.new $u (i32.const 1) (i32.const 2) (rtt.canon $u))
          (i32.const 100)
          (rtt.canon $au)
        )
      )

However, that's not particularly useful, which is why I changed it in the doc. I should adjust the interpreter.

What's missing for immutable arrays to be useful is the ability to allocate an immutable array with different elements. Technically, we could of course add an instruction like

array.new_with_multi $a N : [t^N (rtt n $a)] -> [(ref $a)]
  where $a = (array (mut t))

However, obviously N would have to be a producer-time constant. That makes it mostly redundant, because in most cases where N is known, the producer could just as well use a struct type instead, which is cheaper to access as well. The remaining cases are probably fairly niche. (Turns out I happen to have a use case in the Wob compiler, but it's somewhat benign.)

My main motivation for not adding this was that most practical cases of initialising arrays need something more general anyway, and can really only be handled with readonly fields. So if we think (quasi-)immutable arrays are important for the MVP, then we should perhaps rather consider up-prioritising readonly, which wouldn't be difficult (and has other benefits).

(Another reason I'm hesitant about an array.new_with_multi instruction is the concern that it may encourage the use of N = 1000 or worse in real code, which might turn out bad in terms of validation cost, and possibly other compilation phases, compared to straightline assignments. We could of course impose an arbitrarily low limit on N, but that would feel ad-hoc as well.)

manoskouk commented 3 years ago

The instruction you are describing is pretty much the same as array.init that we are experimenting with in V8. The only difference is that we do not allow an additional length argument. I think its use is not that niche, as you can define constants for types which are generally represented as arrays of arbitrary length, e.g. strings. We have set the limit of N to 999, which, as you suggested, is arbitrary. To circumvent this limitation for string constants, we are discussing loading elements directly from raw bytes as opposed to a series of i32.const, through some sort of data segment, or maybe immediate arguments.

jakobkummerow commented 3 years ago

subtyping on immutable arrays is already allowed

That statement appears to be a somewhat misleading technicality given that MVP.md clearly states:

Note: in the MVP, all arrays must be defined as mutable


Regarding the limit for N: we've reused the same value we're currently using for the number of fields in a struct. We expect the specific value to be standardized eventually; the current value is just an initial guess. In practice, the array.init occurrences I've seen so far have a handful of elements, but I haven't run any thorough analysis.

rossberg commented 3 years ago

@manoskouk, ah, the i32 operand was c&p, fixed.

I agree that the ability to initialise an array from a data segment is missing. I have run into that myself, and having to go through an aux memory for string constants is kind of stupid. Especially for the string use case, that might actually be the more appropriate solution than array.new_multi.

The main functionality would be something like:

array.init_data $a (data $d) : [(ref null $a) i32 i32 i32] -> []
array.init_elem $a (elem $e) : [(ref null $a) i32 i32 i32] -> []

analogous to memory.init and table.init, where $d and $e are a data or element segment index, respectively. The former is usable for arrays over number types, the latter for arrays over reference types.

On top of that, we could have

array.new_data $a (data $d) : [i32 i32 i32 (rtt $a)] -> [(ref $a)]
array.new_elem $a (elem $e) : [i32 i32 i32 (rtt $a)] -> [(ref $a)]

But it is a slippery slope to adding ever more variants of array.new, e.g., to initialise from memories, tables or another array. All these variants are unnecessary in the presence of readonly.

OTOH, I could imagine

array.copy $a1 $a2 : [(ref null $a1) i32 (ref null $a2) i32 i32] -> []
array.copy_from_memory $a (memory $m) : [(ref null $a) i32 i32 i32] -> []
array.copy_from_table $a (table $t) : [(ref null $a) i32 i32 i32] -> []
array.copy_to_memory $a (memory $m) : [(ref null $a) i32 i32 i32] -> []
array.copy_to_table $a (table $t) : [(ref null $a) i32 i32 i32] -> []

as operations analogous to memory.copy and table.copy.

@jakobkummerow:

That statement appears to be a somewhat misleading technicality given that > MVP.md clearly states:

Note: in the MVP, all arrays must be defined as mutable

Fair enough, that's rather ambiguous. I believe it was meant to talk about arrays, not array types per se, otherwise there would have been an explicit condition about <fieldtype>. But then it should better say "created" not "defined". That said, I'd be fine to resolve it either way.

kripken commented 3 years ago

@rossberg Oh, thanks, I had read the statement @jakobkummerow quoted and also understood it to mean this was disallowed. Good to hear that this is already ok then.

tlively commented 2 years ago

Note: in the MVP, all arrays must be defined as mutable

... I believe it was meant to talk about arrays, not array types per se, otherwise there would have been an explicit condition about <fieldtype>. But then it should better say "created" not "defined". That said, I'd be fine to resolve it either way.

@rossberg, I still don't understand what this note would mean if it read "all arrays must be created as mutable." It would be great to clarify this language in the doc; I think that's the only thing preventing us from closing this issue.

jakobkummerow commented 2 years ago

Now that we have array.new_fixed and array.new_data, we should allow immutable arrays. We all agreed that the only reason not to have them was that there was no way to create them (with non-default values), but that's no longer the case.

IIRC they are useful in particular for itable implementations, where the immutability helps toolchains/engines apply optimizations/inferences.

rossberg commented 2 years ago

What @jakobkummerow said. The note was obsolete and I just forgot to remove it. Done now.

tlively commented 2 years ago

Great, I'll close this issue, then.