xujuntwt95329 opened 1 year ago
As @rossberg wrote in https://github.com/WebAssembly/meetings/pull/1279#discussion_r1201812009:
@xujuntwt95329, since these operations have to inspect and dispatch on (the structure of) the runtime type of the object, they represent a form of type reflection. They would be the Wasm equivalent to struct.getClass().getField(x).get/set(struct), the only difference being that x is numeric, not a string, because fields are accessed by number in Wasm. This idea further assumes that every heap object carries sufficient runtime information to perform such reflection, which is not an assumption I'd like to build into Wasm itself.
@rossberg Thanks for the explanation, I agree with you that the idea of `struct.get_indirect` is a kind of reflection (or more specifically, index-based reflection 😀).
If this is not a good idea in WebAssembly, do you have any suggestions about how to support TypeScript's `interface` semantics based on WasmGC?
I know we can implement a runtime inside wasm and do such dispatch in "user space", but that would require all the objects to be allocated in linear memory rather than on the WasmGC heap, which is not what we want. We do want to leverage WasmGC to support these languages.
Another way to implement this reflection in user space would be to generate a function for each struct type that takes an index and uses a `br_table` on the index to determine which field to load. IIUC, this is not too far off from how an engine would implement this internally, anyway.
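Expressed in TypeScript terms, the generated per-type accessor described above might look like the following sketch (all names here are hypothetical; in wasm the `switch` would be a `br_table` and each `return` a `struct.get` with a constant field index):

```typescript
// Hypothetical per-struct-type accessor: the switch plays the role of
// br_table, dispatching a runtime index to a statically-indexed field load.
interface Rect {
  width: number;
  height: number;
}

function rectGetField(obj: Rect, index: number): number {
  switch (index) {
    case 0: return obj.width;  // ~ struct.get $Rect 0
    case 1: return obj.height; // ~ struct.get $Rect 1
    default: throw new RangeError(`Rect has no field ${index}`);
  }
}
```

The cost is one such function per struct type, which is the module-size concern raised in the following reply.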
Thanks for the suggestion. Yes, this is a feasible approach, but it seems it would greatly increase the footprint of the generated wasm module. So in my opinion, compared to this user-space approach, the proposed `struct.get_indirect` opcode can help generate more compact wasm modules.
@tlively I think that idea can be extended to consider the index static. That is, when you get a property on an interface, you know which property at every use:

```typescript
interface Rectangle {
  width: number;
  height: number;
}

// Use the interface Rectangle's property "width".
foo(rectangle.width)
```

That last line's read of `.width` can turn into an itable call of a method `get_width()`. So each class implementing the interface would fill in an itable with one method per property (or two, one to read and one to write).
That still means doing an itable call (read itable object, do indirect call) on each access, which is slow, but if the VM does runtime inlining it should be fast enough in the monomorphic case.
(In the polymorphic case I think linear memory can do much better with a table of offsets. That would avoid any indirect call or even call entirely.)
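A sketch of the itable-of-accessors idea above, again in TypeScript terms (the names `RectangleItable`, `Box`, etc. are hypothetical; in wasm the property read would be an itable lookup plus an indirect call):

```typescript
// One accessor method per interface property; each implementing class
// provides its own itable record.
type RectangleItable<T> = {
  get_width(self: T): number;
  get_height(self: T): number;
};

class Box {
  constructor(public w: number, public h: number) {}
}

// The itable a compiler might emit for Box's view of Rectangle.
const boxAsRectangle: RectangleItable<Box> = {
  get_width: (self) => self.w,
  get_height: (self) => self.h,
};

// `rectangle.width` lowers to an indirect call through the itable:
const w = boxAsRectangle.get_width(new Box(5, 7));
```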
@xujuntwt95329, in general, the idea is that Wasm essentially is a virtual CPU, not a high-level VM. So the two questions to ask are:

1. How would this feature be compiled if the target was a real CPU?
2. How is equivalent code expressible in Wasm's somewhat more constrained instruction set?

In other words, you generally shouldn't expect any more support from Wasm than you get from a native CPU. Certain exceptions apply where we make something more complex into a Wasm feature, but the bar for that is very high, and usually has to do with working around unavoidable restrictions of Wasm.
@rossberg, for your two questions:

> How would this feature be compiled if the target was a real CPU?

On a real CPU, we definitely need to implement a runtime to support such a feature.

> How is equivalent code expressible in Wasm's somewhat more constrained instruction set?

As you mentioned, wasm essentially is a virtual CPU, so with the current MVP wasm instruction set we would need to implement a runtime and compile it to wasm opcodes to support such a feature.
However, I have a somewhat different opinion about the idea that *Wasm essentially is a virtual CPU, not a high-level VM*: I agree that the opcodes defined in the MVP core spec can be treated as virtual CPU instructions, and they could even be implemented in HDL to become a real hardware CPU; but the opcodes introduced in the WasmGC proposal already provide some high-level semantics, and I believe no real CPU can support such semantics.
> That last line's read of `.width` can turn into an itable call of a method `get_width()`. So each class implementing the interface would fill in an itable with one method per property (or two, one to read and one to write).
Yes, this can also support the interface feature, but in TypeScript, `interface` is implemented implicitly: the compiler can't rely on an `implements` keyword to determine the implementation relationship. Instead, it has to treat every class as implementing every interface that has fewer fields than the class does, so this will also make the generated wasm module very large.
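To illustrate the implicit (structural) implementation being described (`Photo` and `Surface` are made-up names): in TypeScript, a class satisfies an interface simply by having matching members, without ever naming it.

```typescript
interface Rectangle {
  width: number;
  height: number;
}

// Neither class mentions Rectangle, yet both satisfy it structurally,
// so a compiler must conservatively assume either may flow into `area`.
class Photo {
  constructor(public width: number, public height: number, public url: string) {}
}
class Surface {
  constructor(public width: number, public height: number) {}
}

function area(r: Rectangle): number {
  return r.width * r.height;
}

area(new Photo(640, 480, "cat.png")); // accepted without any `implements`
area(new Surface(2, 3));              // also accepted
```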
@xujuntwt95329, I've used the slogan "As low-level as possible, but no lower" to describe Wasm. As I said, there are exceptions where we have to raise the abstraction level somewhat, and GC is one of them. But that is no free pass to arbitrary high-level operations. Even with GC, we have been very careful to (almost) keep it as low-level as we can. That is why there are only plain tuples and arrays, no object system, no classes, no methods, no reflection, no hidden allocations, etc. I have my concerns about how we failed on that front with respect to casts, but other than that, we managed to stay clear of complex operations.
That doesn't mean you can't use Wasm GC. It only means that you still have to build a language-specific runtime and custom data structures on top of it. Wasm takes care of GC for you, but nothing else. GC types merely describe the layout of your runtime's data structures to the engine.
@rossberg I understand that we should try to avoid complex operations in the current GC proposal, thanks for all your detailed explanations!
I understand that supporting this kind of reflection seems beyond the scope of the GC MVP proposal, but it does have some benefits in practice (e.g. helping to reduce module size).
So do you think it could be considered in the Post-MVP? And if so, would it be meaningful if we could provide some data about the performance, memory usage, and benefits of such an opcode, along with a reference implementation?
@xujuntwt95329

> in TypeScript, interface is implicitly implemented, the compiler can't rely on the implements keyword to decide the implement relationship, instead it should treat all classes implemented all interfaces that has fewer fields than them, so this will also make the generated wasm module become very large

Good point, yeah, if there are commonly-used names then we'd end up with many classes needing itables to support them. And maybe `struct.get|set_indirect` can be somewhat more compact overall. But even with it, you can have a large amount of metadata, I think? (the tables that give you the offsets for `struct.get|set_indirect`, and the VM's internal tables for those instructions). Also, maybe the larger question is how big that metadata is compared to function code and runtime data.
It might be interesting to measure these overheads. However, I agree with @rossberg that these proposed new opcodes are significantly higher-level than anything else in WasmGC so far, so there is a high bar, I think, for considering them.
But I do agree that supporting dynamic languages like TypeScript in wasm is a very important goal! My personal suggestions would be:
I agree with @kripken that a key feature to enable the implementation of high-level runtimes in Wasm would be a general mechanism for user-space JITing. That has been an assumption for a long time, but I don't think we have a particularly clear idea yet of what it should look like.
@kripken Thanks for the suggestions!

> Also, maybe the larger question is how big that metadata is compared to function code and runtime data.

Yeah, currently we use host APIs to emulate the `struct.get|set_indirect` opcode in WAMR, and we can use the *user space dispatch* approach in the browser; then we can compare the two approaches.
> But I do agree that supporting dynamic languages like TypeScript in wasm is a very important goal!

Yeah, we started this TypeScript-to-wasm work because we think it can bring more developers to WebAssembly, and we think the current *(quickjs, python) vm inside (wasm) vm* solution is not so efficient.
But we don't treat TypeScript as an absolutely dynamic language: we try to do static compilation as much as possible based on the type information in the source code, and support limited dynamic semantics through a series of APIs. (We see more and more dynamic languages adding type annotations, e.g. Python 3.5+, so we believe this "partial static compilation" idea can benefit several dynamic languages.)
In the short term, I'd suggest trying to see how far we can get with static analysis to remove unneeded metadata and overhead. I'd be happy to work with you on the Binaryen side - we've worked closely with J2Wasm for Java and so Binaryen already has a lot of generic analyses to remove unneeded fields in itables, to devirtualize, etc., which should already help you, and we can explore more.
That would be really helpful! Thanks for your help. Is there a link to the J2Wasm project or any documents about the analyses in Binaryen that you mentioned?
@rossberg @kripken Yes, I agree that in the long term, user-space JITing should be a great solution for dynamic languages, but JIT also has some disadvantages:
So even with JIT capability, maybe we should still explore more approaches to directly support some languages (like TypeScript) more efficiently.
@xujuntwt95329

> Is there a link to the J2Wasm project or any documents about the analyses in binaryen as you mentioned?

For compiler writers to WasmGC, maybe the most useful links are:

- Binaryen GC optimization cookbook
- WasmGC lowering tips (wip)
- There is also this older presentation PDF but it is outdated at this point.

Overall Binaryen's GC optimizations are fairly complete at this point, and include devirtualization, escape analysis, inlining, etc. (full list here), but there are probably many more things we could add. With Java, Kotlin, and Dart it's been useful for them to find cases that look like they should be optimized by Binaryen but are not, so @xujuntwt95329 if you find something like that, or if you have any other questions, let me know!
I'm not sure reading J2Wasm itself would be that helpful, except maybe for the list of Binaryen optimizations it runs. J2Wasm for the most part does just a few Java-specific optimizations and then leaves all the rest to Binaryen.
> So even with JIT capability, maybe we should still explore more approaches to directly support some languages (like TypeScript) more efficiently.
I do agree with this. But I think there is going to be a high bar for higher-level instructions like these, especially when they add VM complexity and overhead. And, OTOH, I think other approaches (like @tlively and I mentioned) can be just as fast at least in the monomorphic case, which is hopefully the common one. So I am skeptical that the high bar can be reached here. (But I could be wrong!)
Btw, two more ideas I've had:
> @xujuntwt95329
>
> > Is there a link to the J2Wasm project or any documents about the analyses in binaryen as you mentioned?
>
> For compiler writers to WasmGC, maybe the most useful links are:
>
> - Binaryen GC optimization cookbook
> - WasmGC lowering tips (wip)
> - There is also this older presentation PDF but it is outdated at this point.
>
> Overall Binaryen's GC optimizations are fairly complete at this point, and include devirtualization, escape analysis, inlining, etc. (full list here), but there are probably many more things we could add.
Hi @kripken thanks for the links, they are really helpful! If we find any code pattern that can be optimized, we will send them to you :)
> I do agree with this. But I think there is going to be a high bar for higher-level instructions like these, especially when they add VM complexity and overhead. And, OTOH, I think other approaches (like @tlively and I mentioned) can be just as fast at least in the monomorphic case, which is hopefully the common one. So I am skeptical that the high bar can be reached here. (But I could be wrong!)
I agree that it's not suitable to introduce such a high-level opcode in the GC MVP proposal. But I hope it can be considered in the Post-MVP phase; I'll try to compare with the solution without such an opcode and see if there are more use cases. 😀
I initially thought your approach of AOT-compiling a language with duck-typed interfacing was unique, but actually I realize that's exactly what Go does. So I'd be very interested to know how they avoid interface bloat in their binaries. Any techniques they use in native builds might help here.
Go has its own runtime; it also has concepts like `itable`, and it can access fields through any offset calculated at runtime. But in WasmGC, the field index in `struct.get` must be statically determined, which is why we need an indirect version.
In many codebases most properties are going to be data and not references. To benefit from that you can store data in linear memory and do efficient interface lookups there, and use WasmGC objects only for references. This will, however, need to wait for weak refs or finalizers for you to be able to free the linear memory properly, but those are on the WasmGC post-MVP roadmap for consideration.
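A minimal sketch of that layout, using a `DataView` to stand in for linear memory (the table shape and all names are assumptions, not from any existing toolchain):

```typescript
// Data properties live in "linear memory"; an interface read becomes a
// table lookup plus a load, with no indirect call at all.
const memory = new DataView(new ArrayBuffer(256));

// offsets[classId][slot] -> byte offset of that interface slot's data.
const offsets: number[][] = [
  [0, 8],   // class 0: width at byte 0, height at byte 8
  [16, 24], // class 1: width at byte 16, height at byte 24
];

function readNumber(classId: number, base: number, slot: number): number {
  return memory.getFloat64(base + offsets[classId][slot]);
}

memory.setFloat64(16, 800); // a class-1 object at base 0: width = 800
readNumber(1, 0, 0);        // reads byte offset 16
```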
Yes! Actually I think the only way to implement something like `ArrayBuffer` based on WasmGC is `struct` (for lifecycle management) + linear memory (for the backing store). BTW, you mentioned *weak refs or finalizers*: is it possible to have finalizers on normal (strong) references (e.g. `externref`)? In our ts2wasm approach we usually box an external object into the wasm world through an `externref`, and the `externref` will manage the lifecycle of the external object, so we have a custom finalizer implementation in WAMR; would that be incompatible with the spec?
> BTW, you mentioned weak refs or finalizers, is it possible to have finalizers on normal (strong) references (e.g. externref)? In our ts2wasm approach we usually box an external object into wasm world through externref, and the externref will manage the lifecycle of the external object, so we have a [custom finalizer implementation in WAMR](https://github.com/bytecodealliance/wasm-micro-runtime/pull/2325), would that be incompatible with spec?
My understanding is that that is what is meant by https://github.com/WebAssembly/gc/blob/main/proposals/gc/Post-MVP.md#weak-references - so, yes, I expect that to be possible some time after the MVP. I am looking forward to it myself for some use cases.
Edit: I guess you mean externref in particular? Perhaps there would be questions about a finalizer for an object created outside of the wasm, but I'd expect that to work too.
Purpose of this issue

Hi, in the GC-05-16 meeting we introduced our idea of adding `struct.get_indirect` and `struct.set_indirect` opcodes to support language semantics like TypeScript's `interface` in WasmGC. We had some discussion during the meeting and in the PR for the meeting notes, and this issue is to continue the discussion.

Background
WasmGC opened the door to supporting high-level managed languages more efficiently than compiling the whole language runtime to WebAssembly, and we are interested in exploring a solution to compile TypeScript to WasmGC. Our strategy is:

- For objects with `any` type, they are represented as an `externref`, and we introduced a `libdyntype` to support dynamic access and type checking on them; operations on these objects are converted to API calls to `libdyntype`

This provides flexibility to the developer:

- `any` is also supported, but at the cost of some performance impact

Problem
Most types work well based on the strategy above, but we encountered a problem with `interface`. An `interface` does not describe the layout of a concrete type; it's just an unordered collection of field names and types. This means the actual object type held by the interface is not known at compile time, so we have two solutions:

1. Treat `interface` as `any` (`interface` is heavily used in TypeScript; if we treat it as `any`, the performance impact may be huge)
2. Calculate the `field index` at runtime according to the metadata and field name (preferred)

Option 2 is preferred because we can still represent objects using WasmGC types, and we can do further optimization to avoid searching the meta info in some scenarios (e.g. checking a pre-assigned type-id). Option 2 needs to calculate the `field index` at runtime, but currently the `struct.get/set` opcodes require the `field index` to be an immediate.

Proposed Opcode
To accept a `field index` calculated at runtime, we proposed the `indirect` version of the struct access opcodes, `struct.get_indirect` and `struct.set_indirect`. This requires some runtime type checking. Then we can use these opcodes to access the fields of an interface.
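As an illustration only (the original wasm snippets are not reproduced here, and every name below is hypothetical), the intended semantics can be sketched in TypeScript: per-type metadata maps a field name to an index at runtime, and `struct.get_indirect` then performs the load with that runtime index.

```typescript
// Per-type metadata: field name -> field index.
type TypeMeta = { fieldIndex: Map<string, number> };

const rectangleMeta: TypeMeta = {
  fieldIndex: new Map([["width", 0], ["height", 1]]),
};

// An interface read resolves the name to an index via the metadata, then
// does an index-based access -- the part struct.get_indirect would perform.
function interfaceGet(fields: number[], meta: TypeMeta, name: string): number {
  const index = meta.fieldIndex.get(name);
  if (index === undefined) throw new TypeError(`no field named ${name}`);
  return fields[index]; // ~ struct.get_indirect with a runtime index
}

interfaceGet([3, 4], rectangleMeta, "height");
```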
Influence on the runtime implementation

- performance: The proposed opcodes will be slow due to several runtime checks, but this will not influence the performance of any other opcodes
- memory usage: To apply the runtime checking, every GC object must have a reference to its defined type. Since we already have `RTT` now, the only thing we need to add is a `type index` in the RTT object, so the impact on memory should be low

Is there any workaround without the proposed opcode?
- Treat `interface` as `any`, as mentioned in option 1 above
- Compile the whole language runtime into WebAssembly, and execute the source code in that "vm inside vm"