xujuntwt95329 opened 1 year ago
As @rossberg wrote in https://github.com/WebAssembly/meetings/pull/1279#discussion_r1201812009:
@xujuntwt95329, since these operations have to inspect and dispatch on (the structure of) the runtime type of the object, they represent a form of type reflection. They would be the Wasm equivalent to struct.getClass().getField(x).get/set(struct), the only difference being that x is numeric, not a string, because fields are accessed by number in Wasm. This idea further assumes that every heap object carries sufficient runtime information to perform such reflection, which is not an assumption I'd like to build into Wasm itself.
@rossberg Thanks for the explanation, I agree with you that the idea of `struct.get_indirect` is a kind of reflection (or more specifically, index-based reflection 😀).
If this is not a good idea in WebAssembly, do you have any suggestions about how to support TypeScript's `interface` semantics based on WasmGC?
I know we can implement a runtime inside wasm and do such dispatch in "user space", but that would require all the objects to be allocated in linear memory rather than on the WasmGC heap, which is not what we want. We do want to leverage WasmGC to support these languages.
Another way to implement this reflection in user space would be to generate a function for each struct type that takes an index and uses a `br_table` on the index to determine which field to load. IIUC, this is not too far off from how an engine would implement this internally, anyway.
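Expressed in TypeScript terms, the generated per-type accessor described above might look like the following sketch (all names here are hypothetical; in wasm the `switch` would be a `br_table` and each `return` a `struct.get` with a constant field index):

```typescript
// Hypothetical per-struct-type accessor: the switch plays the role of
// br_table, dispatching a runtime index to a statically-indexed field load.
interface Rect {
  width: number;
  height: number;
}

function rectGetField(obj: Rect, index: number): number {
  switch (index) {
    case 0: return obj.width;  // ~ struct.get $Rect 0
    case 1: return obj.height; // ~ struct.get $Rect 1
    default: throw new RangeError(`Rect has no field ${index}`);
  }
}
```

The cost is one such function per struct type, which is the module-size concern raised in the following reply.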
Thanks for the suggestion. Yes, this is a feasible approach, but it seems it would greatly increase the footprint of the generated wasm module. So in my opinion, compared to this user-space approach, the proposed `struct.get_indirect` opcode can help generate more compact wasm modules.
@tlively I think that idea can be extended to consider the index static. That is, when you get a property on an interface, you know which property at every use:

```typescript
interface Rectangle {
  width: number;
  height: number;
}

// Use the interface Rectangle's property "width".
foo(rectangle.width)
```

That last line's read of `.width` can turn into an itable call of a method `get_width()`. So each class implementing the interface would fill in an itable with one method per property (or two, one to read and one to write).
That still means doing an itable call (read itable object, do indirect call) on each access, which is slow, but if the VM does runtime inlining it should be fast enough in the monomorphic case.
(In the polymorphic case I think linear memory can do much better with a table of offsets. That would avoid any indirect call or even call entirely.)
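A sketch of the itable-of-accessors idea above, again in TypeScript terms (the names `RectangleItable`, `Box`, etc. are hypothetical; in wasm the property read would be an itable lookup plus an indirect call):

```typescript
// One accessor method per interface property; each implementing class
// provides its own itable record.
type RectangleItable<T> = {
  get_width(self: T): number;
  get_height(self: T): number;
};

class Box {
  constructor(public w: number, public h: number) {}
}

// The itable a compiler might emit for Box's view of Rectangle.
const boxAsRectangle: RectangleItable<Box> = {
  get_width: (self) => self.w,
  get_height: (self) => self.h,
};

// `rectangle.width` lowers to an indirect call through the itable:
const w = boxAsRectangle.get_width(new Box(5, 7));
```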
@xujuntwt95329, in general, the idea is that Wasm essentially is a virtual CPU, not a high-level VM. So the two questions to ask are:

1. How would this feature be compiled if the target was a real CPU?
2. How is equivalent code expressible in Wasm's somewhat more constrained instruction set?

In other words, you generally shouldn't expect any more support from Wasm than you get from a native CPU. Certain exceptions apply where we make something more complex into a Wasm feature, but the bar for that is very high, and usually has to do with working around unavoidable restrictions of Wasm.
@rossberg, for your two questions:

> How would this feature be compiled if the target was a real CPU?

On a real CPU, we definitely need to implement a runtime to support such a feature.

> How is equivalent code expressible in Wasm's somewhat more constrained instruction set?

As you mentioned, wasm essentially is a virtual CPU, so with the current MVP wasm instruction set we would need to implement a runtime and compile it to wasm opcodes to support such a feature.
However, I have a somewhat different opinion about the idea that *Wasm essentially is a virtual CPU, not a high-level VM*: I agree that the opcodes defined in the MVP core spec can be treated as virtual CPU instructions, and they could even be implemented in HDL to become a real hardware CPU; but the opcodes introduced in the WasmGC proposal already provide some high-level semantics, and I believe no real CPU can support such semantics.
> That last line's read of `.width` can turn into an itable call of a method `get_width()`. So each class implementing the interface would fill in an itable with one method per property (or two, one to read and one to write).
Yes, this can also support the interface feature, but in TypeScript, `interface` is implemented implicitly: the compiler can't rely on an `implements` keyword to determine the implementation relationship. Instead, it has to treat every class as implementing every interface that has fewer fields than the class does, so this will also make the generated wasm module very large.
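To illustrate the implicit (structural) implementation being described (`Photo` and `Surface` are made-up names): in TypeScript, a class satisfies an interface simply by having matching members, without ever naming it.

```typescript
interface Rectangle {
  width: number;
  height: number;
}

// Neither class mentions Rectangle, yet both satisfy it structurally,
// so a compiler must conservatively assume either may flow into `area`.
class Photo {
  constructor(public width: number, public height: number, public url: string) {}
}
class Surface {
  constructor(public width: number, public height: number) {}
}

function area(r: Rectangle): number {
  return r.width * r.height;
}

area(new Photo(640, 480, "cat.png")); // accepted without any `implements`
area(new Surface(2, 3));              // also accepted
```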
@xujuntwt95329, I've used the slogan "As low-level as possible, but no lower" to describe Wasm. As I said, there are exceptions where we have to raise the abstraction level somewhat, and GC is one of them. But that is no free pass to arbitrary high-level operations. Even with GC, we have been very careful to (almost) keep it as low-level as we can. That is why there are only plain tuples and arrays, no object system, no classes, no methods, no reflection, no hidden allocations, etc. I have my concerns about how we failed on that front with respect to casts, but other than that, we managed to stay clear of complex operations.
That doesn't mean you can't use Wasm GC. It only means that you still have to build a language-specific runtime and custom data structures on top of it. Wasm takes care of GC for you, but nothing else. GC types merely describe the layout of your runtime's data structures to the engine.
@rossberg I understand that we should try to avoid complex operations in the current GC proposal, thanks for all your detailed explanations!
I understand that supporting this kind of reflection seems beyond the scope of the GC MVP proposal, but it does have some benefits in practice (e.g. helping to reduce module size).
So do you think it could be considered in the Post-MVP? And if so, would it be meaningful if we could provide some data about the performance, memory usage, and benefits of such an opcode, along with a reference implementation?
@xujuntwt95329

> in TypeScript, interface is implicitly implemented, the compiler can't rely on the implements keyword to decide the implement relationship, instead it should treat all classes implemented all interfaces that has fewer fields than them, so this will also make the generated wasm module become very large

Good point, yeah, if there are commonly-used names then we'd end up with many classes needing itables to support them. And maybe `struct.get|set_indirect` can be somewhat more compact overall. But even with it, you can have a large amount of metadata, I think? (the tables that give you the offsets for `struct.get|set_indirect`, and the VM's internal tables for those instructions). Also, maybe the larger question is how big that metadata is compared to function code and runtime data.
It might be interesting to measure these overheads. However, I agree with @rossberg that these proposed new opcodes are significantly higher-level than anything else in WasmGC so far, so there is a high bar, I think, for considering them.
But I do agree that supporting dynamic languages like TypeScript in wasm is a very important goal! My personal suggestions would be:
I agree with @kripken that a key feature to enable the implementation of high-level runtimes in Wasm would be a general mechanism for user-space JITing. That has been an assumption for a long time, but I don't think we have a particularly clear idea yet of what it should look like.
@kripken Thanks for the suggestions!

> Also, maybe the larger question is how big that metadata is compared to function code and runtime data.

Yeah, currently we use host APIs to emulate the `struct.get|set_indirect` opcode in WAMR, and we can use the *user space dispatch* approach in the browser; then we can compare the two approaches.
> But I do agree that supporting dynamic languages like TypeScript in wasm is a very important goal!

Yeah, we started this TypeScript-to-wasm work because we think it can bring more developers to WebAssembly, and we think the current *(quickjs, python) vm inside (wasm) vm* solution is not so efficient.
But we don't treat TypeScript as an absolutely dynamic language: we try to do static compilation as much as possible based on the type information in the source code, and support limited dynamic semantics through a series of APIs. (We see more and more dynamic languages adding type annotations, e.g. Python 3.5+, so we believe this "partial static compilation" idea can benefit several dynamic languages.)
In the short term, I'd suggest trying to see how far we can get with static analysis to remove unneeded metadata and overhead. I'd be happy to work with you on the Binaryen side - we've worked closely with J2Wasm for Java and so Binaryen already has a lot of generic analyses to remove unneeded fields in itables, to devirtualize, etc., which should already help you, and we can explore more.
That would be really helpful! Thanks for your help. Is there a link to the J2Wasm project or any documents about the analyses in Binaryen that you mentioned?
@rossberg @kripken Yes, I agree that in the long term, user-space JITing should be a great solution for dynamic languages, but JIT also has some disadvantages:
So even with JIT capability, maybe we should still explore more approaches to directly support some languages (like TypeScript) more efficiently.
@xujuntwt95329

> Is there a link to the J2Wasm project or any documents about the analyses in binaryen as you mentioned?

For compiler writers to WasmGC, maybe the most useful links are:

- Binaryen GC optimization cookbook
- WasmGC lowering tips (wip)
- There is also this older presentation PDF but it is outdated at this point.

Overall Binaryen's GC optimizations are fairly complete at this point, and include devirtualization, escape analysis, inlining, etc. (full list here), but there are probably many more things we could add. With Java, Kotlin, and Dart it's been useful for them to find cases that look like they should be optimized by Binaryen but are not, so @xujuntwt95329 if you find something like that, or if you have any other questions, let me know!
I'm not sure reading J2Wasm itself would be that helpful, except maybe for the list of Binaryen optimizations it runs. J2Wasm for the most part does just a few Java-specific optimizations and then leaves all the rest to Binaryen.
> So even with JIT capability, maybe we should still explore more approaches to directly support some languages (like TypeScript) more efficiently.
I do agree with this. But I think there is going to be a high bar for higher-level instructions like these, especially when they add VM complexity and overhead. And, OTOH, I think other approaches (like @tlively and I mentioned) can be just as fast at least in the monomorphic case, which is hopefully the common one. So I am skeptical that the high bar can be reached here. (But I could be wrong!)
Btw, two more ideas I've had:
> @xujuntwt95329
>
> > Is there a link to the J2Wasm project or any documents about the analyses in binaryen as you mentioned?
>
> For compiler writers to WasmGC, maybe the most useful links are:
>
> - Binaryen GC optimization cookbook
> - WasmGC lowering tips (wip)
> - There is also this older presentation PDF but it is outdated at this point.
>
> Overall Binaryen's GC optimizations are fairly complete at this point, and include devirtualization, escape analysis, inlining, etc. (full list here), but there are probably many more things we could add.
Hi @kripken thanks for the links, they are really helpful! If we find any code pattern that can be optimized, we will send them to you :)
> I do agree with this. But I think there is going to be a high bar for higher-level instructions like these, especially when they add VM complexity and overhead. And, OTOH, I think other approaches (like @tlively and I mentioned) can be just as fast at least in the monomorphic case, which is hopefully the common one. So I am skeptical that the high bar can be reached here. (But I could be wrong!)
I agree that it's not suitable to introduce such a high-level opcode in the GC MVP proposal. But I hope it can be considered in the Post-MVP phase; I'll try to compare with the solution without such an opcode and see if there are more use cases. 😀
I initially thought your approach of AOT-compiling a language with duck-typed interfacing was unique, but actually I realize that's exactly what Go does. So I'd be very interested to know how they avoid interface bloat in their binaries. Any techniques they use in native builds might help here.
Go has its own runtime; it also has concepts like `itable`, and it can access fields through any offset calculated at runtime. But in WasmGC, the field index in `struct.get` must be statically determined, which is why we need an indirect version.
In many codebases most properties are going to be data and not references. To benefit from that you can store data in linear memory and do efficient interface lookups there, and use WasmGC objects only for references. This will, however, need to wait for weak refs or finalizers for you to be able to free the linear memory properly, but those are on the WasmGC post-MVP roadmap for consideration.
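A minimal sketch of that layout, using a `DataView` to stand in for linear memory (the table shape and all names are assumptions, not from any existing toolchain):

```typescript
// Data properties live in "linear memory"; an interface read becomes a
// table lookup plus a load, with no indirect call at all.
const memory = new DataView(new ArrayBuffer(256));

// offsets[classId][slot] -> byte offset of that interface slot's data.
const offsets: number[][] = [
  [0, 8],   // class 0: width at byte 0, height at byte 8
  [16, 24], // class 1: width at byte 16, height at byte 24
];

function readNumber(classId: number, base: number, slot: number): number {
  return memory.getFloat64(base + offsets[classId][slot]);
}

memory.setFloat64(16, 800); // a class-1 object at base 0: width = 800
readNumber(1, 0, 0);        // reads byte offset 16
```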
Yes! Actually I think the only way to implement something like `ArrayBuffer` based on WasmGC is `struct` (for lifecycle management) + linear memory (for the backing store). BTW, you mentioned *weak refs or finalizers*: is it possible to have finalizers on normal (strong) references (e.g. `externref`)? In our ts2wasm approach we usually box an external object into the wasm world through an `externref`, and the `externref` will manage the lifecycle of the external object, so we have a custom finalizer implementation in WAMR; would that be incompatible with the spec?
> BTW, you mentioned weak refs or finalizers, is it possible to have finalizers on normal (strong) references (e.g. externref)? In our ts2wasm approach we usually box an external object into wasm world through externref, and the externref will manage the lifecycle of the external object, so we have a [custom finalizer implementation in WAMR](https://github.com/bytecodealliance/wasm-micro-runtime/pull/2325), would that be incompatible with spec?
My understanding is that that is what is meant by https://github.com/WebAssembly/gc/blob/main/proposals/gc/Post-MVP.md#weak-references - so, yes, I expect that to be possible some time after the MVP. I am looking forward to it myself for some use cases.
Edit: I guess you mean externref in particular? Perhaps there would be questions about a finalizer for an object created outside of the wasm, but I'd expect that to work too.
Purpose of this issue

Hi, in the GC-05-16 meeting we introduced our idea of adding `struct.get_indirect` and `struct.set_indirect` opcodes to support language semantics like TypeScript's `interface` in WasmGC. We had some discussion during the meeting and in the PR for the meeting notes, and this issue is to continue the discussion.

Background
WasmGC opened the door to supporting high-level managed languages more efficiently than compiling the whole language runtime to WebAssembly, and we are interested in exploring a solution to compile TypeScript to WasmGC. Our strategy is:

- For objects with `any` type, they are represented as an `externref`, and we introduced a `libdyntype` to support dynamic access and type checking on them; operations on these objects are converted to API calls to `libdyntype`

This provides flexibility to the developer:

- `any` is also supported, but at the cost of some performance impact

Problem
Most types work well based on the strategy above, but we encountered a problem with `interface`. An `interface` does not describe the layout of a concrete type; it's just an unordered collection of field names and types. This means the actual object type held by the interface is not known at compile time, so we have two solutions:

1. Treat `interface` as `any` (`interface` is heavily used in TypeScript; if we treat it as `any`, the performance impact may be huge)
2. Calculate the `field index` at runtime according to the metadata and field name (preferred)

Option 2 is preferred because we can still represent objects using WasmGC types, and we can do further optimization to avoid searching the meta info in some scenarios (e.g. checking a pre-assigned type-id). Option 2 needs to calculate the `field index` at runtime, but currently the `struct.get/set` opcodes require the `field index` to be an immediate.

Proposed Opcode
To accept a `field index` calculated at runtime, we proposed the `indirect` version of the struct access opcodes, `struct.get_indirect` and `struct.set_indirect`. This requires some runtime type checking. Then we can use these opcodes to access the fields of an interface.
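As an illustration only (the original wasm snippets are not reproduced here, and every name below is hypothetical), the intended semantics can be sketched in TypeScript: per-type metadata maps a field name to an index at runtime, and `struct.get_indirect` then performs the load with that runtime index.

```typescript
// Per-type metadata: field name -> field index.
type TypeMeta = { fieldIndex: Map<string, number> };

const rectangleMeta: TypeMeta = {
  fieldIndex: new Map([["width", 0], ["height", 1]]),
};

// An interface read resolves the name to an index via the metadata, then
// does an index-based access -- the part struct.get_indirect would perform.
function interfaceGet(fields: number[], meta: TypeMeta, name: string): number {
  const index = meta.fieldIndex.get(name);
  if (index === undefined) throw new TypeError(`no field named ${name}`);
  return fields[index]; // ~ struct.get_indirect with a runtime index
}

interfaceGet([3, 4], rectangleMeta, "height");
```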
Influence on the runtime implementation

- performance: The proposed opcodes will be slow due to several runtime checks, but this will not influence the performance of any other opcodes
- memory usage: To apply the runtime checking, every GC object must have a reference to its defined type. Since we already have `RTT` now, the only thing we need to add is a `type index` in the RTT object, so the impact on memory should be low

Is there any workaround without the proposed opcode?
- Treat `interface` as `any`, as mentioned in option 1 above
- Compile the whole language runtime into WebAssembly, and execute the source code in that "vm inside vm"