WebAssembly / gc

Branch of the spec repo scoped to discussion of GC integration in WebAssembly
https://webassembly.github.io/gc/

.NET Notes #77

Open vargaz opened 4 years ago

vargaz commented 4 years ago

I was asked to add some notes based on the needs of Microsoft .NET implementations wrt GC in WebAssembly.

  1. Object layout: .NET objects in memory usually consist of a header followed by object data. The header contains data such as:
    • the vtable/type pointer
    • a sync word for locking on the object
    • a length field for arrays/strings
    • for multi-dimensional arrays, a pointer to a struct describing the dimensions.

The type system of the current proposal doesn't seem to be able to support this layout, i.e. a header followed by array data. Also, .NET supports arrays of structs, e.g. an array of {ref, non-ref} structs would be laid out in memory as [ref, non-ref, ref, non-ref, etc.].

In general, it seems very difficult to model all possible object layouts used by GCd languages.

  2. Interior pointers: In .NET, it's quite common to have pointers into the middle of objects (arrays), and pointers to one past the end of an array. anyref doesn't seem to be able to support this.

  3. Interop with C/C++ code: The .NET runtimes are written in C/C++ and assume that object references are normal C pointers which point to linear memory, and that objects can be accessed from C code as pointers to C structs. The current proposal places allocated objects outside linear memory and adds new accessors to read/write their contents. Allowing manipulation of these objects from C code would require extensions to the C compilers.

  4. Finalization: The .NET runtime needs to be notified somehow when an object with a finalizer dies.

  5. Weak references: .NET supports multiple kinds of weak references which might not be supported by the underlying JS GC.

  6. Non-web runtimes: Non-web runtimes would need to add a GC implementation, since GC is such a core feature that it probably cannot be treated as an optional feature like SIMD.

  7. LLVM support: The new types/type constructs don't exist in LLVM, and it is not clear how they can be added.
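
As a rough illustration of item 1 (object layout), the following Python `ctypes` sketch models an array object whose header is followed directly by inline struct elements. The class names, field names, and 32-bit widths are assumptions chosen for illustration; they are not the actual definitions used by any .NET runtime.

```python
import ctypes
import struct

# Hypothetical header; names and widths are illustrative assumptions.
class ObjectHeader(ctypes.Structure):
    _fields_ = [
        ("vtable", ctypes.c_uint32),     # vtable/type pointer (32-bit, as on wasm32)
        ("sync_word", ctypes.c_uint32),  # sync word used for locking
    ]

# A struct element that mixes a reference with non-reference data.
class Element(ctypes.Structure):
    _fields_ = [
        ("ref", ctypes.c_uint32),   # reference to another managed object
        ("value", ctypes.c_int32),  # plain non-reference data
    ]

def make_array_type(length):
    """Build the layout of an array object: header, length, inline elements."""
    class ArrayObject(ctypes.Structure):
        _fields_ = [
            ("header", ObjectHeader),
            ("length", ctypes.c_uint32),
            ("data", Element * length),  # elements stored inline, not via pointers
        ]
    return ArrayObject

ArrayOf3 = make_array_type(3)
arr = ArrayOf3()
arr.length = 3
for i in range(3):
    arr.data[i].ref = 0x1000 + i
    arr.data[i].value = i * 10

# Byte offsets that a collector would have to treat as references.
ref_offsets = [
    ArrayOf3.data.offset + i * ctypes.sizeof(Element) + Element.ref.offset
    for i in range(3)
]

# View the object as raw 32-bit words to see the interleaving.
words = struct.unpack("=9I", bytes(arr))
print(ref_offsets)
print(words[3:])
```

The words after the header and length field alternate between reference and non-reference values, which is the [ref, non-ref, ref, non-ref, ...] pattern described in item 1. A collector needs the offsets in `ref_offsets` to find the references inside the array.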

rossberg commented 4 years ago
  1. Object layout [...] The type system of the current proposal doesn't seem to be able to support this layout, i.e. a header followed by array data. Also, .NET supports arrays of structs, e.g. an array of {ref, non-ref} structs would be laid out in memory as [ref, non-ref, ref, non-ref, etc.].

This form of nesting aggregates is covered (and, I agree, essential) as a Post-MVP feature. We cut it from the MVP because it can always be replaced with an indirection, so it isn't strictly required for functional completeness. That decision could be reversed, of course, but it is tricky keeping the MVP small.

  2. Interior pointers In .NET, it's quite common to have pointers into the middle of objects (arrays), and pointers to one past the end of an array. anyref doesn't seem to be able to support this.

Interior references are part of the nested aggregates extension described above, so they are equally Post-MVP at the moment. They will be a distinct type from regular references (which can be converted into them). The reason is that this allows engines to represent them differently, e.g., as fat pointers. In past discussions, .NET folks believed this would probably be good enough for .NET, because inner pointers only arise in specific contexts, but this requires further investigation.
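
The fat-pointer representation mentioned above can be sketched as follows. This is only an illustration in Python with a hypothetical `InteriorRef` class; it is not the representation prescribed by the proposal or used by any engine.

```python
from dataclasses import dataclass

@dataclass
class InteriorRef:
    """Hypothetical fat pointer: a reference to the whole array plus an index."""
    base: list   # keeps the containing object reachable for the collector
    index: int   # position inside the object; len(base) means one past the end

    def __post_init__(self):
        # Allow every element position plus the one-past-the-end position.
        if not 0 <= self.index <= len(self.base):
            raise IndexError("interior reference outside the object")

    def load(self):
        # Raises IndexError when the reference is one past the end.
        return self.base[self.index]

    def store(self, value):
        self.base[self.index] = value

    def advance(self, delta):
        # Pointer arithmetic yields a new, bounds-checked interior reference.
        return InteriorRef(self.base, self.index + delta)

arr = [10, 20, 30]
p = InteriorRef(arr, 1)   # points at the middle element
p.store(21)
end = p.advance(2)        # one past the end: valid to hold, not to dereference
```

Because the fat pointer carries the base reference, the collector never has to map an address in the middle of an object back to the object's start. The cost is that an interior reference is wider than a regular one, which is why a distinct type is useful.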

  3. Interop with C/C++ code The .NET runtimes are written in C/C++ and assume that object references are normal C pointers which point to linear memory, and that objects can be accessed from C code as pointers to C structs.

Yes, this is a known limitation. It is unlikely that much can be done about it directly. However, interface types may be able to emulate this interop.

  4. Finalization
  5. Weak references

These are tough ones, and we don't have a good idea yet how to support the myriads of different finalisation semantics out there without creating a zoo. Very likely Post-post-MVP, but suggestions are welcome.

  6. Non web runtimes Non-web runtimes would need to add a GC implementation, since GC is such a core feature that it probably cannot be treated as an optional feature like SIMD.

It is a stated goal that GC remains an optional feature. We have been very careful to design this and other features such that there are no unwanted dependencies on the presence of GC.

  7. LLVM support The new types/type constructs don't exist in LLVM, not clear how they can be added.

True, but that's a tool chain problem that needs solving and that is fundamentally unavoidable.

Horcrux7 commented 4 years ago

@vargaz There is an alternative GC suggestion, https://github.com/soil-initiative/gc/pull/1, that should be a better match for .NET.

The example for an OO language may be of interest: https://github.com/soil-initiative/gc/blob/103eb72aaa7f3a7ecb3a436ce95ae9f108311799/proposals/gc/NomOO.md

aardappel commented 4 years ago

(3) is interesting to me, in the sense that I've mentioned it before as a major issue with the current GC design. Most languages come with significant C or C++ runtimes that assume direct access to language data. These runtimes would need to be rewritten in a Wasm-GC aware language, which in some cases may be impractical. If these languages wish to keep using their existing runtime, they will be forced to do their own GC in linear memory, losing out on many benefits.

I have no good solutions either, I just think we should be more aware of this tradeoff. I can imagine that an alternative GC proposal that allows objects to live in linear memory would work much better for many existing languages, but is likely much harder to make work with host interop.

In some sense, the current GC proposal favors new language implementations, or even new languages/dialects.

tlively commented 4 years ago

I am interested in the LLVM support problem, but I'm not familiar with the .NET ecosystem. Does .NET currently use LLVM, or is the question of LLVM support only coming up because LLVM is the only compiler toolkit that currently targets WebAssembly?

vargaz commented 4 years ago

What I meant in (7) is that if this proposal is implemented, then LLVM would have to add all these type constructs to its IR somehow, and it's not clear how that can be done. Perhaps by using LLVM metadata on types, which would be read by the wasm backend.

tlively commented 4 years ago

Right, but I'm wondering how LLVM relates to the .NET ecosystem. It's actually possible that we will not end up implementing GC or other features in LLVM if for instance no LLVM frontend could feasibly make use of those features. If you have an LLVM frontend that would like to use GC types, that would be very useful information.

vargaz commented 4 years ago

Currently, our .NET for WebAssembly project is built on top of LLVM, i.e. we compile .NET bytecode to LLVM bitcode to wasm.

aykevl commented 3 years ago

@tlively A late reply from my side, now that I've discovered this issue.

I'm hitting issues very similar to .NET's with TinyGo (which is based on LLVM), which also has a runtime that assumes objects are in linear memory. While the proposal seems interesting, I have a hard time imagining how this would fit in LLVM. It will probably also require major changes to the TinyGo compiler.

I really wish I could use the WebAssembly GC in TinyGo as using the regular TinyGo GC has many problems (such as circular references), but as it is now I don't see how that would feasibly work.

tlively commented 2 years ago

FWIW, @pmatos and @asb are working on adding support for reference types (eventually including GC types) to LLVM and even clang, so it is likely that LLVM-based languages and languages with C-based runtimes will eventually be able to use WasmGC, although it will still require some source changes.

pmatos commented 2 years ago

@aykevl As mentioned by @tlively, LLVM already has reference types support, and we are in the process of adding support for this in Clang. See:
https://reviews.llvm.org/D122215
https://reviews.llvm.org/D128440
https://reviews.llvm.org/D123510
https://reviews.llvm.org/D124162

Next, our work will focus on implementing the GC proposal in LLVM (something @asb has already started thinking about), and bringing that proposal to Clang.

bashor commented 2 years ago

@pmatos, @asb I'm wondering which use cases you are targeting with support for the GC proposal in LLVM. In other words, who is the target client/audience of the feature? What would it look like for projects that want to use it? Will it be implemented on top of the generic GC support in LLVM or something else?

(I'm not an expert in LLVM)

asb commented 2 years ago

@bashor primarily sharing object graphs across the wasm boundary.

Whether it will be built on LLVM's GC support is a good question (and a common one). Wasm's GC types and instructions actually sit at quite a different level. You could imagine a different path in which wasm code generators had to communicate information about the locations of GC types in memory (which would make LLVM's GC support more relevant), but that's not the direction it went in. Wasm GC types are heavily restricted: you can't store them in linear memory, only in Wasm tables, locals, globals, and function params/returns. See the Wasm GC overview for a little bit more background.

Hope that helps.
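
To illustrate the restriction described above, here is a minimal Python sketch of a workaround often used with reference types: the reference lives in a table, and linear memory stores only its integer index. The `RefTable` class, the `store_ref`/`load_ref` helpers, and the 64-byte memory are hypothetical stand-ins, not part of any Wasm toolchain API.

```python
import struct

class RefTable:
    """Stand-in for a Wasm table: holds references, addressed by integer index."""
    def __init__(self):
        self._slots = []
        self._free = []   # indices of released slots, available for reuse

    def add(self, obj):
        if self._free:
            idx = self._free.pop()
            self._slots[idx] = obj
        else:
            idx = len(self._slots)
            self._slots.append(obj)
        return idx

    def get(self, idx):
        return self._slots[idx]

    def release(self, idx):
        # The slot must be released explicitly; until then the object stays alive.
        self._slots[idx] = None
        self._free.append(idx)

memory = bytearray(64)   # stand-in for linear memory
table = RefTable()

def store_ref(addr, obj):
    """Write a handle (table index) for obj at a linear-memory address."""
    idx = table.add(obj)
    struct.pack_into("<I", memory, addr, idx)
    return idx

def load_ref(addr):
    """Read the handle at addr and resolve it through the table."""
    (idx,) = struct.unpack_from("<I", memory, addr)
    return table.get(idx)

node = {"name": "host object"}
handle = store_ref(16, node)
same = load_ref(16)
```

In this sketch the table slot keeps the object reachable until `release` is called, so code that keeps its data structures in linear memory ends up managing the lifetime of such handles by hand.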

malekbr commented 1 year ago
  4. Finalization
  5. Weak references

These are tough ones, and we don't have a good idea yet how to support the myriads of different finalisation semantics out there without creating a zoo. Very likely Post-post-MVP, but suggestions are welcome.

Is there documentation/a discussion log of the different finalization semantics to consider?

rossberg commented 1 year ago

@malekbr, AFAICT, nobody has made any concrete suggestions so far. As mentioned in my reply, it's rather non-obvious. Now that the GC MVP is done, this topic could use a champion to investigate as a separate proposal.