dotnet / runtimelab

This repo is for experimentation and exploring new ideas that may or may not make it into the main dotnet/runtime repo.

NativeAOT-LLVM: Question: Does dotnet/runtimelab want to host WIT code gen issues #2409

Closed: yowl closed this 7 months ago

yowl commented 11 months ago

Hi,

I started a discussion on https://bytecodealliance.zulipchat.com/#narrow/stream/394175-SIG-Guest-Languages about where to host discussions among the interested parties on the way forward for C# codegen for the Wasm component model and WIT. There are two options on the table:

  1. Here, or dotnet/runtime, as an issue or discussion (if GitHub Discussions were enabled)
  2. Somewhere in the Bytecode Alliance space.

Some comments from that zulip chat:

"For language-ecosystem specific projects (e.g. bindgen, devtools) I personally think it'd be great for those to eventually belong to those ecosystems directly. Like .net dev tooling eventually ending up in the dotnet org someday. Tools that are still too early for that kind of integration/adoption or whose ecosystems don't have a single central place (e.g. componentize-py and compoenntize-js) have typically lived in the BA org itself."

" [not with my TSC delegate hat on, since we don't have an established principle for this] fwiw I think that we should also be open to hosting these kinds of things indefinitely if there isn't a better place. For both componentize-js and -py I'm not sure where better they should live than in the BA org, and I think for the foreseeable future their development will very much be driven by BA contributors, so hosting them makes sense.

IMO the same approach makes sense for the C#/.Net bindings: if there is an obvious better place and interest by the language community to host it there, that makes sense. Otherwise I'd be happy to champion hosting in the BA org if you're interested "

At present I've jotted some notes at https://github.com/yowl/WitCSharp so I wouldn't forget what was discussed at the first meeting, but that would be the first thing to move to the new location.

Any opinion or preference about this?

cc @jkotas @AaronRobinsonMSFT

jkotas commented 11 months ago

@yowl Thank you for leading this effort!

Does dotnet/runtimelab want to host WIT code gen issues

Issues and code should be together. If the issues are in dotnet/runtimelab, the code for .NET-specific WIT tooling should be in dotnet/runtimelab too. If dotnet/runtime is your preference, I would be happy to create a new runtimelab project for .NET-specific WIT tooling.

dotnet/runtimelab is meant to be a place for prototyping. It is not meant to be a permanent place for shipping projects, so the project would need to move somewhere else to ship for real. We can decide on the final place once we get there.

@AaronRobinsonMSFT Thoughts?

cc @pavelsavara @SteveSandersonMS @lewing

pavelsavara commented 11 months ago

I assume there would be a non-trivial amount of logic for lifting/lowering C# types into the component model ABI. And the same is probably true for other languages. Do they keep most of the code in wasm-bindgen? Perhaps it would produce a NuGet package which would become part of the dependencies?

yowl commented 11 months ago

I assume there would be a non-trivial amount of logic for lifting/lowering C# types into the component model ABI.

Yes, and wit-bindgen currently has that logic in Rust. The C# backend to wit-bindgen is not 100% complete but already has a lot of code. One of the priorities is to decide how we move forward with this: in Rust, or ported to C#. At a high level, wit-bindgen is split into two layers. The core level interprets the WIT and decides what functions, types, etc. are required, and when to call the lowering/lifting. The implementation is then through a Rust trait (a C# interface, in effect) where we have the C# code generation. Going forward, the core level can produce JSON which we could consume in a C# source bindgen (this is the approach taken by the developers who worked on it at the summit and the hackathon week). In addition, there are plans to move wit-bindgen itself to Wasm, and this is probably the ideal destination for us: we could have the source gen in C# compiled to Wasm.
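For illustration, here is a minimal, hypothetical sketch of the kind of C# such a backend could emit for a WIT import like "log: func(message: string)". The module name, class name, and method names are assumptions made for this example rather than actual generator output, and the extern import only resolves when the module runs inside a Wasm component host.

```csharp
// Hypothetical sketch only: roughly the shape of C# a wit-bindgen C# backend
// could emit for a WIT import such as `log: func(message: string)`.
// The module name "example:host/console" and all member names are assumptions.
using System.Runtime.InteropServices;
using System.Text;

internal static unsafe class ConsoleInterop
{
    // Under the canonical ABI a string is lowered to a (pointer, length) pair
    // in linear memory, UTF-8 encoded by default.
    [DllImport("example:host/console", EntryPoint = "log")]
    private static extern void LogImport(byte* message, int length);

    public static void Log(string message)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(message); // lowering: re-encode UTF-16 -> UTF-8
        fixed (byte* ptr = utf8)
        {
            LogImport(ptr, utf8.Length);
        }
    }
}
```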

A Nuget eventually would seem like a good idea.

pavelsavara commented 11 months ago

I assume there would be a non-trivial amount of logic for lifting/lowering C# types into the component model ABI.

Yes, and wit-bindgen has that logic currently in Rust.

Does that mean the code generator? Or is the at-runtime lifting/lowering logic which translates from dotnet internal structures to the ABI also in Rust?

I think the layouts of CoreCLR, MonoVM, and NativeAOT internal structures would all be different. Or do you consider double-marshaling it via DllImport first?

Would that be easier in C#?

Wasm, and this is probably the ideal destination for us: we could have the source gen in C# compiled to Wasm.

Meaning that the toolchain would also have a dependency on some Wasm engine? (besides the dependency on Rust)

I'm thinking about VS users.

Are we planning to ship binaries? Or compile-it-yourself, Rust style? Are binaries OK from a license perspective?

jkotas commented 11 months ago

the core level can produce JSON which we could consume in a C# source bindgen (this is the approach taken by the developers who worked on it at the summit and the hackathon week)

Do you expect the same architecture with intermediate JSON to be adopted by bindgen for other languages?

jkotas commented 11 months ago

Or is the at-runtime lifting/lowering logic which translates from dotnet internal structures to the ABI also in Rust?

All WIT processing and marshalling generation should be done at build time.

pavelsavara commented 11 months ago

I noticed that cargo_component_bindings for Rust creates components which have 3 Wasm core modules inside, where one is the actual user code and the other two are the lifting/lowering of exports/imports. They talk to each other via shared memory and pointers (not the ABI).

Do we want to do the same thing? Should we maybe have a 4th core module for the dotnet runtime separated from the user code? Or even a separate module for each .NET assembly? "Native" dependencies like ICU could also be "linked" this way.

yowl commented 11 months ago

Does that mean the code generator?

Yes, wit-bindgen has an in-development C# backend that produces C# source code, so the "at runtime" lifting/lowering code is in C#.

I think the layouts of CoreCLR, MonoVM, and NativeAOT internal structures would all be different

What do you have in mind here when you say "internal structures"? At present I don't see a need for different layouts, maybe I'm missing something on this point.

Meaning that the toolchain would also have a dependency on some Wasm engine?

Yes, but in this scenario, we would drop the dependency on Rust.

Are we planning to ship binaries? Or compile-it-yourself, Rust style? Are binaries OK from a license perspective?

As of now, we would have to ship a binary that would be invoked from an MSBuild task. I think that is going to be true for Rust-based generation or Wasm-based generation. Regarding the license, do you mean the Wasm runtime engine?
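A rough sketch of that shape, assuming MSBuild's ToolTask base class; the "csharp" subcommand, the tool lookup, and the property names are guesses for illustration rather than the actual tooling:

```csharp
// Hypothetical MSBuild task that shells out to a wit-bindgen binary at build time.
// The "csharp" subcommand and the property names are assumptions for illustration.
using Microsoft.Build.Framework;
using Microsoft.Build.Utilities;

public sealed class GenerateWitBindings : ToolTask
{
    [Required]
    public string WitDirectory { get; set; } = "";

    [Required]
    public string OutputDirectory { get; set; } = "";

    protected override string ToolName => "wit-bindgen";

    // Assume the tool is on the PATH; a real task would locate a packaged binary.
    protected override string GenerateFullPathToTool() => ToolName;

    protected override string GenerateCommandLineCommands()
        => $"csharp \"{WitDirectory}\" --out-dir \"{OutputDirectory}\"";
}
```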

Do you expect the same architecture with intermediate JSON to be adopted by bindgen for other languages?

No, at least not for Rust, C, Go, Java as they already have Rust based code generators.

4th core module for the dotnet runtime separated from the user code

That's an interesting idea, and if there are multiple C# components linked through different WIT worlds, then should they share a runtime? If we look at https://github.com/WebAssembly/component-model/blob/main/design/mvp/examples/SharedEverythingDynamicLinking.md then the code could be shared but not the linear memory.

jkotas commented 11 months ago

Do you expect the same architecture with intermediate JSON to be adopted by bindgen for other languages?

No, at least not for Rust, C, Go, Java as they already have Rust based code generators.

I am a bit worried about .NET/C# using an architecture that differs from every other language. We can certainly start that way, but there is a high chance that it won't pass the test of time.

pavelsavara commented 11 months ago

Does that mean the code generator?

Yes, wit-bindgen has an in-development C# backend that produces C# source code, so the "at runtime" lifting/lowering code is in C#.

I think the layouts of CoreCLR, MonoVM, and NativeAOT internal structures would all be different

What do you have in mind here when you say "internal structures"?

If the lifting/lowering is in Rust and talking to the dotnet VM at the native level, it needs to understand the details of MonoObject or MonoString structures, for example, vs. the NativeAOT representation of the same.

If the lifting/lowering code is in C#, then there is no issue with that. Does it mean it targets the WASI ABI via DllImport marshaling? Is that the right way to go about it?

There is non-trivial logic around own/borrow of Wasm component "resources". I'm not sure you can express those with DllImport unless you enter COM+ land.

I think that promises are being implemented in terms of "resources".
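For context, one hypothetical way own/borrow could surface in generated C# is a wrapper where an owned handle drops the resource on Dispose and a borrowed one does not; the type, module, and entry-point names below are illustrative assumptions, not an actual design.

```csharp
// Hypothetical sketch of a WIT resource handle in generated C#.
// The module name and "[resource-drop]descriptor" entry point are assumptions.
using System;
using System.Runtime.InteropServices;

internal sealed class Descriptor : IDisposable
{
    private readonly int handle;  // index into the component's resource table
    private readonly bool owned;  // own => this wrapper is responsible for dropping

    internal Descriptor(int handle, bool owned)
    {
        this.handle = handle;
        this.owned = owned;
    }

    [DllImport("example:host/types", EntryPoint = "[resource-drop]descriptor")]
    private static extern void ResourceDrop(int handle);

    public void Dispose()
    {
        // A borrowed handle belongs to the caller and must not be dropped here.
        if (owned)
        {
            ResourceDrop(handle);
        }
    }
}
```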

yowl commented 11 months ago

Do you expect the same architecture with intermediate JSON to be adopted by bindgen for other languages?

No, at least not for Rust, C, Go, Java as they already have Rust based code generators.

I am a bit worried about .NET/C# using an architecture that differs from every other language. We can certainly start that way, but there is a high chance that it won't pass the test of time.

Currently there are two approaches. The more complete one, and the one I personally have been working on, is the Rust-based generation: a backend in the same project as C, Rust, Go, and Java. Then there is the prototype JSON generator. The Bytecode Alliance has expressed an interest in having the code generators written in the same language as the one being produced, and longer term compiling that generation to Wasm. I agree we don't want to go off on our own; whatever the recommended and adopted approach is, that is where we should be.

jkotas commented 11 months ago

The Bytecode Alliance has expressed an interest in having the code generators written in the same language as the one being produced

Is there a rationale for this decision? What does it help with? I think it will produce a lot of duplicated effort and duplicated code.

I see the WIT compiler as very similar to the protobuf compiler, where all backends happily live together in one protoc compiler written in C++, including the C# backend: https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/compiler/main.cc#L83-L86

jkotas commented 11 months ago

If the lifting/lowering is in Rust and talking to the dotnet VM at the native level, it needs to understand the details of MonoObject or MonoString structures, for example, vs. the NativeAOT representation of the same.

The generated code should only depend on public runtime APIs. If there are no public runtime APIs for what needs to be done, we should add them as necessary. This is a core principle that we are following for all new interop designs.

pavelsavara commented 11 months ago

The generated code should only depend on public runtime APIs.

Just thinking out loud, hope this helps.

In order to efficiently marshal a string, we need to expose an API for the length and a pointer to the UTF-16 buffer (without copying it) so that the generated code could convert it for the WASI ABI. Is that the level of public API you are suggesting?

If the WASI host implements a "fused adapter" and we are calling into another dotnet component, perhaps this is only a memory mapping or plain copy from one component to another (by the host), not even re-encoding it. But the buffer would have to be pinned on the dotnet side.

Receiving such a buffer from the WASI ABI and making a dotnet string out of it, without a copy. Would that work with WASI's cabi_realloc?

jkotas commented 11 months ago

we need to expose an API for the length and a pointer to the UTF-16 buffer

We have APIs for that today.

(In the abstract, this is the type of API that we can add to enable interop.)
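For example, a minimal sketch using the existing public constructs presumably being referred to: the fixed statement pins a string and exposes a pointer to its UTF-16 buffer without copying, and Length gives the length in UTF-16 code units. The helper name here is made up.

```csharp
// Sketch: pin a string and hand out (pointer, length) to its UTF-16 buffer
// without copying. Helper name and shape are illustrative only.
using System;

internal static unsafe class StringLowering
{
    public static void WithUtf16Buffer(string value, Action<IntPtr, int> use)
    {
        fixed (char* buffer = value)           // pins the string for the duration of the block
        {
            use((IntPtr)buffer, value.Length); // length in UTF-16 code units
        }
    }
}
```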

pavelsavara commented 11 months ago

We have APIs for that today.

We have it for C#, and if the generated code is C# this is fine. I was speculating about the lifting/lowering code also being native code, like Rust, and needing native APIs of the dotnet VMs.

Do we have a chicken/egg problem with needing to marshal the initial parameters to our component via C# before the dotnet runtime is even started?

Or perhaps WASI _start and _initialize will solve that.

yowl commented 11 months ago

Is there a rationale for this decision?

What has been mentioned is that it would help to bring C# developers into the community if they could contribute in C# rather than having to learn Rust. By the way, Python (componentize-py) uses a Rust backend, but generates the lifting and lowering in Wasm. From the zulip chat: " Joel Dice: Correct. The main reason for that is most of the code componentize-py generates is Wasm, not Python. The intention is that part could be reused for other high-level languages.

Joel Dice: (i.e. all the lifting and lowering code is Wasm, with calls out to CPython to construct and deconstruct Python objects)

Scott Waye: Interesting, thanks. Do you know if JavaScript is the same by any chance?

Joel Dice: No, I think componentize-js currently generates JS code for lifting and lowering. If @Guy Bedford agrees, we could split the Wasm lift/lower generator out of componentize-py and reuse it in componentize-js. It would be a pretty big refactor, though. "

yowl commented 11 months ago

The generated code should only depend on public runtime APIs

The only non-public API we currently have is the use of WasmImport in the csproj to define the module names, and this is because there is no public API that really fits yet. There is some more discussion of this currently in https://github.com/dotnet/runtimelab/pull/2410.

pavelsavara commented 11 months ago

What has been mentioned is that it would help to bring C# developers into the community if they could contribute in C# rather than having to learn Rust. By the way, Python (componentize-py) uses a Rust backend, but generates the lifting and lowering in Wasm. From the zulip chat: " Joel Dice: Correct. The main reason for that is most of the code componentize-py generates is Wasm, not Python. The intention is that part could be reused for other high-level languages.

That somewhat resonates. A few weeks ago during the hackathon, while we worked on the JCO alternative JSCO, I had two reasons not to re-use the existing Rust code:

1) It's a large download, and I wanted this to be in-the-browser, not AOT. 2) The JS crowd would not be willing to learn Rust just to contribute.

I also started converting definitions.py to JavaScript but then ran out of time. But I believe this is the right thing to do for the JS ecosystem.

jkotas commented 11 months ago

The number of contributors for WIT C# tools is never going to be very high, a handful of significant contributors at best. I do not expect that writing the WIT C# tools in C# is going to attract enough additional contributions to offset the duplication of effort.

jsturtevant commented 11 months ago

Then there is the prototype JSON generator. The Bytecode Alliance has expressed an interest in having the code generators written in the same language as the one being produced, and longer term compiling that generation to Wasm. I agree we don't want to go off on our own; whatever the recommended and adopted approach is, that is where we should be.

I'm not sure there is a currently blessed approach, since there are so many moving parts. The group working on the Go WIT generators is moving to using the JSON output and moving out of the Rust wit-bindgen.

I see the WIT compiler as very similar to the protobuf compiler, where all backends happily live together in one protoc compiler written in C++, including the C# backend: https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/compiler/main.cc#L83-L86

I think this is how this is working today; for example, C is generated with "wit-bindgen c ./wit" and tiny-go with "wit-bindgen tiny-go ./wit".

The number of contributors for WIT C# tools is never going to be very high, a handful of significant contributors at best. I do not expect that writing the WIT C# tools in C# is going to attract enough additional contributions to offset the duplication of effort.

The other idea behind doing the work in C# would be to support a more C# developer experience through source generators, to create an experience like Rust where you import a library and most of the rest is done for you. It would also mean being able to use Roslyn APIs, which I think would be easier to maintain than print statements. Maybe there are ways to still have that experience with other approaches.

jsturtevant commented 11 months ago

I think it will produce a lot of duplicated effort and duplicated code.

One other thought I had was that most of the duplicated code is in the WIT parser as of today; by using the JSON output of the parser we wouldn't be duplicating effort there. The individual languages in wit-bindgen end up not sharing a ton of common code, from what I can tell.

jkotas commented 11 months ago

The other idea behind doing the work in C# would be to support a more C# developer experience through source generators, to create an experience like Rust where you import a library and most of the rest is done for you.

A C# source generator is not needed to deliver this experience. The protobuf compiler delivers this experience without a C# source generator.

AaronRobinsonMSFT commented 11 months ago

The other idea behind doing the work in C# would be to support a more C# developer experience through source generators, to create an experience like Rust where you import a library and most of the rest is done for you.

A C# source generator is not needed to deliver this experience. The protobuf compiler delivers this experience without a C# source generator.

Agreed, but it is not the canonical form for source generator solutions in .NET. For .NET we have two primary paths: (1) a small tool that can be triggered pre/post build in MSBuild, or (2) Roslyn source generators.

The most flexible is (1), and in most cases it is sufficient. Option (2) has a lot of support but is very complex overall and has broad implications when being written; the biggest is the impact on IDE scenarios. The WASM scenario is very closely aligned with the COM source generator scenario, which is a Roslyn source generator, and as such I would argue it should be the long-term solution.

If this statement is about making progress to deliver a solution that is sufficient and can enable users ASAP, then I agree that (1) is likely the best path. We can even design any small tooling to be foldable into a Roslyn source generator solution in the future.

Some thoughts on statements I missed as I was out of the office.
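To make option (2) concrete, here is a rough, hypothetical skeleton of a Roslyn incremental source generator that treats .wit files passed as AdditionalFiles as its input; the class name and hint-name scheme are made up, and the WIT parsing itself is elided.

```csharp
// Hypothetical skeleton only: an incremental source generator that would turn
// .wit AdditionalFiles into generated C#. WIT parsing itself is not shown.
using System.IO;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

[Generator]
public sealed class WitBindingsGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // Pick up *.wit files that the project adds as AdditionalFiles.
        var witFiles = context.AdditionalTextsProvider
            .Where(static file => file.Path.EndsWith(".wit"));

        context.RegisterSourceOutput(witFiles, static (spc, file) =>
        {
            string wit = file.GetText(spc.CancellationToken)?.ToString() ?? "";
            // A real implementation would parse `wit` here and emit bindings for it.
            string source = $"// bindings would be generated from {Path.GetFileName(file.Path)} ({wit.Length} chars)";
            spc.AddSource(Path.GetFileNameWithoutExtension(file.Path) + ".wit.g.cs",
                SourceText.From(source, Encoding.UTF8));
        });
    }
}
```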

pavelsavara commented 11 months ago

The WASM scenario is very closely aligned with the COM source generator scenario

That sounds scary, could you please elaborate?

jkotas commented 11 months ago

The source of truth is specified in C# for the COM source generator. It makes sense to use C# source generators when the source of truth is specified in C#.

The source of truth is specified in WIT for a WIT generator. C# source generators have limited value when the source of truth is not specified in C#.

AaronRobinsonMSFT commented 11 months ago

The WASM scenario is very closely aligned with the COM source generator scenario

That sounds scary, could you please elaborate?

It is basically the same pattern.

COM: C# -> Assembly or Assembly/C# -> TLB/IDL
WASM: WIT -> C# -> Assembly or Assembly/C# -> WIT

The source of truth is specified in C# for the COM source generator. It makes sense to use C# source generators when the source of truth is specified in C#.

True, and I would agree it is something we shouldn't get ahead of, so (1) is likely appropriate. The full UX and existing .NET infrastructure are where the Roslyn source generator might be a win, not right now but going forward.

yowl commented 11 months ago

At the wit-bindgen C# meeting yesterday we agreed to continue, at least for now, with the C# codegen as a backend to wit-bindgen, i.e. in Rust. Therefore we will host WIT code gen issues in the wit-bindgen repo. I guess we will get a tag created to group C# things together. The code gen will be executed as an MSBuild task (current thinking) and distributed as a NuGet package. That work can be here, or in runtime.

tschneidereit commented 10 months ago

(Bytecode Alliance TSC/Board member here 👋🏻)

I'm very late to the party, but just wanted to note that the Bytecode Alliance doesn't as such have a preference for how to structure/build bindings generators. For Go, the work is happening in Go based on the JSON approach, while most other languages for now use Rust directly. Same as with the question of where to host the bindings generator, I think whatever seems to make the most sense to the people doing the work, and in the context of the respective ecosystem, is what should be done :)

I'm saying all this to emphasize that the decision @yowl mentioned in the previous comment is one we fully support, but if another approach ever becomes more attractive in the C# context, we'll gladly do what we can to support that as well.

yowl commented 7 months ago

Thanks all for the contributions. We are hosting in the BA org at https://github.com/bytecodealliance/wit-bindgen, so I will close this.