dotnet / runtimelab

This repo is for experimentation and exploring new ideas that may or may not make it into the main dotnet/runtime repo.
MIT License

[NativeAOT LLVM] Support dotnet.js #2434

Open maraf opened 9 months ago

maraf commented 9 months ago

Smoke test with currently supported feature set https://github.com/dotnet/runtimelab/blob/feature/NativeAOT-LLVM/src/tests/nativeaot/SmokeTests/DotnetJs

Original prototype https://github.com/maraf/MinimalDotNetWasmNativeAOT/tree/dotnetjs/DotnetJsHack

maraf commented 9 months ago

cc @dotnet/nativeaot-llvm @pavelsavara @ivanpovazan

SingleAccretion commented 9 months ago

A few comments on the overall architecture.

Building NAOT-flavored dotnet.js

The current NAOT-LLVM build is very simple (pass a few things to emcc, get .js back). It would be very valuable to ensure it remains so.

Another point is that the JS runtime contains things that are not the runtime's responsibility, but rather belong to libraries (e. g. hybrid globalization). It should be ensured that the runtime (unmanaged code and Runtime.Base) continues to be unaware of them.

JS interop

As with the above, (C#) code supporting it should live separately, in, e. g., a new System.Private.* library. If we need some special hooks into runtime functionality, they should be provided through an explicit interface.

(That said, I am not familiar with how it is implemented currently)

Async Main

Is a bit of a hard problem. Last time I looked, Mono code looked it up by name emitted by Roslyn. That is a hack. Perhaps we could consider asking Roslyn to somehow mark it...

maraf commented 9 months ago

The current NAOT-LLVM build is very simple (pass a few things to emcc, get .js back). It would be very valuable to ensure it remains so.

In Mono, dotnet.js is built during the runtime build, and then during the app build it's just picked up from the runtime pack, plus the Emscripten JS built with different settings. Our goal is to support the same JS API as we have for Mono (you can see it here: https://github.com/maraf/MinimalDotNetWasmNativeAOT/blob/dotnetjs/DotnetJsHack/main.js; the withConfig will go away). Does that work for you?

Another point is that the JS runtime contains things that are not the runtime's responsibility, but rather belong to libraries (e. g. hybrid globalization). It should be ensured that the runtime (unmanaged code and Runtime.Base) continues to be unaware of them.

I don't know the details yet. AFAIK hybrid globalization is done by icalls from Globalization* into JavaScript.

As with the above, (C#) code supporting it should live separately, in, e. g., a new System.Private.* library. If we need some special hooks into runtime functionality, they should be provided through an explicit interface.

The C# code already lives in a separate library System.Runtime.InteropServices.JavaScript

Is a bit of a hard problem. Last time I looked, Mono code looked it up by name emitted by Roslyn. That is a hack. Perhaps we could consider asking Roslyn to somehow mark it...

I don't have a solution for that yet. We also need to invoke JS marshaling to get the Task correctly back to JS as a promise

yowl commented 9 months ago

For Javascript support in general.

Jco (https://github.com/bytecodealliance/jco) is adding async/promise support to its WIT bindgen. I wonder whether there is a route there, i.e. making the JavaScript a component and consuming it that way. That is a bit down the road, as we don't have full WIT support yet.

(Edited as not related to Main, just an observation on work in related spaces)

SingleAccretion commented 9 months ago

In Mono, dotnet.js is built during the runtime build, and then during the app build it's just picked up from the runtime pack, plus the Emscripten JS built with different settings.

What does the emcc command line look like (conceptually)? Emscripten has a number of ways to integrate JS code, which one is used?

Separate question: is the JS minified by default (at runtime build)? In NAOT, we have so far been following the strategy that -c Release does not strip debug info by default.

Separate question: how many of the JS APIs depend on things like dynamic assembly loading and dynamic code execution (interpreter)? These would not work on NAOT (naturally).

I don't have a solution for that yet. We also need to invoke JS marshaling to get the Task correctly back to JS as a promise

How does the marshalling of tasks to promises work, in terms of native signatures and data flow? For example, how would the underlying UnmanagedCallersOnly method look for [JsExport] async Task AsyncMethod() { ... }?

maraf commented 9 months ago

What does the emcc command line look like (conceptually)? Emscripten has a number of ways to integrate JS code, which one is used?

We let Emscripten generate an ES6 module and link in the Mono functions. Then we consume the Emscripten module from our (two) modules. Running .\build.cmd -bl -os browser -subset mono+libs -c Debug gives you a binlog with all the details.

Separate question: is the JS minified by default (at runtime build)? In NAOT, we have so far been following the strategy that -c Release does not strip debug info by default.

The Emscripten JavaScript is unminified. Our API, generated from TypeScript, is minified in Release mode, but we have source maps pointing to GitHub.

Separate question: how many of the JS APIs depend on things like dynamic assembly loading and dynamic code execution (interpreter)? These would not work on NAOT (naturally).

I'm going to say "none", at least in the core paths. We have an API for lazy assembly loading, but it's not used by default. We look up some C# functions with Mono reflection, but so far I have been able to bypass that with wasm exports.

How does the marshalling of tasks to promises work, in terms of native signatures and data flow? For example, how would the underlying UnmanagedCallersOnly method look for [JsExport] async Task AsyncMethod() { ... }?

An example of the Roslyn-generated wrapper for [JSExport] internal static async Task<string> GreetToJS(string name):

```csharp
[global::System.Diagnostics.DebuggerNonUserCode]
[global::System.Runtime.InteropServices.UnmanagedCallersOnlyAttribute(EntryPoint = "_5B_BrowserConsoleApp_5D_Xyz_Interop_2F_MyClass_3A_GreetToJS")]
internal static unsafe void __Wrapper_GreetToJS(global::System.Runtime.InteropServices.JavaScript.JSMarshalerArgument* __arguments_buffer)
{
    string name;
    ref global::System.Runtime.InteropServices.JavaScript.JSMarshalerArgument __arg_exception = ref __arguments_buffer[0];
    ref global::System.Runtime.InteropServices.JavaScript.JSMarshalerArgument __arg_return = ref __arguments_buffer[1];
    global::System.Threading.Tasks.Task<string> __retVal = default;
    // Setup - Perform required setup.
    ref global::System.Runtime.InteropServices.JavaScript.JSMarshalerArgument __name_native__js_arg = ref __arguments_buffer[2];
    // Unmarshal - Convert native data to managed data.
    __name_native__js_arg.ToManaged(out name);
    try
    {
        __retVal = Xyz.Interop.MyClass.GreetToJS(name);
        __arg_return.ToJS(__retVal, static (ref global::System.Runtime.InteropServices.JavaScript.JSMarshalerArgument __task_result_arg, string __task_result) =>
        {
            __task_result_arg.ToJS(__task_result);
        });
    }
    catch (global::System.Exception ex)
    {
        __arg_exception.ToJS(ex);
    }
}
```

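A minimal TypeScript model of the data flow through this wrapper, seen from the JS caller's side. It mirrors only the slot layout visible in the generated code: slot 0 holds the exception, slot 1 the return value, and slots 2 and up the parameters. ArgSlot, callExport and greetToJSStub are hypothetical names. The real JSMarshalerArgument is a fixed-size native struct in WASM memory. Also, dotnet.runtime.js turns the returned task into a Promise, and this sketch replaces that with a plain callback to stay synchronous.

```typescript
// Hypothetical model of one marshaler slot. The real JSMarshalerArgument is a
// fixed-size native struct in WASM memory, not a JS object.
type ArgSlot =
  | { kind: "none" }
  | { kind: "string"; value: string }
  | { kind: "exception"; message: string }
  // A task is modeled as a function that registers a completion callback.
  | { kind: "task"; subscribe: (onResult: (result: ArgSlot) => void) => void };

// Slot layout used by the generated wrapper.
const EXCEPTION_SLOT = 0;
const RETURN_SLOT = 1;
const FIRST_PARAM_SLOT = 2;

// JS-side caller: allocates the buffer, writes the parameters, invokes the
// export, then rethrows a marshaled exception or returns the return slot.
function callExport(wrapper: (buffer: ArgSlot[]) => void, params: ArgSlot[]): ArgSlot {
  const buffer: ArgSlot[] = [{ kind: "none" }, { kind: "none" }];
  for (let i = 0; i < params.length; i++) {
    buffer[FIRST_PARAM_SLOT + i] = params[i];
  }
  wrapper(buffer);
  const exception = buffer[EXCEPTION_SLOT];
  if (exception.kind === "exception") {
    throw new Error(exception.message);
  }
  return buffer[RETURN_SLOT];
}

// Hypothetical stand-in for __Wrapper_GreetToJS. Here the task completes
// immediately; in the runtime the callback would run when the Task completes.
function greetToJSStub(buffer: ArgSlot[]): void {
  const param: ArgSlot | undefined = buffer[FIRST_PARAM_SLOT];
  if (param === undefined || param.kind !== "string") {
    // Analogous to the wrapper's catch block, which writes into __arg_exception.
    buffer[EXCEPTION_SLOT] = { kind: "exception", message: "expected a string parameter" };
    return;
  }
  const greeting = "Hello, " + param.value;
  buffer[RETURN_SLOT] = {
    kind: "task",
    // Analogous to the static lambda that marshals the task result into a slot.
    subscribe: (onResult: (result: ArgSlot) => void) => onResult({ kind: "string", value: greeting }),
  };
}
```

The design point the model makes visible is that the exception and return slots are reserved up front. This lets the JS side inspect a single buffer after the call instead of relying on a native return value.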
SingleAccretion commented 9 months ago

Thank you, I think I am seeing the bigger picture now. Some more questions.

It is clear how support for scenarios where JS is the root of execution works. What about cases where WASM is the root of execution, i. e. '$(NativeLib)' != ''? How do you package the JS the runtime depends on for this (both static and non-static scenarios)?

(Currently, NAOT-LLVM depends on incredibly little JS, basically one --js-library)

For the async main, I see it will root a lot of task infrastructure, so definitely not something to be done always (i. e. only when the user requests such by writing it).

Is the user required to attribute async main with [JSExport] explicitly? I can see this working out simply by giving the generated UCO method a well-known EntryPoint, which the runtime and ILC would know about (it ties somewhat to your idea in #2433). The details would need to be figured out, but at least we don't need to modify Roslyn.

maraf commented 9 months ago

It is clear how support for scenarios where JS is the root of execution works. What about cases where WASM is the root of execution, i. e. '$(NativeLib)' != ''? How do you package the JS the runtime depends on for this (both static and non-static scenarios)?

Sorry, I probably don't follow. Do you mean the WASI scenario? All this effort is meant for browser/nodejs/v8 target.

For the async main, I see it will root a lot of task infrastructure, so definitely not something to be done always (i. e. only when the user requests such by writing it).

Definitely. If there isn't an async main, nothing should be rooted.

Is the user required to attribute async main with [JSExport] explicitly? I can see this working out simply by giving the generated UCO method a well-known EntryPoint, which the runtime and ILC would know about (it ties somewhat to your idea in #2433). The details would need to be figured out, but at least we don't need to modify Roslyn.

Yeah, adding [JSExport] attribute explicitly is possible, but it doesn't work with top-level-statements. Ideally I would like to find the method automatically, but I didn't probe it yet. I'm not sure if top-level-statements are "converted" before source generators or after and if Roslyn/msbuild will give me the name of the entrypoint in all cases.

SingleAccretion commented 8 months ago

Sorry, I probably don't follow.

The "shared library" scenario is like in https://devblogs.microsoft.com/dotnet/use-net-7-from-any-javascript-app-in-net-7/.

More interesting is the "static library" case. As you know, one can build static WASM libraries and distribute them to be linked with something later. With browser-wasm, you also have to distribute some JS, since the runtime depends on some JS. Today you can do so with basically one --js-library file. The question is how will that look after this work.

Ideally I would like to find the method automatically, but I didn't probe it yet. I'm not sure if top-level-statements are "converted" before source generators or after and if Roslyn/msbuild will give me the name of the entrypoint in all cases.

Pretty sure source generators will see the original source code for top-level statements. I am not sure how to make that work, even - you cannot call an unspeakable Main from the UCO wrapper, even if you generated one implicitly.

maraf commented 8 months ago

The "shared library" scenario is like in https://devblogs.microsoft.com/dotnet/use-net-7-from-any-javascript-app-in-net-7/.

More interesting is the "static library" case. As you know, one can build static WASM libraries and distribute them to be linked with something later. With browser-wasm, you also have to distribute some JS, since the runtime depends on some JS. Today you can do so with basically one --js-library file. The question is how will that look after this work.

I see, thanks! With dotnet.js you would need to (currently) distribute 3 .js files + 1 .wasm, because dotnet.js is split into 3 ES modules. We have an open issue to support merging those JS files into one.

By linking with something later, do you mean at the WebAssembly level or at the JavaScript level? If the latter is the case, you would also want to hide the fact that your library is implemented with dotnet under the hood, and would probably want to wrap the API surface. I had a demo of that last year with integration into React: https://github.com/maraf/dotnet-wasm-react.

SingleAccretion commented 8 months ago

By linking with something later, do you mean at WebAssembly level

Right, linking it statically, as in emcc myapp_written_in_c.o dotnet_library.a .... What would that command line look like?

Connected to this is the question of whether the JS is set up to work correctly if it is not the root of execution, i. e. if the first thing under our control called is a UCO method from dotnet_library.a.

maraf commented 8 months ago

Right, linking it statically, as in emcc myapp_written_in_c.o dotnet_library.a .... What would that command line look like?

Connected to this is the question of whether the JS is set up to work correctly if it is not the root of execution, i. e. if the first thing under our control called is a UCO method from dotnet_library.a.

I have never tried that. How does it work with plain emscripten? Who is responsible for downloading and instantiating the wasm module?

SingleAccretion commented 8 months ago

How does it work with plain emscripten? Who is responsible for downloading and instantiating the wasm module?

Static linking merges all of the input files together. You can see it in action if you look at some binlogs from the runtime (Mono or NAOT-LLVM) build: there are a bunch of library.a (archive, a collection of object files) and something.o (object) files with functions in them that wasm-ld links together.

Emscripten then has a bespoke system for making JS "look like C" in this process: https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html#implement-a-c-api-in-javascript.
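To illustrate what "making JS look like C" means at the boundary: a JS function that satisfies a C import only ever sees numbers, and pointers are byte offsets into linear memory. In Emscripten such functions live in a file passed with --js-library and are registered via mergeInto(LibraryManager.library, {...}). The sketch below skips Emscripten entirely and uses a plain Uint8Array as linear memory. The C function fill_random, the helper makeFillRandom and the "env" module name are assumptions for illustration.

```typescript
// A WASM import implemented in JS sees only numbers; pointers are byte
// offsets into the module's linear memory.
type PointerImport = (ptr: number, length: number) => void;

// Builds a JS implementation of a hypothetical C declaration:
//   extern void fill_random(uint8_t* buffer, int32_t length);
// nextByte is injected to keep the model deterministic; a real implementation
// would use crypto.getRandomValues.
function makeFillRandom(heapU8: Uint8Array, nextByte: () => number): PointerImport {
  return (ptr: number, length: number): void => {
    if (ptr < 0 || length < 0 || ptr + length > heapU8.length) {
      throw new RangeError("write outside linear memory");
    }
    for (let i = 0; i < length; i++) {
      heapU8[ptr + i] = nextByte() & 0xff;
    }
  };
}

// Stand-in for linear memory (Emscripten-generated JS exposes a view like this as HEAPU8).
const linearMemory = new Uint8Array(new ArrayBuffer(32));
let counter = 0;

// Shape of the imports object handed to WebAssembly.instantiate; the "env"
// module name is an assumption of this sketch.
const wasmImports = {
  env: { fill_random: makeFillRandom(linearMemory, () => counter++) },
};
```

Calling wasmImports.env.fill_random(4, 3) writes three bytes starting at offset 4, which is what C code calling fill_random(buf, 3) would observe in its buffer.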

maraf commented 8 months ago

I see. So you can link together multiple emscripten-based libraries and pass --js-library for every library that needs to provide wasm imports, right? But in the end, you will always get single "emscripten app". That's something we don't support at the moment, since dotnet.js API is responsible for downloading and starting the whole thing, and emscripten module is just one thing inside. We probably could make it work with some changes

SingleAccretion commented 8 months ago

So you can link together multiple emscripten-based libraries and pass --js-library for every library that needs to provide wasm imports, right?

Yes.

That's something we don't support at the moment, since dotnet.js API is responsible for downloading and starting the whole thing, and emscripten module is just one thing inside. We probably could make it work with some changes

I see; that is more or less what I expected.

To tie things a bit to the earlier points, there are three different subsystems in a runtime setup:

1) JS "host code" - downloads WASM, does things like feature detection
  ; Auto-generated by Emscripten in the current NAOT-LLVM
  ; Bespoke in the Mono setup
  ; Invoked by the browser in the application scenario, may be packaged as a JS library itself
2) JS "library code" - access to browser APIs, like crypto or globalization (hence this includes JSImport infrastructure)
  ; Provided by Emscripten (system APIs) + one JS library (for getting random values) in the current NAOT-LLVM
  ; Provided by Emscripten (system APIs) + "host code" in Mono
3) WASM, native and C# code
  ; Invoked by "host code" in the application scenario
  ; Invokes JS "library code"

What is necessary for the library scenario is a very clear separation between 1 and 2 (in the "static" case, 1 doesn't exist at all).
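Here is one way to picture the separation between 1 and 2. Suppose "library code" is nothing more than a set of WASM imports and knows nothing about downloading or starting the module. Then, in the static case, an embedding application can merge those imports with its own, and no host code is needed. Below is a minimal TypeScript sketch of that idea. ImportObject, mergeImports and all import names are hypothetical, and rejecting duplicate names is simply a choice of this sketch.

```typescript
// An import receives and returns numbers (pointers, lengths, handles).
type ImportFn = (...args: number[]) => number | void;
// Module name -> function name -> implementation, the shape WebAssembly.instantiate expects.
interface ImportObject {
  [moduleName: string]: { [fnName: string]: ImportFn };
}

const logged: number[] = [];

// Hypothetical "library code" from the .NET side: browser API access only,
// no knowledge of how the module is downloaded or started.
const dotnetLibraryImports: ImportObject = {
  // Hypothetical; a real one would fill memory and return a status code.
  env: { dotnet_get_random_values: (_ptr: number, _length: number) => 0 },
};

// Hypothetical "library code" contributed by the embedding C application.
const appLibraryImports: ImportObject = {
  env: { app_log_number: (n: number) => { logged.push(n); } },
};

// Combines several libraries' imports into one object, conceptually like passing
// several --js-library files to a single emcc link.
function mergeImports(...parts: ImportObject[]): ImportObject {
  const merged: ImportObject = {};
  for (let p = 0; p < parts.length; p++) {
    const part = parts[p];
    for (const moduleName in part) {
      if (!Object.prototype.hasOwnProperty.call(part, moduleName)) continue;
      const target = merged[moduleName] || (merged[moduleName] = {});
      const fns = part[moduleName];
      for (const fnName in fns) {
        if (!Object.prototype.hasOwnProperty.call(fns, fnName)) continue;
        if (Object.prototype.hasOwnProperty.call(target, fnName)) {
          throw new Error(`duplicate import ${moduleName}.${fnName}`);
        }
        target[fnName] = fns[fnName];
      }
    }
  }
  return merged;
}
```

In the application scenario, host code would take the merged object and instantiate the module. In the static scenario, the embedder does that itself, which is why 2 must not reach back into 1.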

ivanpovazan commented 8 months ago

Just to chime in on

Export user written Main (possibly async) as wasm export (we need to do JS marshaling for it)

Apart from exposing the async Main, I think we would need some additional tweaks (most probably in ILC codegen) in order to properly execute the startup sequence.

Initially NativeAOT supported two startup scenarios:

  1. Startup for libraries - OutputType=Library
    • initializes System.Private.CoreLib and runs module initializers
  2. Startup for executables - OutputType=Exe
    • initializes System.Private.CoreLib, stores Main command line arguments, stores the entrypoint assembly, runs module initializers, calls Main, stores the return value and performs teardown

A third scenario was added for integration with Xamarin and support for iOS platforms, where we had to introduce a new mode, OutputType=Exe, NativeLib=Static, CustomNativeMain=true, which is basically a combination of the two above.

Based on your discussion and proposed requirements around Main, my understanding is that we would need something similar to 3) that would perform the necessary startup, but also provide a way to await on Main, and I don't think it is achievable by just exposing the asynchronous Main method to the native world.

The startup sequence is generated at: https://github.com/dotnet/runtime/blob/cbc501ca196371572c38f8d12a66969864d99c08/src/coreclr/tools/aot/ILCompiler.Compiler/IL/Stubs/StartupCode/StartupCodeMainMethod.cs#L64
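The two original startup sequences can be written down as a small TypeScript model to make the ordering explicit. Step names are paraphrased from the list of scenarios. libraryStartup, exeStartup and StartupLog are hypothetical and do not correspond to the code generated in StartupCodeMainMethod.cs.

```typescript
// Records the startup steps in order so the two sequences can be compared.
interface StartupLog {
  steps: string[];
}

function initializeCoreLib(log: StartupLog): void {
  log.steps.push("initialize CoreLib");
}

function runModuleInitializers(log: StartupLog): void {
  log.steps.push("run module initializers");
}

// 1. OutputType=Library: no Main is involved.
function libraryStartup(log: StartupLog): void {
  initializeCoreLib(log);
  runModuleInitializers(log);
}

// 2. OutputType=Exe: the full sequence around the user's Main; returns its exit code.
function exeStartup(log: StartupLog, args: string[], userMain: (args: string[]) => number): number {
  initializeCoreLib(log);
  log.steps.push("store command line arguments");
  log.steps.push("store entrypoint assembly");
  runModuleInitializers(log);
  log.steps.push("call Main");
  const exitCode = userMain(args);
  log.steps.push("store return value");
  log.steps.push("teardown");
  return exitCode;
}
```

An async Main exposed to JS needs the Exe-style setup before the call. It also needs a way to hand back a pending task instead of running teardown immediately, which neither sequence provides on its own.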

maraf commented 8 months ago

Apart from exposing the async Main, I think we would need some additional tweaks (most probably in ILC codegen) in order to properly execute the startup sequence.

If we generate interop wrapper with UCO for the user defined Main method (either by explicitly marking it as [JSExport] or implicitly with some magic), the initialization will happen when we call the wrapper. Is that correct or am I missing anything?

maraf commented 8 months ago

What is necessary for the library scenario is a very clear separation between 1 and 2 (in the "static" case, 1 doesn't exist at all).

Yes. These are already separated in the Mono case:

  1. "host code" is dotnet.js (the first ES module); it's responsible for downloading assets and exposing the public API
  2. "library code" is dotnet.native.js (the Emscripten ES module) + dotnet.runtime.js (our wrapper around the raw Emscripten JS, e.g. it knows how to initialize JS imports/exports, marshalling, etc.)

What we never actually tried is using 2 without 1, and I think it would require some glue/orchestration code.

SingleAccretion commented 8 months ago

Is that correct or am I missing anything?

It is true if you create a NativeLib, but not otherwise (!NativeLib is expected to perform the "managed startup" sequence manually).

There are two ways to think about this: either we have a library with a known entrypoint, or we have an executable with an unusual main.

What design does the latter view lead to?

The async main scenario is different from the sync main one in only two ways:
1) We need to pass an additional argument (`JSMarshalerArgument*`).
2) We need to do some marshalling before/after calling the async main (as written by the user) itself.

So, in the usual scenario we have:

```
Host
  main(argc, argv)
    runtime_startup()
    managed_main(argc, argv)
      managed_startup(...)
      args = marshal(argc, argv)
      user_main(args)
```

In the async main scenario:

```
Host
  main(argc, argv, JSArgs*)
    runtime_startup()
    managed_main(argc, argv, JSArgs*)
      managed_startup(...)
      args = marshal(argc, argv)
      user_main_marshalling_stub(args, JSArgs*) ; Note - this need NOT be UCO (although it can be, with an unmanaged calli)
        user_async_main(args)
```

For simplicity, `JSArgs*` could even be "smuggled" in `argv` (or vice versa):

```
Host
  main(argc, argv)
    runtime_startup()
    managed_main(argc, argv)
      managed_startup(...)
      args = marshal(argc, argv)
      user_async_main_marshalling_stub(args, JSArgs*)
        user_async_main(args)
```

For the former, it requires a bit more thinking about whether it can be implemented transparently, given that we don't know whether the main will be async or sync when invoking ILC. Export the UCO stub as EntryPoint="main" and do CustomNativeMain=true?
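Here is a TypeScript sketch of the "executable with an unusual main" view from the async main call tree. The sync and async flavors share managed_startup and argument marshalling. They differ only in the extra JSArgs parameter and the marshalling stub. JSArgs, TaskLike and all function names are hypothetical. Dropping argv[0] in marshal is an assumption, and a callback-based TaskLike stands in for a .NET Task so the model stays synchronous.

```typescript
// Callback-based stand-in for a .NET Task<int>, so the model stays synchronous.
interface TaskLike<T> {
  onCompleted(cb: (result: T) => void): void;
}

// Hypothetical stand-in for the JSMarshalerArgument* buffer passed by JS.
interface JSArgs {
  exception?: string;
  returnValue?: TaskLike<number>;
}

type SyncMain = (args: string[]) => number;
type AsyncMain = (args: string[]) => TaskLike<number>;

function managedStartup(trace: string[]): void {
  trace.push("managed_startup");
}

// Assumption for the sketch: native argv[0] is the program name and is not passed to Main.
function marshal(argv: string[]): string[] {
  return argv.slice(1);
}

// Sync flavor: managed_main(argc, argv) -> user_main(args).
function managedMain(trace: string[], argv: string[], userMain: SyncMain): number {
  managedStartup(trace);
  const args = marshal(argv);
  trace.push("user_main");
  return userMain(args);
}

// Async flavor: identical except for the extra JSArgs and the marshalling stub.
function managedMainAsync(trace: string[], argv: string[], jsArgs: JSArgs, userAsyncMain: AsyncMain): void {
  managedStartup(trace);
  const args = marshal(argv);
  userMainMarshallingStub(trace, args, jsArgs, userAsyncMain);
}

// An ordinary (non-UCO) call: stores the task, or the exception, for the JS side.
function userMainMarshallingStub(trace: string[], args: string[], jsArgs: JSArgs, userAsyncMain: AsyncMain): void {
  trace.push("user_main_marshalling_stub");
  try {
    jsArgs.returnValue = userAsyncMain(args);
  } catch (e) {
    jsArgs.exception = e instanceof Error ? e.message : String(e);
  }
}
```

The only async-specific code is the stub. Startup and argument handling are reused unchanged, which is the appeal of this view.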

lewing commented 8 months ago

How does it work with plain emscripten? Who is responsible for downloading and instantiating the wasm module?

Static linking merges all of the input files together. You can see it in action if you look at some binlogs from the runtime (Mono or NAOT-LLVM) build: there are a bunch of library.a (archive, a collection of object files) and something.o (object) files with functions in them that wasm-ld links together.

Emscripten then has a bespoke system for making JS "look like C" in this process: https://emscripten.org/docs/porting/connecting_cpp_and_javascript/Interacting-with-code.html#implement-a-c-api-in-javascript.

This is roughly what WasmNativeFile does. It works in both AOT and interpreted mode and, like AOT, even requires the workload so that we have the wasm-ld executable etc. It is also the machinery we use to link optional runtime features in or out as part of WasmBuildNative, and how the bindings for things like SkiaSharp and SQLite work. This is easier to see in the WASI build than in the browser target, which is more complicated due to some product requirements. We currently expect to drive that build to the final link, modulo wasm imports and exports, but that could be altered.

lewing commented 8 months ago

What is necessary for the library scenario is a very clear separation between 1 and 2 (in the "static" case, 1 doesn't exist at all).

Yes. These are already separated in the Mono case:

  1. "host code" is dotnet.js (the first ES module); it's responsible for downloading assets and exposing the public API
  2. "library code" is dotnet.native.js (the Emscripten ES module) + dotnet.runtime.js (our wrapper around the raw Emscripten JS, e.g. it knows how to initialize JS imports/exports, marshalling, etc.)

What we never actually tried is using 2 without 1, and I think it would require some glue/orchestration code.

The library mode work that was done in the Mono AOT compiler for Mono's library mode (what @ivanpovazan referred to) is available for us to use in AOT mode. It includes the startup stub that initializes the runtime enough to allow calling into UCO entry points without calling into managed main, and I think it will largely work out of the box if we add support to the bundler to handle it. We would need a little more work to generate stubs and/or fix the fallback implementation for the interpreter without using the AOT compiler, because that IL is generated from unmanaged code right now.

ivanpovazan commented 8 months ago

If we generate interop wrapper with UCO for the user defined Main method (either by explicitly marking it as [JSExport] or implicitly with some magic), the initialization will happen when we call the wrapper. Is that correct or am I missing anything?

I was trying to point out that managed_main is already exposed, but that function does not just invoke the user-defined managed Main (as it would if you put an UnmanagedCallersOnly attribute on Main). Rather, it is a function in which:

  1. startup/setup is performed (things like storing command line arguments so they are available in the user app via Environment.GetCommandLineArgs are performed in this step)
  2. the actual user managed main is called
  3. teardown is performed

So if we just annotated the user's async Main with JSExport and called it manually from our custom startup code, we might end up missing some functionality that is baked into step 1 above and/or fail to initialize the runtime and/or CoreLib properly.
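Here is a small TypeScript model of why skipping step 1 matters. User code that reads state populated during startup (in .NET, Environment.GetCommandLineArgs) sees something different when Main is called directly rather than through the managed_main-style wrapper. Runtime, runManagedMain and userMain are hypothetical names for illustration only.

```typescript
// Hypothetical process-wide state that startup populates
// (in .NET, what Environment.GetCommandLineArgs() reads).
interface Runtime {
  commandLineArgs: string[];
  tornDown: boolean;
}

// The user's Main depends on state that only step 1 sets up.
function userMain(rt: Runtime): number {
  return rt.commandLineArgs.length;
}

// managed_main-like wrapper: 1. setup, 2. call the user's Main, 3. teardown.
function runManagedMain(rt: Runtime, argv: string[], main: (rt: Runtime) => number): number {
  rt.commandLineArgs = argv.slice(); // 1. store command line arguments
  const exitCode = main(rt);         // 2. call the user's Main
  rt.tornDown = true;                // 3. teardown
  return exitCode;
}
```

Calling userMain directly returns 0 here, because step 1 never ran. That is the kind of silent difference a bare JSExport on Main would introduce.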

maxkatz6 commented 3 months ago

@maraf hi! I am interested in trying NativeAOT LLVM with Avalonia, and was wondering if you have any minimal standalone project or doc somewhere, from which I can start? Asking because this "compiling.md" page doesn't seem to include anything about dotnet.js support.

I found https://github.com/maraf/MinimalDotNetWasmNativeAOT/tree/dotnetjs/DotnetJsHack repo. Are these hacks with copying runtime files still required?

From this list of WIP features, I don't see any blockers for us. I assume I also have to use the latest previews of the .NET 9 SDK (or even nightlies), not just current builds of ILCompiler.LLVM.

maraf commented 3 months ago

Hey! compiling.md is good documentation on how to set up a NativeAOT-LLVM project. The only thing needed for dotnet.js is to set DotnetJsApi=true: https://github.com/dotnet/runtimelab/blob/feature%2FNativeAOT-LLVM/src%2Ftests%2Fnativeaot%2FSmokeTests%2FDotnetJs%2FDotnetJs.csproj#L7 . This smoke test also shows what is more or less possible at the moment.

If you want to target 9.0.0-* of the NativeAOT-LLVM packages, you need a .NET 9 SDK preview; a later one is generally better.

maxkatz6 commented 1 month ago

@maraf it seems ReleaseJSOwnedObjectByGCHandle isn't yet supported. After a while with app running I have:

Uncaught Error: Missing wasm export '_System_Runtime_InteropServices_JavaScript_JavaScriptExports_ReleaseJSOwnedObjectByGCHandle' (for System.Runtime.InteropServices.JavaScript.JavaScriptExports.ReleaseJSOwnedObjectByGCHandle)
    at dotnet.runtime.js:3:23922
    at Zt (dotnet.runtime.js:3:23460)
    at dotnet.runtime.js:3:34991
    at vr (dotnet.runtime.js:3:35059)
    at Tr (dotnet.runtime.js:3:35231)
    at FinalizationRegistry.cleanupSome (<anonymous>)
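For context on where this error comes from: the message suggests the JS runtime resolves managed methods by a mangled export name, with '.' replaced by '_' and a leading '_' added. The trace shows the lookup running inside a FinalizationRegistry cleanup callback. So it presumably only fails once a JS-owned object is collected, which would explain why it appears only after the app has run for a while. Below is a TypeScript sketch of such a lazy lookup. The mangling rule is inferred from this one message, and exportNameFor, getManagedExport and lazyManagedExport are hypothetical, not the dotnet.runtime.js implementation.

```typescript
// An exported managed entry point as seen from JS.
type ExportFn = (...args: number[]) => number | void;
interface WasmExports {
  [exportName: string]: ExportFn | undefined;
}

// Mangling inferred from the error message: '.' becomes '_', plus a leading '_'.
function exportNameFor(fullyQualifiedMethod: string): string {
  return "_" + fullyQualifiedMethod.replace(/\./g, "_");
}

function getManagedExport(wasmExports: WasmExports, fullyQualifiedMethod: string): ExportFn {
  const exportName = exportNameFor(fullyQualifiedMethod);
  const fn = wasmExports[exportName];
  if (typeof fn !== "function") {
    throw new Error(`Missing wasm export '${exportName}' (for ${fullyQualifiedMethod})`);
  }
  return fn;
}

// Binds on first call, so a missing export surfaces only when the method is first
// needed (for example from a FinalizationRegistry cleanup), not at startup.
function lazyManagedExport(wasmExports: WasmExports, fullyQualifiedMethod: string): ExportFn {
  let bound: ExportFn | undefined;
  return (...args: number[]) => {
    if (bound === undefined) {
      bound = getManagedExport(wasmExports, fullyQualifiedMethod);
    }
    return bound(...args);
  };
}
```

Under this model, the fix on the NativeAOT side would be to make sure the method is rooted and exported under the expected name, rather than to change anything on the JS side.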