bytecodealliance / wasmtime-dotnet

.NET embedding of Wasmtime https://bytecodealliance.github.io/wasmtime-dotnet/
Apache License 2.0
409 stars 52 forks source link

Initial async work #279

Open martindevans opened 10 months ago

martindevans commented 10 months ago

This PR adds support for async-wasm (see async.h).

Async support was added to the C API in https://github.com/bytecodealliance/wasmtime/pull/7106 which has not yet made it into a main wasmtime release, so testing this will require new DLLs until that's released.

Main Changes:

Config

Changes here are quite basic, just some new config properties to enable and configure async mode (e.g. WithAsync(bool)). Once async has been enabled you must use Linker.InstantiateAsync to create a module.

Linker

Added InstantiateAsync which creates an AsyncInstance. Also added generation for all the DefineAsyncFunction overloads.

AsyncInstance

An Instance which was created in async mode. This only exposes the new async calls, so it can't be used wrong (e.g. trying to call in non-async mode).

AsyncFunction

A Function which is async. This can currently only be invoked by wrapping it, we'll probably want to introduce an equivalent of object? Invoke(ReadOnlySpan<ValueBox> arguments) later.

martindevans commented 10 months ago

CI is failing since I didn't include updated binaries. I have no idea how the macos-release one has managed to pass :/

kpreisser commented 10 months ago

Do you know if the .NET CLR is compatible with that? Otherwise, I'm not sure if it's possible to correctly implement support for async Wasmtime functions in wasmtime-dotnet.

I found this on the .NET documentation on Managed and unmanaged threading in Windows:

The .NET threading model does not support fibers. You should not call into any unmanaged function that is implemented by using fibers. Such calls may result in a crash of the .NET runtime.

Unfortunatly, with this it might not be possible to implement async support in wasmtime-dotnet.

martindevans commented 10 months ago

The .NET threading model does not support fibers. You should not call into any unmanaged function that is implemented by using fibers. Such calls may result in a crash of the .NET runtime.

Unfortunatly, with this it might not be possible to implement async support in wasmtime-dotnet.

That would be very unfortunate, but it does look like you're right.

I'm a little bit surprised my tests aren't flakey. I guess since they're only ever running one async call at a time there's no real stack swapping going on.

martindevans commented 10 months ago

it will cause the process to crash with an Access Violation on Windows when wasmtime_call_future_poll is called

Does it reliably crash for you when you do this? I've been using the debugger while developing this and have had the tests passing.

martindevans commented 10 months ago

I've just added a test with multiple async calls and that seems to crash every time (SEHException).

kpreisser commented 10 months ago

Does it reliably crash for you when you do this?

Yes, when I set a breakpoint in the async callback in the InvokeAsyncNoArgsNoResults test, like this: grafik

Then when I use "Debug" on this test, the breakpoint doesn't get hit, and it always crashes with an SEHException and then an access violation:

Exception thrown: 'System.Runtime.InteropServices.SEHException' in Unknown Module.
The program '[15980] testhost.exe' has exited with code 3221225477 (0xc0000005) 'Access violation'.

When I remove the breakpoint and use Debug on this test, it doesn't crash; which I think is probably a result of the async callback being called using fibers.

martindevans commented 10 months ago

How do we want to proceed with this?

We could work on getting it added as a non-Windows only feature, but that just seems like it would be confusing for users.

@peterhuene do you know if wasmtime fibers on Windows are likely to move away from using Windows native fibers? It looks like some features are missing because of native fibers (e.g. custom stack allocation) so I would guess the long term ambition is to move away from native-fibers anyway?

kpreisser commented 10 months ago

We could work on getting it added as a non-Windows only feature, but that just seems like it would be confusing for users.

I tried using VS Code (with the C# Dev Kit Extension) on Ubuntu 22.04 with .NET SDK 7.0.110, and then set two breakpoints in the async callback in the InvokeAsyncNoArgsNoResults like this, and then used "Debug Test" on that test:

grafik

Then, the debugger seemed to pause but didn't stop at the breakpoint, and it couldn't show the current thread's call stack (single stepping failed as well):

grafik

When continuing execution, the second breakpoint was then hit (as that code is run by a regular worker thread of the .NET ThreadPool), and the test passed afterwards. grafik

So maybe at least the debug experience on Linux also doesn't fully support the stack switching when the async callback is called.

peterhuene commented 10 months ago

@peterhuene do you know if wasmtime fibers on Windows are likely to move away from using Windows native fibers? It looks like some features are missing because of native fibers (e.g. custom stack allocation) so I would guess the long term ambition is to move away from native-fibers anyway?

I don't think moving away from native fibers on Windows is on anyone's radar to implement and I can't recall if there were a technical reason we are using the native fiber implementation on Windows or if it was used simply because it was available at the time. @alexcrichton might recall as the implementer.

I do wonder if the CLR might not appreciate the stack being switched out from under it even with a custom fiber implementation; I think we'd have to dig deeper into exactly what's driving the limitation from the CLR side before using the limitation as justification for moving away from the native fiber implementation on Windows.

alexcrichton commented 10 months ago

I was personally under the impression that there's no safe and future-compatible way to do stack switching on Windows except for the native fibers API that's provided in userspace. Given that it's not even an option to move away from native fibers because we want to correctly operate on Windows.

That being said I'm no Windows expert, so if this assumption is false then we can definitely move away from native fibers. I don't know what, if anything, other projects are doing with respect to native stack switching on Windows.

kpreisser commented 10 months ago

I don't have much knowledge in low-level functionality like this, but maybe one reason that the .NET threading model doesn't support fibers on Windows is due to how the GC works regarding object references on the stack.

For example, when running the following code based on this PR on Windows:

void Run()
{
    using var config = new Config().WithAsync(true);
    using var engine = new Engine(config);

    using var module = Module.FromText(
        engine,
        "hello",
        """
        (module 
            (func $hello (import "" "hello") (result i32))
            (func (export "run") (result i32)
                call $hello
            )
        )
        """);

    using var linker = new Linker(engine);
    using var store = new Store(engine);

    var cleanUpThread = new Thread(() =>
    {
        for (int i = 0; i < 10; i++)
        {
            Thread.Sleep(100);
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    });
    cleanUpThread.Start();

    linker.DefineFunction(
        "",
        "hello",
        () =>
        {
            cleanUpThread.Join();
            Console.WriteLine("Callback 1! Wait completed.");
            return 12345;
        });

    var instance = linker.InstantiateAsync(store, module).GetAwaiter().GetResult();
    var run = instance.GetFunction("run")!.WrapFunc<int>()!;

    var obj = new MyCollectibleObj();

    var runTask = run();
    int result = runTask.GetAwaiter().GetResult();
    Console.WriteLine("After await runTask. MyCollectibleObj shouldn't have been collected until now.");

    GC.KeepAlive(obj);
}

Run();

class MyCollectibleObj
{
    public MyCollectibleObj()
    {
        Console.WriteLine("Created MyCollectibleObj.");
    }
    ~MyCollectibleObj()
    {
        Console.WriteLine("Collected MyCollectibleObj!");
    }
}

then even without running under a debugger, the app will crash most of the time with various errors (like "Fatal error. Internal CLR error.", "AccessViolationException"), and from the console output you can see that the GC has collected the MyCollectibleObj instance even though the code keeps it alive (on the stack) during the call, and so shouldn't have been collected until awaiting the runTask. It seems due to the stack switch, the GC isn't able to track that the object is still alive.

I can't reproduce this on Ubuntu 22.04, so maybe the GC in .NET (or the stack switching) is working differently there.