Open martindevans opened 10 months ago
CI is failing since I didn't include updated binaries. I have no idea how the macos-release one has managed to pass :/
Do you know if the .NET CLR is compatible with that? Otherwise, I'm not sure if it's possible to correctly implement support for async Wasmtime functions in
wasmtime-dotnet
.
I found this on the .NET documentation on Managed and unmanaged threading in Windows:
The .NET threading model does not support fibers. You should not call into any unmanaged function that is implemented by using fibers. Such calls may result in a crash of the .NET runtime.
Unfortunatly, with this it might not be possible to implement async support in wasmtime-dotnet
.
The .NET threading model does not support fibers. You should not call into any unmanaged function that is implemented by using fibers. Such calls may result in a crash of the .NET runtime.
Unfortunatly, with this it might not be possible to implement async support in wasmtime-dotnet.
That would be very unfortunate, but it does look like you're right.
I'm a little bit surprised my tests aren't flakey. I guess since they're only ever running one async call at a time there's no real stack swapping going on.
it will cause the process to crash with an Access Violation on Windows when wasmtime_call_future_poll is called
Does it reliably crash for you when you do this? I've been using the debugger while developing this and have had the tests passing.
I've just added a test with multiple async calls and that seems to crash every time (SEHException).
Does it reliably crash for you when you do this?
Yes, when I set a breakpoint in the async callback in the InvokeAsyncNoArgsNoResults
test, like this:
Then when I use "Debug" on this test, the breakpoint doesn't get hit, and it always crashes with an SEHException
and then an access violation:
Exception thrown: 'System.Runtime.InteropServices.SEHException' in Unknown Module.
The program '[15980] testhost.exe' has exited with code 3221225477 (0xc0000005) 'Access violation'.
When I remove the breakpoint and use Debug on this test, it doesn't crash; which I think is probably a result of the async callback being called using fibers.
How do we want to proceed with this?
We could work on getting it added as a non-Windows only feature, but that just seems like it would be confusing for users.
@peterhuene do you know if wasmtime fibers on Windows are likely to move away from using Windows native fibers? It looks like some features are missing because of native fibers (e.g. custom stack allocation) so I would guess the long term ambition is to move away from native-fibers anyway?
We could work on getting it added as a non-Windows only feature, but that just seems like it would be confusing for users.
I tried using VS Code (with the C# Dev Kit Extension) on Ubuntu 22.04 with .NET SDK 7.0.110, and then set two breakpoints in the async callback in the InvokeAsyncNoArgsNoResults
like this, and then used "Debug Test" on that test:
Then, the debugger seemed to pause but didn't stop at the breakpoint, and it couldn't show the current thread's call stack (single stepping failed as well):
When continuing execution, the second breakpoint was then hit (as that code is run by a regular worker thread of the .NET ThreadPool), and the test passed afterwards.
So maybe at least the debug experience on Linux also doesn't fully support the stack switching when the async callback is called.
@peterhuene do you know if wasmtime fibers on Windows are likely to move away from using Windows native fibers? It looks like some features are missing because of native fibers (e.g. custom stack allocation) so I would guess the long term ambition is to move away from native-fibers anyway?
I don't think moving away from native fibers on Windows is on anyone's radar to implement and I can't recall if there were a technical reason we are using the native fiber implementation on Windows or if it was used simply because it was available at the time. @alexcrichton might recall as the implementer.
I do wonder if the CLR might not appreciate the stack being switched out from under it even with a custom fiber implementation; I think we'd have to dig deeper into exactly what's driving the limitation from the CLR side before using the limitation as justification for moving away from the native fiber implementation on Windows.
I was personally under the impression that there's no safe and future-compatible way to do stack switching on Windows except for the native fibers API that's provided in userspace. Given that it's not even an option to move away from native fibers because we want to correctly operate on Windows.
That being said I'm no Windows expert, so if this assumption is false then we can definitely move away from native fibers. I don't know what, if anything, other projects are doing with respect to native stack switching on Windows.
I don't have much knowledge in low-level functionality like this, but maybe one reason that the .NET threading model doesn't support fibers on Windows is due to how the GC works regarding object references on the stack.
For example, when running the following code based on this PR on Windows:
void Run()
{
using var config = new Config().WithAsync(true);
using var engine = new Engine(config);
using var module = Module.FromText(
engine,
"hello",
"""
(module
(func $hello (import "" "hello") (result i32))
(func (export "run") (result i32)
call $hello
)
)
""");
using var linker = new Linker(engine);
using var store = new Store(engine);
var cleanUpThread = new Thread(() =>
{
for (int i = 0; i < 10; i++)
{
Thread.Sleep(100);
GC.Collect();
GC.WaitForPendingFinalizers();
}
});
cleanUpThread.Start();
linker.DefineFunction(
"",
"hello",
() =>
{
cleanUpThread.Join();
Console.WriteLine("Callback 1! Wait completed.");
return 12345;
});
var instance = linker.InstantiateAsync(store, module).GetAwaiter().GetResult();
var run = instance.GetFunction("run")!.WrapFunc<int>()!;
var obj = new MyCollectibleObj();
var runTask = run();
int result = runTask.GetAwaiter().GetResult();
Console.WriteLine("After await runTask. MyCollectibleObj shouldn't have been collected until now.");
GC.KeepAlive(obj);
}
Run();
class MyCollectibleObj
{
public MyCollectibleObj()
{
Console.WriteLine("Created MyCollectibleObj.");
}
~MyCollectibleObj()
{
Console.WriteLine("Collected MyCollectibleObj!");
}
}
then even without running under a debugger, the app will crash most of the time with various errors (like "Fatal error. Internal CLR error.", "AccessViolationException"), and from the console output you can see that the GC has collected the MyCollectibleObj
instance even though the code keeps it alive (on the stack) during the call, and so shouldn't have been collected until awaiting the runTask
. It seems due to the stack switch, the GC isn't able to track that the object is still alive.
I can't reproduce this on Ubuntu 22.04, so maybe the GC in .NET (or the stack switching) is working differently there.
This PR adds support for async-wasm (see async.h).
Async support was added to the C API in https://github.com/bytecodealliance/wasmtime/pull/7106 which has not yet made it into a main wasmtime release, so testing this will require new DLLs until that's released.
Main Changes:
Config
Changes here are quite basic, just some new config properties to enable and configure async mode (e.g.
WithAsync(bool)
). Once async has been enabled you must useLinker.InstantiateAsync
to create a module.Linker
Added
InstantiateAsync
which creates anAsyncInstance
. Also added generation for all theDefineAsyncFunction
overloads.AsyncInstance
An
Instance
which was created in async mode. This only exposes the new async calls, so it can't be used wrong (e.g. trying to call in non-async mode).AsyncFunction
A
Function
which is async. This can currently only be invoked by wrapping it, we'll probably want to introduce an equivalent ofobject? Invoke(ReadOnlySpan<ValueBox> arguments)
later.