Improved Test Coverage.

martindevans commented 1 year ago

Added some more tests and fixed one small issue in the process:

Modified wasmtime_engine_increment_epoch to return void instead of IntPtr. Based on signature from these docs: https://docs.wasmtime.dev/c-api/engine_8h.html

Proposal

While writing the tests for the config I also came across an awkward error to do with disabling bulk memory. As documented here if Bulk memory = false then it is required that Reference Type = false. If that is not the case then creating the engine throws this exception: System.Runtime.InteropServices.SEHException: 'External component has thrown an exception.', with absolutely no indication of what is wrong.

I would like to modify WithBulkMemory to look like this:

public Config WithBulkMemory(bool enable)
{
    Native.wasmtime_config_wasm_bulk_memory_set(handle, enable);
    return WithReferenceTypes(false);
}

i.e. disabling bulk memory automatically disables reference types.

martindevans commented 1 year ago

CI failure is due to a 404 on https://github.com/bytecodealliance/wasmtime/releases/download/dev/wasmtime-dev-aarch64-macos-c-api.tar.xz

kpreisser commented 1 year ago

creating the engine throws this exception: System.Runtime.InteropServices.SEHException: 'External component has thrown an exception.', with absolutely no indication of what is wrong.

Note that it does actually print an error description to stderr (but that information is not available on the SEHException):

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: feature 'reference_types' requires 'bulk_memory' to be enabled', crates\c-api\src\engine.rs:33:38
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I'm not quite sure (I don't know Rust), but this behavior (directly printing error information to stderr) seems to suggest that these sort of errors/panicks were actually intended to terminate the process, instead of being catchable in .NET code? (I guess that's what happens e.g. on Linux, but haven't yet tried it)

Additionally, if you catch that SEHException, then later when trying to dispose the Config, sometimes it even throws a Access Violation (though that might be caused by the Config handle no longer being valid after calling wasm_engine_new_with_config, but config.NativeHandle.SetHandleAsInvalid() is only called after that was successful):

Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Repeat 2 times:
--------------------------------
   at Wasmtime.Config+Native.wasm_config_delete(IntPtr)
--------------------------------
   at Wasmtime.Config+Handle.ReleaseHandle()
   at System.Runtime.InteropServices.SafeHandle.InternalRelease(Boolean)
   at System.Runtime.InteropServices.SafeHandle.Dispose(Boolean)
   at System.Runtime.InteropServices.SafeHandle.Dispose()
   at Wasmtime.Config.Dispose()
   at Program.<<Main>$>g__TestWastWasmFunctionPerformance|0_1()
   at Program.<Main>$(System.String[])

peterhuene commented 1 year ago

@kpreisser you're correct that a Rust panic in this case is meant to terminate the process.

I don't think we need to guard against configuration validation errors in the .NET version of Config.

To fix that properly, I suggest filing an upstream issue in wasmtime to add a fallible version of wasm_engine_new and wasm_engine_new_with_config (e.g. wasmtime_engine_new and wasmtime_engine_new_with_config) so that bindings can surface the error in their API rather than having the Wasmtime C API unwrap the error and panic.

That way we can surface the error as a proper exception on the .NET side and not have to duplicate configuration validation logic in the .NET bindings.

peterhuene commented 1 year ago

@martindevans thanks for improving the test coverage!

kpreisser commented 1 year ago

Hi @peterhuene ,

@kpreisser you're correct that a Rust panic in this case is meant to terminate the process.

do we need to do something about this? Currently, on Windows, when a Rust panic occurs in Wasmtime that is actually meant to terminate the process (due to an unrecoverable error as I understand it), .NET code can still catch it e.g. with a catch (Exception) clause and continue to run, which looks to me like that we may now run into undefined behavior with possible security implications, whereas actually the process should have been terminated to stay on the safe side. (On Linux for example, the process actually terminates because .NET cannot catch the panic.)

However, I'm not sure that would be the best way to fix this. A naive way to handle this in .NET code in wasmtime-dotnet might be to wrap native functions with code that terminates the process on a SEHException, e.g.

public static unsafe IntPtr wasmtime_func_call(IntPtr context, in ExternFunc func, Value* args, nuint nargs, Value* results, nuint nresults, out IntPtr trap)
{
    try
    {
        return LocalFunc(context, func, args, nargs, results, nresults, out trap);
    }
    catch (SEHException ex)
    {
        // This is likely a Rust panic, so terminate the process.
        Environment.FailFast(ex.Message, ex);
        // Satisfy CFA, this line is not reached.
        throw;
    }

    [DllImport(Engine.LibraryName)]
    static unsafe extern IntPtr LocalFunc(IntPtr context, in ExternFunc func, Value* args, nuint nargs, Value* results, nuint nresults, out IntPtr trap);
}

(or using an exception filter with a when (...) clause to prevent stack unwinding between the .NET stack frame and the Rust stack frame where the panic occurs.)

However, this would theoretically be needed for every native function, and might make the code ugly. (This could be done instead with a Source Generator similar to the one for the LibraryImportAttribute that generates the implementation for a partial defined function, which would hide the boilerplate code, but it might require some effort to write such a source generator.)

Though, maybe this can or should also be fixed directly in Wasmtime or Rust. I saw that with PRs rust-lang/rust#26569 and rust-lang/rust#32900, Rust by default seems to use SEH's RaiseException (or _CxxThrowException from the MSVC runtime) on Win32 for raising an exception also when a panic occurs. However, because .NET apparently uses the same mechanism for .NET exceptions, such an SEH exception can be caught in .NET code.

Alternatively there is RaiseFailFastException, which bypasses all exception handlers and ensures the process is terminated (I assume this is also what .NET internally does in Environment.FailFast()). From the Rust PR, it seems that there is another panic mode (abort) which would have a similar effect to calling RaiseFailFastException.

Do you know why Wasmtime/Rust currently don't use a mode that will abort the process for a panic on Windows? Thanks!

peterhuene commented 1 year ago

do we need to do something about this?

I consider catching (and potentially eating) Exception and SEHException to be anti-patterns to the degree that you will need to accept undefined behavior if you decide to do so. Wasmtime will not generate an SEH exception other than a panic or a serious bug in the runtime that should crash the process. Therefore I don't think there's anything to do in the .NET bindings itself.

That said, I do think changing the panic mode for the C API to abort would make the most sense to address this issue. Would you mind opening an issue in the Wasmtime repo to explore doing so and reference this conversation?

Additionally, I would like to see is less unwrap calls in the C API for supposedly infallible calls and thus less panics easily triggered by user code. To make this happen, we'll need more wasmtime_* overloads of the wasm_* APIs (e.g. wasmtime_engine_new and wasmtime_engine_new_with_config) that return errors rather than void.

kpreisser commented 1 year ago

Thanks for your reply!

I consider catching Exception and SEHException to be anti-patterns to the degree that you will need to accept undefined behavior if you decide to do so.

Note that sometimes you may have to catch (Exception ex) for some kind of transforming/processing, and you don't know which exception types the code is possibly throwing. For example, that's exactly what wasmtime-dotnet does in callback handling:

https://github.com/bytecodealliance/wasmtime-dotnet/blob/972ef243be61ae60bf231fd22efd539456fbd819/src/Function.cs#L1959-L1962

Imagine there are host-to-wasm and wasm-to-host transitions on the stack, and you call a Wasmtime function that panics, resulting in a SEHException on Windows. Even if the user code on top of the wasm-to-host transition doesn't have a catch (Exception) clause, the SEHException will be caught by Function's callback handler and transformed into a wasm_trap_t*, which is then reported at the managed-to-native transition (like wasmtime_call_func) as wasmtime_error_t*, and that is thrown in .NET code as a WasmtimeException. So even with a regular catch (WasmtimeException ...) e.g. around Function.Invoke()) (since it is expected that a WasmtimeException may be thrown at such place), you would catch the Rust panic that was actually intended to terminate the process in this scenario. :innocent:

(Also, note that .NET has a concept of "corrupted state exceptions" that cannot be caught with a try/catch block and will always terminate the process when being thrown by the CLR (like StackOverflowException or AccessViolationException); but SEHException doesn't seem to fall into that category.)

That said, I do think changing the panic mode for the C API to abort would make the most sense to address this issue. Would you mind opening an issue in the Wasmtime repo to explore doing so and reference this conversation?

Agreed, that would certainly be a better option than trying to work-around this in wasmtime-dotnet. I'll look into opening a wasmtime issue. Edit: Opened bytecodealliance/wasmtime#5399.

Additionally, I would like to see is less unwrap calls in the C API for supposedly infallible calls and thus less panics easily triggered by user code.

👍 (my comment was mostly generally about possible Rust panics in wasmtime, not specifically to the panic caused by a wrong config option).

Thank you!

bytecodealliance / wasmtime-dotnet

Improved Test Coverage. #192

Proposal