bytecodealliance / wasmtime-dotnet

.NET embedding of Wasmtime https://bytecodealliance.github.io/wasmtime-dotnet/
Apache License 2.0
410 stars 52 forks source link

Keep the `Store` alive until its `StoreContext` is no longer used #206

Closed kpreisser closed 1 year ago

kpreisser commented 1 year ago

Keep the Store alive until its StoreContext is no longer used.

As noted in https://github.com/bytecodealliance/wasmtime-dotnet/issues/200#issuecomment-1355191511, this is required (to handle the case when you forget to dispose the Store) as otherwise it could theoretically happen that the Store handle is deleted (with wasmtime_store_delete) in the GC finalizer thread even when at the same time a native call using the StoreContext handle is still executing in another thread, in case the Store object is no longer reachable.

For native methods taking a handle parameter that is passed as SafeHandle, this is not required as the SafeHandle is already kept alive during the call. An exception is if you use SafeHandle.DangerousGetHandle() to retrieve the handle as pointer value and pass it; in that case you must also keep the SafeHandle alive.

This commit also fixes an instance of the above noted issue in Engine.IncrementEpoch(), where wasmtime_engine_increment_epoch was declared as taking an IntPtr handle, and Engine.IncrementEpoch() used SafeHandle.DangerousGetHandle() without keeping the SafeHandle alive (this appears to have been introduced with #118).

Note that I also added the GC.KeepAlive(store) if the store is used in method calls after that, because otherwise it is not guaranteed that the following call will actually keep the store alive (the call could be inlined, and if then doesn't read the Store, it would still be eligible for GC); additionally, this would cause a "code debt" if you edit the code in the future, as e.g. when you would change/remove the code after the call using the StoreContext, you would have to remember to add the GC.KeepAlive(store) call.


Note: A general question that came to my mind, is whether Wasmtime actually has a thread-safe resource management, i.e. whether it supports that one thread may be freeing a resource, while at the same time another thread is calling a native method that uses another value that internally might contain a reference to the resource freed by the first thread.

For example, there is the Engine object (wasm_engine_t) that you need to create first, and then you can create a Store (wasmtime_store_t*) from the Engine (as wasmtime_store_new() takes a wasm_engine_t* parameter), from which I understand that internally, the returned wasmtime_store_t may contain a reference to the wasm_engine_t. (So, if you call wasm_engine_delete to delete the engine while a store still exists, this will not actually free the wasm_engine_t but just decrements its reference count. If you later delete the wasmtime_store_t, then it and the referenced engine will actually be freed.)

However, if in .NET code you create an Engine, and then create a Store from the Engine and then throw the Engine away (and you forget to use a using block or to call Engine.Dispose() afterwards), and later you call Store.Dispose(), it can happen that wasm_engine_delete() is called from the GC finalizer thread, while at the same time your main thread is calling wasmtime_store_delete(). Is this supported by Wasmtime?

(If not, them I'm wondering whether the apporach of releasing handles in finalizers can actually be supported at all by wasmtime-dotnet.)

What do you think?

Thanks!

peterhuene commented 1 year ago

Apologies on the delay for reviewing this as I was on vacation in December and I've been focusing a lot on tooling unrelated to the .NET bindings lately.

I'll read up on the context and review the changes shortly.

peterhuene commented 1 year ago

A general question that came to my mind, is whether Wasmtime actually has a thread-safe resource management, i.e. whether it supports that one thread may be freeing a resource, while at the same time another thread is calling a native method that uses another value that internally might contain a reference to the resource freed by the first thread.

Wasmtime follows Rust's threading model where Store<T> (here T represents the user context data) is Send if T is Send and Sync if T is Sync, meaning that the context type determines whether or not the store can be sent and synchronized between threads.

It further relies on Rust to maintain a guarantee that Store-related objects cannot outlive the Store itself; this is accomplished by requiring a reference to the store be passed into any operation on a Store-related object; thus the Rust compiler can statically verify that a Store outlives its use by the Store-related objects.

Some types, like Engine and Module, are inherently Send and Sync and can be used safely from multiple threads; this includes deleting them as references to these are maintained via a synchronized reference count. Basically the only thing the .NET API should be doing from the finalizer thread is decrementing these reference counts.

The real problem with the .NET API is that it doesn't uphold the same guarantees with Store:

There's two possible solutions in my mind for the second issue:

I haven't yet looked over the PR, but I am, on the surface, wary of injecting GC.KeepAlive in lots of places. I'll have to dig deeper and see what's necessary.

peterhuene commented 1 year ago

I followed up with #200 to say that I think that storing a weak self-reference in the store context data, as you originally described, would be a good path to move forward on for fixing the issue with Caller retrieved objects being tied to the lifetime of the call and having the least impact on the current API.

I'm going to move forward with this review to fix the store lifetime issues at interop call sites.