Keep the `Store` alive until its `StoreContext` is no longer used

kpreisser commented 1 year ago

Keep the Store alive until its StoreContext is no longer used.

As noted in https://github.com/bytecodealliance/wasmtime-dotnet/issues/200#issuecomment-1355191511, this is required (to handle the case when you forget to dispose the Store) as otherwise it could theoretically happen that the Store handle is deleted (with wasmtime_store_delete) in the GC finalizer thread even when at the same time a native call using the StoreContext handle is still executing in another thread, in case the Store object is no longer reachable.

For native methods taking a handle parameter that is passed as SafeHandle, this is not required as the SafeHandle is already kept alive during the call. An exception is if you use SafeHandle.DangerousGetHandle() to retrieve the handle as pointer value and pass it; in that case you must also keep the SafeHandle alive.

This commit also fixes an instance of the above noted issue in Engine.IncrementEpoch(), where wasmtime_engine_increment_epoch was declared as taking an IntPtr handle, and Engine.IncrementEpoch() used SafeHandle.DangerousGetHandle() without keeping the SafeHandle alive (this appears to have been introduced with #118).

Note that I also added the GC.KeepAlive(store) if the store is used in method calls after that, because otherwise it is not guaranteed that the following call will actually keep the store alive (the call could be inlined, and if then doesn't read the Store, it would still be eligible for GC); additionally, this would cause a "code debt" if you edit the code in the future, as e.g. when you would change/remove the code after the call using the StoreContext, you would have to remember to add the GC.KeepAlive(store) call.

Note: A general question that came to my mind, is whether Wasmtime actually has a thread-safe resource management, i.e. whether it supports that one thread may be freeing a resource, while at the same time another thread is calling a native method that uses another value that internally might contain a reference to the resource freed by the first thread.

For example, there is the Engine object (wasm_engine_t) that you need to create first, and then you can create a Store (wasmtime_store_t*) from the Engine (as wasmtime_store_new() takes a wasm_engine_t* parameter), from which I understand that internally, the returned wasmtime_store_t may contain a reference to the wasm_engine_t. (So, if you call wasm_engine_delete to delete the engine while a store still exists, this will not actually free the wasm_engine_t but just decrements its reference count. If you later delete the wasmtime_store_t, then it and the referenced engine will actually be freed.)

However, if in .NET code you create an Engine, and then create a Store from the Engine and then throw the Engine away (and you forget to use a using block or to call Engine.Dispose() afterwards), and later you call Store.Dispose(), it can happen that wasm_engine_delete() is called from the GC finalizer thread, while at the same time your main thread is calling wasmtime_store_delete(). Is this supported by Wasmtime?

(If not, them I'm wondering whether the apporach of releasing handles in finalizers can actually be supported at all by wasmtime-dotnet.)

What do you think?

Thanks!

peterhuene commented 1 year ago

Apologies on the delay for reviewing this as I was on vacation in December and I've been focusing a lot on tooling unrelated to the .NET bindings lately.

I'll read up on the context and review the changes shortly.

peterhuene commented 1 year ago

A general question that came to my mind, is whether Wasmtime actually has a thread-safe resource management, i.e. whether it supports that one thread may be freeing a resource, while at the same time another thread is calling a native method that uses another value that internally might contain a reference to the resource freed by the first thread.

Wasmtime follows Rust's threading model where Store<T> (here T represents the user context data) is Send if T is Send and Sync if T is Sync, meaning that the context type determines whether or not the store can be sent and synchronized between threads.

It further relies on Rust to maintain a guarantee that Store-related objects cannot outlive the Store itself; this is accomplished by requiring a reference to the store be passed into any operation on a Store-related object; thus the Rust compiler can statically verify that a Store outlives its use by the Store-related objects.

Some types, like Engine and Module, are inherently Send and Sync and can be used safely from multiple threads; this includes deleting them as references to these are maintained via a synchronized reference count. Basically the only thing the .NET API should be doing from the finalizer thread is decrementing these reference counts.

The real problem with the .NET API is that it doesn't uphold the same guarantees with Store:

There's no way to automatically enforce safety for store context data (i.e. the T from above); if users supply their own context data then they must ensure it can be sent between threads if they send the Store between threads and synchronized if more than one thread can mutate the context data. This can only be solved with documentation/guidance from the .NET API.
The .NET API doesn't uphold the same guarantee regarding the use of Store-related objects that the Rust API does; Store-related objects in the .NET API hold strong references on the Store they come from (albeit in a way that makes it not possible to keep around Caller-related objects past a host call, currently). The upside of this is that it makes the API more ergonomic in terms of not having to pass a reference to the Store for every operation on a Store-related object. The downside is that it is possible to explicitly Dispose a Store while it is still in use by a related object, which is unsafe.

There's two possible solutions in my mind for the second issue:

We simply document that users of the API must guarantee that Store is not disposed until after every Store-related object is no longer usable. It's not ideal, but also not the most arduous of requirements given the .NET API is effectively an interop shim. For most use cases, I imagine this is very easy to maintain as one would create a Store, interact with it, and then dispose it without involving other threads.
Go back to explicitly accepting the Store to use for operations on Store-related objects and no longer keep a strong reference on the Store from the related objects. This guarantees that the Store remains alive for the duration of the requested operation. While not as ergonomic, this would more closely model how things are done in Wasmtime itself.

I haven't yet looked over the PR, but I am, on the surface, wary of injecting GC.KeepAlive in lots of places. I'll have to dig deeper and see what's necessary.

peterhuene commented 1 year ago

I followed up with #200 to say that I think that storing a weak self-reference in the store context data, as you originally described, would be a good path to move forward on for fixing the issue with Caller retrieved objects being tied to the lifetime of the call and having the least impact on the current API.

I'm going to move forward with this review to fix the store lifetime issues at interop call sites.

bytecodealliance / wasmtime-dotnet

Keep the `Store` alive until its `StoreContext` is no longer used #206