Closed kpreisser closed 1 year ago
Apologies for the delay in reviewing this; I was on vacation in December and I've been focusing a lot on tooling unrelated to the .NET bindings lately.
I'll read up on the context and review the changes shortly.
> A general question that came to my mind, is whether Wasmtime actually has a thread-safe resource management, i.e. whether it supports that one thread may be freeing a resource, while at the same time another thread is calling a native method that uses another value that internally might contain a reference to the resource freed by the first thread.
Wasmtime follows Rust's threading model where `Store<T>` (here `T` represents the user context data) is `Send` if `T` is `Send` and `Sync` if `T` is `Sync`, meaning that the context type determines whether or not the store can be sent and synchronized between threads.
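As a minimal sketch of that rule (the `MyStore<T>` type and the `assert_send` and `sum_on_worker` helpers are hypothetical stand-ins, not Wasmtime's actual types), a generic struct in Rust automatically inherits `Send` and `Sync` from the data it owns:

```rust
use std::rc::Rc;
use std::thread;

// Hypothetical stand-in for a store that owns user context data of type T.
// The compiler derives Send/Sync for it from T automatically.
struct MyStore<T> {
    data: T,
}

// Compiles only when the argument's type is Send.
fn assert_send<S: Send>(_: &S) {}

// Moves a store with Send context data to another thread and uses it there.
fn sum_on_worker(store: MyStore<Vec<u32>>) -> u32 {
    assert_send(&store);
    thread::spawn(move || store.data.iter().sum::<u32>())
        .join()
        .unwrap()
}

fn main() {
    let store = MyStore { data: vec![1, 2, 3] };
    println!("sum computed on worker thread: {}", sum_on_worker(store));

    // Rc<u32> is neither Send nor Sync, so MyStore<Rc<u32>> is not either.
    let local = MyStore { data: Rc::new(5u32) };
    // thread::spawn(move || *local.data); // would not compile: Rc is not Send
    println!("thread-local context value: {}", *local.data);
}
```

The .NET bindings have no equivalent compile-time check, which is why the context data can only be covered by documentation there.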
It further relies on Rust to maintain a guarantee that `Store`-related objects cannot outlive the `Store` itself; this is accomplished by requiring a reference to the store be passed into any operation on a `Store`-related object; thus the Rust compiler can statically verify that a `Store` outlives its use by the `Store`-related objects.
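A rough illustration of that pattern, using hypothetical `Store` and `GlobalHandle` types rather than Wasmtime's real API: the handle is only an index, and every operation on it has to borrow the store, so it cannot be used once the store is gone.

```rust
// Hypothetical store that owns all of the data its handles refer to.
struct Store {
    values: Vec<i64>,
}

// Hypothetical store-related object: just an index, with no pointer into the store.
#[derive(Clone, Copy)]
struct GlobalHandle {
    index: usize,
}

impl Store {
    fn add_global(&mut self, value: i64) -> GlobalHandle {
        self.values.push(value);
        GlobalHandle { index: self.values.len() - 1 }
    }
}

impl GlobalHandle {
    // Every operation requires a borrow of the store it belongs to.
    fn get(self, store: &Store) -> i64 {
        store.values[self.index]
    }

    fn set(self, store: &mut Store, value: i64) {
        store.values[self.index] = value;
    }
}

fn demo() -> i64 {
    let mut store = Store { values: Vec::new() };
    let global = store.add_global(41);
    global.set(&mut store, 42);
    let value = global.get(&store);
    drop(store);
    // global.get(&store); // would not compile: `store` was moved by `drop`
    value
}

fn main() {
    println!("global value: {}", demo());
}
```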
Some types, like `Engine` and `Module`, are inherently `Send` and `Sync` and can be used safely from multiple threads; this includes deleting them, as references to these are maintained via a synchronized reference count. Basically the only thing the .NET API should be doing from the finalizer thread is decrementing these reference counts.
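To illustrate why releasing such handles from two threads at once is safe, here is a small sketch that assumes the reference count behaves like Rust's `Arc`; the `EngineInner` and `Store` types are hypothetical and only model the ownership, not Wasmtime's internals. One thread plays the role of the finalizer releasing the engine handle while the main thread releases a store that also references the engine:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical engine payload; counts how many times it is actually freed.
struct EngineInner {
    frees: Arc<AtomicUsize>,
}

impl Drop for EngineInner {
    fn drop(&mut self) {
        self.frees.fetch_add(1, Ordering::SeqCst);
    }
}

// Hypothetical store that keeps its engine alive through a shared reference.
struct Store {
    _engine: Arc<EngineInner>,
}

fn concurrent_release() -> usize {
    let frees = Arc::new(AtomicUsize::new(0));
    let engine = Arc::new(EngineInner { frees: Arc::clone(&frees) });
    let store = Store { _engine: Arc::clone(&engine) };

    // Stand-in for the finalizer thread releasing the engine handle.
    let finalizer = thread::spawn(move || drop(engine));
    // Meanwhile the main thread releases the store.
    drop(store);
    finalizer.join().unwrap();

    // Whichever thread drops the last reference frees the engine, exactly once.
    frees.load(Ordering::SeqCst)
}

fn main() {
    println!("engine freed {} time(s)", concurrent_release());
}
```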
The real problem with the .NET API is that it doesn't uphold the same guarantees with `Store`:

1. There's no way to automatically enforce safety for store context data (i.e. the `T` from above); if users supply their own context data then they must ensure it can be sent between threads if they send the `Store` between threads and synchronized if more than one thread can mutate the context data. This can only be solved with documentation/guidance from the .NET API.
2. The .NET API doesn't uphold the same guarantee regarding the use of `Store`-related objects that the Rust API does; `Store`-related objects in the .NET API hold strong references on the `Store` they come from (albeit in a way that currently makes it impossible to keep `Caller`-related objects around past a host call). The upside of this is that it makes the API more ergonomic in terms of not having to pass a reference to the `Store` for every operation on a `Store`-related object. The downside is that it is possible to explicitly `Dispose` a `Store` while it is still in use by a related object, which is unsafe.
There are two possible solutions in my mind for the second issue:

1. We simply document that users of the API must guarantee that `Store` is not disposed until after every `Store`-related object is no longer usable. It's not ideal, but also not the most arduous of requirements given the .NET API is effectively an interop shim. For most use cases, I imagine this is very easy to maintain as one would create a `Store`, interact with it, and then dispose it without involving other threads.
2. Go back to explicitly accepting the `Store` to use for operations on `Store`-related objects and no longer keep a strong reference on the `Store` from the related objects. This guarantees that the `Store` remains alive for the duration of the requested operation. While not as ergonomic, this would more closely model how things are done in Wasmtime itself.
I haven't yet looked over the PR, but I am, on the surface, wary of injecting `GC.KeepAlive` in lots of places. I'll have to dig deeper and see what's necessary.
I followed up with #200 to say that I think that storing a weak self-reference in the store context data, as you originally described, would be a good path to move forward on for fixing the issue with `Caller` retrieved objects being tied to the lifetime of the call and having the least impact on the current API.
I'm going to move forward with this review to fix the store lifetime issues at interop call sites.
Keep the `Store` alive until its `StoreContext` is no longer used.

As noted in https://github.com/bytecodealliance/wasmtime-dotnet/issues/200#issuecomment-1355191511, this is required (to handle the case when you forget to dispose the `Store`) as otherwise it could theoretically happen that the `Store` handle is deleted (with `wasmtime_store_delete`) in the GC finalizer thread even when at the same time a native call using the `StoreContext` handle is still executing in another thread, in case the `Store` object is no longer reachable.

For native methods taking a handle parameter that is passed as
`SafeHandle`, this is not required as the `SafeHandle` is already kept alive during the call. An exception is if you use `SafeHandle.DangerousGetHandle()` to retrieve the handle as pointer value and pass it; in that case you must also keep the `SafeHandle` alive.

This commit also fixes an instance of the above noted issue in
`Engine.IncrementEpoch()`, where `wasmtime_engine_increment_epoch` was declared as taking an `IntPtr` handle, and `Engine.IncrementEpoch()` used `SafeHandle.DangerousGetHandle()` without keeping the `SafeHandle` alive (this appears to have been introduced with #118).

Note that I also added the
`GC.KeepAlive(store)` if the store is used in method calls after that, because otherwise it is not guaranteed that the following call will actually keep the store alive (the call could be inlined, and if it then doesn't read the `Store`, it would still be eligible for GC); additionally, this would cause a "code debt" if you edit the code in the future, as e.g. when you would change/remove the code after the call using the `StoreContext`, you would have to remember to add the `GC.KeepAlive(store)` call.

Note: A general question that came to my mind, is whether
`Wasmtime` actually has a thread-safe resource management, i.e. whether it supports that one thread may be freeing a resource, while at the same time another thread is calling a native method that uses another value that internally might contain a reference to the resource freed by the first thread.

For example, there is the
`Engine` object (`wasm_engine_t`) that you need to create first, and then you can create a `Store` (`wasmtime_store_t*`) from the `Engine` (as `wasmtime_store_new()` takes a `wasm_engine_t*` parameter), from which I understand that internally, the returned `wasmtime_store_t` may contain a reference to the `wasm_engine_t`. (So, if you call `wasm_engine_delete` to delete the engine while a store still exists, this will not actually free the `wasm_engine_t` but just decrements its reference count. If you later delete the `wasmtime_store_t`, then it and the referenced engine will actually be freed.)

However, if in .NET code you create an
`Engine`, and then create a `Store` from the `Engine` and then throw the `Engine` away (and you forget to use a `using` block or to call `Engine.Dispose()` afterwards), and later you call `Store.Dispose()`, it can happen that `wasm_engine_delete()` is called from the GC finalizer thread, while at the same time your main thread is calling `wasmtime_store_delete()`. Is this supported by `Wasmtime`?

(If not, then I'm wondering whether the approach of releasing handles in finalizers can actually be supported at all by `wasmtime-dotnet`.)

What do you think?
Thanks!