WebAssembly / shared-everything-threads

A draft proposal for spawning threads in WebAssembly

Dynamic sharedness checks as an escape hatch #37

Closed tlively closed 1 month ago

tlively commented 5 months ago

Separating this issue out of https://github.com/WebAssembly/shared-everything-threads/discussions/30, since that veered into somewhat unrelated discussion of the expected GC architecture.

The current proposal doesn't offer a great solution for producers that want to have shared GC objects that contain references to JS objects such as DOM nodes. As @conrad-watt wrote in the previous discussion:

There are possible compromises - instead of the compiled Wasm code holding the DOM node as a true reference type, it could more easily hold a scalar handle that points to a slot in a table of DOM nodes managed in JS via thread-local functions called from shared Wasm.
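As a rough illustration of the scalar-handle compromise quoted above, here is a minimal sketch of a per-thread handle table. The `HandleTable` class and its method names are hypothetical and not part of the proposal. It assumes shared Wasm code stores only the integer handle and calls thread-local JS functions such as these to resolve it.

```javascript
// Hypothetical per-thread table; every thread would own its own instance.
class HandleTable {
  constructor() {
    this.slots = new Map(); // scalar handle -> JS object (e.g. a DOM node)
    this.nextHandle = 0;
  }

  // Store an object and return a scalar handle that Wasm can keep as an i32.
  insert(obj) {
    const handle = this.nextHandle++;
    this.slots.set(handle, obj);
    return handle;
  }

  // Resolve a handle on the thread that owns this table.
  get(handle) {
    if (!this.slots.has(handle)) {
      throw new RangeError(`invalid handle ${handle}`);
    }
    return this.slots.get(handle);
  }

  // Entries stay alive until released explicitly, so a forgotten
  // release leaks the object.
  release(handle) {
    if (!this.slots.delete(handle)) {
      throw new RangeError(`invalid handle ${handle}`);
    }
  }
}

const table = new HandleTable();
const node = { tagName: "DIV" }; // stand-in for a real DOM node
const handle = table.insert(node);
```

Because the table holds its entries strongly, the producer has to call `release` itself, which is the leak risk raised later in the thread.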

This solution is very similar to how producers for linear memory languages manage references to JS, but WasmGC producers are used to being able to store externrefs directly in their objects, and that's a much simpler way to do things. It would be nice to offer an escape hatch in the type system for producers that want the convenience of storing externrefs directly in shared objects and that are ok with a runtime check to ensure that the externrefs are only ever accessed on the correct thread. Since externrefs are opaque in Wasm, the dynamic check that such "thread-bound externrefs" are on their originating thread could happen on the boundary where they are passed out to JS.

This would require host GCs to support shared data (the thread-bound shared externrefs) rooting unshared data (the underlying JS objects), but we potentially already have that requirement for JS hosts due to the thread-local wrapper functions in the JS API.

What do folks think about providing some form of dynamic sharing checks as an optional type system escape hatch for externrefs? We could provide dynamic checks as escape hatches for other types as well, but I don't think there's a need for that yet.

conrad-watt commented 5 months ago

I'm not sure that core Wasm is the right place to add support for this. If we could construct such an object on the JS side, it could be imported as a regular shared externref.

This would require host GCs to support shared data (the thread-bound shared externrefs) rooting unshared data (the underlying JS objects), but we potentially already have that requirement for JS hosts due to the thread-local wrapper functions in the JS API.

This looks to me like another ephemeron-like thing (which is good, because that's less controversial than arbitrary shared->unshared). The act of creating a new "thread-bound externref" creates an ephemeron in that thread where the key is the new "shared externref" and the value is the underlying object.

In fact, I think you could get the same functionality by having a JS WeakMap in each thread, keyed by shared externrefs. The "create thread-bound externref" operation is implemented by minting a new shared externref and creating a new entry in the current thread's WeakMap. The "attempt to deref thread-bound externref" becomes a lookup in the current thread's WeakMap.

And in fact this is pretty similar to my sketch with scalars, although the use of shared externref + weakMap means that leaks can be avoided (which IIUC was the main criticism of my sketch).
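The WeakMap idea above could look roughly like the following sketch. Everything here is an assumption made for illustration: `ThreadState` simulates one thread's local state inside a single JS realm, and `mintSharedKey` uses a frozen plain object as a stand-in for a real shared externref, which plain JS cannot create today.

```javascript
// Hypothetical stand-in for a shared externref, e.g. an empty shared struct.
function mintSharedKey() {
  return Object.freeze({});
}

// Simulates the thread-local state that each real thread would own.
class ThreadState {
  constructor(name) {
    this.name = name;
    this.bound = new WeakMap(); // shared key -> thread-local value
  }

  // "create thread-bound externref": mint a key, record the value locally.
  createThreadBound(value) {
    const key = mintSharedKey();
    this.bound.set(key, value);
    return key;
  }

  // "attempt to deref thread-bound externref": look up in this thread's map.
  deref(key) {
    if (!this.bound.has(key)) {
      throw new TypeError(`reference is not bound to thread ${this.name}`);
    }
    return this.bound.get(key);
  }
}

const mainThread = new ThreadState("main");
const workerThread = new ThreadState("worker");
const domNode = { tagName: "DIV" }; // stand-in for a real DOM node
const ref = mainThread.createThreadBound(domNode);
```

Here `mainThread.deref(ref)` returns the node, while `workerThread.deref(ref)` throws because the worker's map has no entry for that key. Since the WeakMap holds its keys weakly, the entry can be collected once the key is unreachable, without an explicit release step.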

syg commented 5 months ago

In fact, I think you could get the same functionality by having a JS WeakMap in each thread, keyed by shared externrefs. The "create thread-bound externref" operation is implemented by minting a new shared externref and creating a new entry in the current thread's WeakMap. The "attempt to deref thread-bound externref" becomes a lookup in the current thread's WeakMap.

I like this implementation sketch -- nothing additional needed.

That said I think in the long term, on the JS side, it'd be worth it to have direct, unboxed, thread-bound references inside JS shared structs for both ergonomics and performance.

tlively commented 5 months ago

I'm not sure that core Wasm is the right place to add support for this. If we could construct such an object on the JS side, it could be imported as a regular shared externref.

Oh, interesting! Yes, it would be nice and clean to just create a check-ownership-on-read wrapper object on the JS side and import it as a shared externref.

In fact, I think you could get the same functionality by having a JS WeakMap in each thread, keyed by shared externrefs. The "create thread-bound externref" operation is implemented by minting a new shared externref and creating a new entry in the current thread's WeakMap.

What do you envision for this minting? Just calling into Wasm to allocate and return an empty shared struct? That would definitely work, but it would be nice to have a more direct solution if we can get away with it.

That said I think in the long term, on the JS side, it'd be worth it to have direct, unboxed, thread-bound references inside JS shared structs for both ergonomics and performance.

@syg, can you sketch out how you envision this looking? Would there be any wrapper objects involved to do the thread-boundedness check? Is this something you think would be possible to get into the MVP?

syg commented 5 months ago

@syg, can you sketch out how you envision this looking? Would there be any wrapper objects involved to do the thread-boundedness check? Is this something you think would be possible to get into the MVP?

I'm envisioning a way to directly say that a shared struct's field is "thread-bound" or "thread-local". In JS, reads of the field would be the deref, and would do the thread access check. There would be no wrapper objects involved, and the object's shape would know that a thread-bound field has special semantics, much like an object's shape would know that a getter property behaves differently than a data property.

Having inline thread-bound fields has ergonomic advantages on the JS side. If you only had a wrapper object, you'd likely end up having setter/getter pairs for convenience anyways, since using them manually would be a PITA:

myStruct.threadBoundField = new ThreadBoundBox();
myStruct.threadBoundField.set(someDOMNode);
myStruct.threadBoundField.get(); // Does access check.

Having inline thread-bound fields also has performance advantages, since you're not allocating boxes. An object's shape or type can encode the info that a particular field has special semantics. As I said above, on the JS side this is not really an issue: there are already accessor properties vs. data properties.

Does that make sense?
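To make the wrapper-versus-inline comparison concrete, here is a minimal sketch of what a `ThreadBoundBox` and a convenience getter/setter pair might look like. None of this is an existing API. The `currentThread` variable is a hypothetical stand-in for the engine's notion of the running thread, and the sketch assumes that `set` binds the box to whichever thread calls it.

```javascript
// Hypothetical stand-in for "the thread that is currently running".
let currentThread = "main";

class ThreadBoundBox {
  #owner = undefined;
  #value = undefined;

  // Assumption: storing a value binds the box to the calling thread.
  set(value) {
    this.#owner = currentThread;
    this.#value = value;
  }

  // Does the access check; an empty box has no owner, so it also throws.
  get() {
    if (this.#owner !== currentThread) {
      throw new TypeError("value is bound to a different thread");
    }
    return this.#value;
  }
}

// Accessor pair that hides the box, approximating an inline thread-bound field.
class MyStruct {
  #box = new ThreadBoundBox();

  get threadBoundField() {
    return this.#box.get();
  }

  set threadBoundField(value) {
    this.#box.set(value);
  }
}

const myStruct = new MyStruct();
const someDOMNode = { tagName: "DIV" }; // stand-in for a real DOM node
myStruct.threadBoundField = someDOMNode; // binds the value to "main"
```

With the accessor pair, a plain read of `myStruct.threadBoundField` performs the check, which is the ergonomic effect an inline thread-bound field would give without allocating the box.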

tlively commented 5 months ago

Yes, that makes sense and I can see that it's much more ergonomic for JS. For the case where we want to have thread-bound shared externrefs in Wasm, it sounds like we would be able to create a JS shared struct with a single thread-bound field holding the unshared object of interest, then pass that wrapper struct into Wasm as a shared externref.

tlively commented 4 months ago

@syg, what do you think about including something like the ThreadBoundBox in the proposed JS API for this proposal, even if it could eventually be expressible via thread-bound fields in JS without any special API support? The goal would be to be less dependent on new JS syntax for thread-bound fields (not to mention shared JS structs in the first place) making it through standardization.

eqrion commented 4 months ago

Does a ThreadBoundBox keep the worker/thread that the wrapped value is from alive? Or is the thread allowed to terminate, at which point the wrapped value is destroyed?

Because the wrapped value is only accessible from the originating thread, it seems like we should be able to allow that thread to terminate. But as that thread terminates, we'll probably need to revoke the boxes so that they don't point at freed memory (either the wrapped value or the thread key that guards it). That's probably doable if we have an indirection via this box, but if the thread-bound value is supposed to be inline in shared GC objects, this becomes much harder.

tlively commented 4 months ago

I would expect the thread to be allowed to terminate, yes. I would also expect these wrappers to involve an indirection, so hopefully that would be manageable.
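The revocation-through-indirection idea discussed above could be sketched as follows. This is an illustration under stated assumptions, not a proposed design: `SimulatedThread` and `RevocableBox` are hypothetical names, threads are simulated as plain objects, and the caller's identity is passed to `get` explicitly because plain JS has no shared boxes to test against.

```javascript
// Hypothetical box that adds one level of indirection to the wrapped value.
class RevocableBox {
  #owner;
  #value;

  constructor(owner, value) {
    this.#owner = owner;
    this.#value = value;
  }

  // Drop both the value and the owner so nothing dangling remains.
  revoke() {
    this.#owner = null;
    this.#value = undefined;
  }

  // `caller` stands in for the identity of the thread doing the access.
  get(caller) {
    if (this.#owner === null) {
      throw new TypeError("box was revoked when its thread terminated");
    }
    if (caller !== this.#owner) {
      throw new TypeError("value is bound to a different thread");
    }
    return this.#value;
  }
}

// Simulated thread that tracks the boxes it has handed out.
class SimulatedThread {
  constructor(name) {
    this.name = name;
    // A strong Set keeps the sketch short; a real host would likely
    // track boxes weakly so they can still be collected.
    this.boxes = new Set();
  }

  wrap(value) {
    const box = new RevocableBox(this, value);
    this.boxes.add(box);
    return box;
  }

  // On termination, revoke every outstanding box owned by this thread.
  terminate() {
    for (const box of this.boxes) box.revoke();
    this.boxes.clear();
  }
}

const worker = new SimulatedThread("worker");
const other = new SimulatedThread("other");
const wrapped = { tagName: "DIV" }; // stand-in for a thread-local object
const box = worker.wrap(wrapped);
```

After `worker.terminate()`, any later `box.get(...)` throws instead of touching the released value. The box itself can stay reachable from shared objects, which is what the indirection buys.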