WebAssembly / gc

Branch of the spec repo scoped to discussion of GC integration in WebAssembly
https://webassembly.github.io/gc/

Multithreaded languages with JS references #188

Closed RossTate closed 1 year ago

RossTate commented 3 years ago

I'm wondering how, in the future, multithreaded languages are expected to have references into JS or the DOM. I understand, and am not questioning, the requirement to not have racy access to JS/DOM, and in particular for the DOM to be accessed only within the event loop. I'm just wondering how languages are expected to compile within that requirement. After reading the write up on shared, the only solution I could think of was to have a non-shared table of externref (within the instance on the event-loop thread), and for compiled references (which all seem to have to be shared to be usable within the multithreaded language) to use the index of the externref within that array. But then the GC can't detect cycles, which I understand is the main purpose of this proposal. (Related issues have also come up in stack-switching, where we're wondering how to eventually make work-stealing possible.)

Thoughts?
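To make the workaround concrete, here is a minimal TypeScript sketch of the index-into-a-table pattern described above. `HandleTable`, `register`, `lookup`, `release`, and `liveCount` are hypothetical names, and a plain array stands in for a non-shared Wasm table of externref. It only illustrates the concern: the table holds a strong reference until a slot is explicitly released, so a collector that sees nothing but an integer index cannot tell when the entry is dead.

```typescript
// Hypothetical stand-in for a non-shared table of externref owned by the
// event-loop thread. Other threads only ever see the numeric handle.
class HandleTable<T extends object> {
  private slots: (T | undefined)[] = [];
  private freeList: number[] = [];

  // Store a reference and return its index (what a shared object would hold).
  register(value: T): number {
    const index = this.freeList.pop() ?? this.slots.length;
    this.slots[index] = value;
    return index;
  }

  // Resolve an index back to the reference; only valid on the owning thread.
  lookup(index: number): T {
    const value = this.slots[index];
    if (value === undefined) {
      throw new Error(`stale or unknown handle ${index}`);
    }
    return value;
  }

  // Nothing is freed until this is called: the GC cannot trace an integer.
  release(index: number): void {
    if (this.slots[index] === undefined) return;
    this.slots[index] = undefined;
    this.freeList.push(index);
  }

  // Number of slots that still pin a reference.
  liveCount(): number {
    return this.slots.filter((s) => s !== undefined).length;
  }
}
```

If the registered object itself refers back to whatever holds the index, the pair stays alive until `release` is called by hand, which is the cycle problem in miniature.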

RossTate commented 3 years ago

@conrad-watt, I'm curious what your thoughts are on this?

conrad-watt commented 3 years ago

Is there a particular aspect of this question that's specific to Wasm, rather than being a general issue with the DOM model? The issue of JS Web workers not being able to access the DOM directly has been discussed and worked around for a while. I believe purely JS approaches would have analogous issues with "cycle" detection.

There is a question of whether a source language/toolchain targeting (Web) Wasm would explicitly expose the UI thread as something which other threads must synchronously post jobs to, or whether it would give the programmer some other abstraction while implementing it that way under the hood. For example, if reflow/paints were relatively infrequent, the program could be compiled using shared references everywhere, periodically triggering the UI thread to snapshot the current state and update the DOM accordingly. I don't think there's a single right choice.

aardappel commented 3 years ago

> the write up on shared

Could you link to this?

conrad-watt commented 3 years ago

@aardappel probably this

https://github.com/WebAssembly/gc/blob/master/proposals/gc/Post-MVP.md#threads-and-shared-references

RossTate commented 3 years ago

Yep, along with the paper referenced therein. Thanks, @conrad-watt! Since you're on that paper, I was wondering if you had worked through this scenario and had a plan for it.

conrad-watt commented 3 years ago

I've "worked through" it in the sense of "come to terms with" it :P

AFAIU Wasm isn't making the problem any worse - the underlying issue is that some applications handle shared objects/concurrent UI update in a way which is incompatible with the event loop/DOM, and therefore translating them for Web use (whether through compilation or otherwise) is just difficult.

aardappel commented 3 years ago

Interesting.. this appears to specify a complete type-bifurcation of shared and unshared, with, critically, the requirement that all members of a shared struct must also be shared. This makes a lot of sense, because this way we can stick JS objects safely in unshared GC objects.

Java however does not appear to have limitations on threads accessing objects that are not "monitors" and are still shared between threads (and will happily produce race conditions if so). So this would mean that a Java program making use of concurrency would in the general/worst case have to be translated to a Wasm module where ALL structs are marked as shared. That in turn would stop this program from having a cyclic object graph with JS objects directly.

You could try and force this type bifurcation on Java, but I am guessing if monitors hold general container types this is going to run into limitations real quick.

So yeah, beyond even the DOM, for most managed concurrent GC languages there will be no shared GC space between the program and JS, and no GC of cycles between the two.

I personally don't find this shocking. I've been arguing for a long time that we should not have this "we need to be able to collect cycles between JS and Wasm" requirement hold back the possible designs of Wasm GC. Maybe if we accept that concurrency is more important than cycles with JS, there are other things we can improve about Wasm GC, like multiple GC spaces, or better interop with linear memory.

ajklein commented 3 years ago

My expectation is that in the long term we're likely to see both Wasm GC which interoperates closely with JS, and Wasm GC that's mostly separate. We already see this today with the way applications do (or don't) share between the linear memory world and the JS heap.

Extrapolating only one of these possible futures, and using that extrapolation to argue against integration with the host GC (as I take @RossTate's OP to be doing), seems to me to take an overly limited view of possible futures, given the nascent state of Wasm GC.

RossTate commented 3 years ago

Heh, I can appreciate that, @conrad-watt. And @aardappel recreated a number of my thoughts as well (though I do still feel that Wasm would benefit from support for host-managed GC).

@ajklein The intent of my post is to raise awareness that more thought needs to be put into the problem, and earlier rather than later.


My sense is that the reachability approach of the shared heap will not address the needs of major languages for the reasons the three of us have outlined above. And, as pointed out above, if we don't figure out how to address those needs, then generated programs will essentially be forced to revert to using the current pattern of tables of JS references, which the GC proposal's primary purpose is to avoid. So I think we need to figure out a better solution to this problem.

When we were discussing the Requirements document (#121), we were discussing the need for supporting and reasoning about multiple heaps in order to address this problem. I think that idea was in the right direction, but after exploring the problem some more and working through some use cases, I think there's some more nuance that needs to be considered.

For example, consider a message-copying-based parallel language. Such a language would respect JS's isolation requirements, and so a good GC proposal would be able to guarantee its correctness. The key challenge, though, is that the language's message-passing mechanism itself needs to be implemented within the GC proposal (for wasm to be self contained). Message passing is often implemented through channels. For simplicity, let's consider just wait-free unbuffered unidirectional channels, and let's suppose that the language is typed so that SendChannel and ReceiveChannel are distinct. The channel simply has a mutable field containing the message; a send copies the message and sets the field (overriding any preexisting one) and a receive just gets the message. This should be the easiest case to consider.

The issue is that the channel reference is, itself, shared across the two threads. Furthermore, the receiver should also be able to mutate the received message without concern for data races because it was copied. So the message field of the channel itself is racy, but all fields of the message it points to are not racy---though they can only be accessed/mutated by the receiver. Note, further, that this means the sender has a reference-path to fields it is not allowed to access or mutate.
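For readers who want the shape of that channel in code, here is a minimal TypeScript sketch under loud assumptions: `Slot`, `SendChannel`, `ReceiveChannel`, and `makeChannel` are hypothetical names, the copy function is supplied by the caller, and everything runs on one thread, so it shows only the structure, not real concurrency or the type-system question raised here.

```typescript
// One slot shared by both ends. In a real multithreaded setting this field
// is the only racy location; the message it points to is a private copy.
class Slot<M> {
  message: M | undefined = undefined;
}

// The sending view. It can reach the copied message through the slot,
// but by convention it never reads or mutates it after publishing.
class SendChannel<M> {
  private slot: Slot<M>;
  private copy: (m: M) => M;

  constructor(slot: Slot<M>, copy: (m: M) => M) {
    this.slot = slot;
    this.copy = copy;
  }

  send(message: M): void {
    // Copy first, then publish; overwrites any message not yet received.
    this.slot.message = this.copy(message);
  }
}

// The receiving view: it gets the copy and is its only mutator.
class ReceiveChannel<M> {
  private slot: Slot<M>;

  constructor(slot: Slot<M>) {
    this.slot = slot;
  }

  receive(): M | undefined {
    return this.slot.message;
  }
}

// Hypothetical factory returning the two distinct views of one channel.
function makeChannel<M>(copy: (m: M) => M): [SendChannel<M>, ReceiveChannel<M>] {
  const slot = new Slot<M>();
  return [new SendChannel(slot, copy), new ReceiveChannel(slot)];
}
```

Both views wrap the same `Slot`, which is the sense in which one underlying object shows up under two different types.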

So I've already figured out a way to type-check this. The type system can even permit the message to contain JS objects and guarantee that they are copied appropriately and so will have no racy accesses. However, I can't figure out how to reconcile that type system with the current MVP's. The problem is that the channel is conceptually in two heaps; its value is a SendChannel in one heap and a ReceiveChannel in the other heap. With subtyping, that means whatever structural type it has belongs to both anyref sender-heap and anyref receiver-heap. That makes subtyping non-deterministic, which is problematic for equi-recursive types due to computational complexity. And the reason why anyref needs to have a heap argument is to make rtt.canon+ref.cast respect heap isolation (otherwise I could upcast a (the encoding of) SendChannel to anyref and then successfully downcast it to (the encoding of) ReceiveChannel).


So, yes, once again I am expressing a concern that the MVP will fail to extend to an important use case, albeit this time a very different one. I mentioned in this presentation that one advantage of nominal types is that they are more extensible than structural types regarding unforeseen low-level considerations because structure is only one dimension of a type and nominal types leave room for other dimensions. (I should also mention that the low-level encodings of SendChannel and ReceiveChannel involve an existential quantifier abstracting the heap/thread on the other side of the channel.)

Of course, @conrad-watt has a lot more experience than me on reasoning about data races, and I haven't run these thoughts by him, so I'm interested to learn about the ideas he has.

conrad-watt commented 3 years ago

I don't see the connection to the "needs of major languages". If you're writing an app for the Web, whether it's in JavaScript, or in Kotlin compiled to Wasm, you should expect to have to obey the restrictions on how JS objects are used and how the DOM is manipulated that every other Web app has had to since the beginning of time. Conversely, you shouldn't expect to be able to map the object concurrency and UI manipulation of an arbitrary source language onto JS objects and the DOM. If one isn't trying to compile an app which manipulates JS objects/the DOM, then there's no issue, since regular Wasm references could be shared with no restriction.

I could imagine Wasm programs on the Web importing postMessage, or some wrapper, and maybe at some point we'll want a postMessage-like feature set for pure Wasm. The question of how one could express an aesthetically pleasing synchronous channel in Wasm, with ownership transfer represented in the type system, is an interesting thought exercise, but I don't agree that it will lead to any actionable insights for the GC proposal.

I'm not trying to be dismissive of the effort you put into the example above, I've now spent 2.5 hours solidly reading it over and attempting to work out a useful response. But I think it's getting too far into the weeds of a single hypothetical scenario, and the connection back to a criticism of equi-recursive types is quite tenuous. At a high level, my interpretation of your scenario is that your channel types are representing an ownership policy in addition to a structure, in which case I'd expect you to need the channel to be a special-cased object type whether or not the type system is generally nominal or structural (e.g. to correctly type creation of the channel).

RossTate commented 3 years ago

Thanks for the thoughtful post. I tried to cram a lot into mine both because it seemed I had accidentally set up @ajklein (and likely others) to misunderstand the goal of this thread to be to dismiss the cycle-detection goal of the GC proposal, and because I wanted to illustrate why I have some reason to believe this should be discussed now rather than later. I'm not expecting you or anyone to conclude that it's impossible for the current MVP.

So with that said, let me step back to establish the connection to the point you rightly raise: "the needs of major languages". Let's consider multithreaded Java compiling to wasm on the browser. One of two things is going to happen:

  1. Java will throw out its existing multithreading infrastructure and design a new multithreading infrastructure that can be retrofitted onto message-passing web workers.
  2. Java will keep its existing multithreading infrastructure and dynamically enforce using some DOM/JavaScript-proxy library that DOM/JavaScript objects are only accessed on the thread they were allocated on.

I think it's pretty safe to say that (1) is not actually going to happen. For one, my understanding is that browser implementations are not particularly opposed to data races—after all, SharedArrayBuffers already exist—they're just opposed to data races on DOM/JavaScript objects. So there won't be much pressure in the direction of (1). On the other hand, (2) lets multi-threaded programs and libraries (except for meaningfully-platform-specific components like the front end) use the same source code for web, Android, desktop, and server backends. So there is going to be a lot of pressure towards (2). The only challenge with (2) is dynamically enforcing the restriction on DOM/JavaScript objects.

If all WebAssembly has in place for multithreading is the plan outlined in the MVP, then the way that problem will be solved is for all Java objects to be shared and for the DOM/JavaScript proxies to have a module-instance-identifier i32 field and an i32 field that is an index into the externref table of that module instance. With that set up, dynamically enforcing the restriction on DOM/JavaScript objects is extremely cheap. However, in the process we've lost the ability to detect cycles across wasm programs and the host.
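A minimal TypeScript sketch of that proxy encoding follows. It is illustrative only: `JsProxy`, `InstanceContext`, `wrap`, and `unwrap` are hypothetical names, a plain array stands in for the per-instance externref table, and a thrown `Error` stands in for whatever failure the runtime would actually signal.

```typescript
// Hypothetical encoding of a DOM/JS proxy as two i32-like fields.
interface JsProxy {
  readonly instanceId: number; // which module instance owns the table
  readonly index: number;      // slot in that instance's externref table
}

// One per module instance (one per thread in this sketch).
class InstanceContext {
  readonly instanceId: number;
  private table: object[] = [];

  constructor(instanceId: number) {
    this.instanceId = instanceId;
  }

  // Wrap a host reference so it can be stored inside shared objects.
  wrap(ref: object): JsProxy {
    this.table.push(ref);
    return { instanceId: this.instanceId, index: this.table.length - 1 };
  }

  // The dynamic check: one integer comparison before the table load.
  unwrap(proxy: JsProxy): object {
    if (proxy.instanceId !== this.instanceId) {
      throw new Error("JS/DOM reference used outside its owning instance");
    }
    const ref = this.table[proxy.index];
    if (ref === undefined) {
      throw new Error(`unknown table index ${proxy.index}`);
    }
    return ref;
  }
}
```

The check is cheap, as claimed, but the table entry is never dropped here, which is exactly the lost cycle detection the paragraph above describes.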

So with the current plan, it seems like once we add multithreading we'll be on track to violate what many have expressed (at least to me) as the main purpose of the GC proposal (over the alternative of making it easier to support linear-memory GC). I know how to fix this since the required extension is expressible in the framework I have for creating (inferable) typed assembly languages—nothing fancy like ownership types is necessary—but that framework is not compatible with the MVP. Of course, there might be another fix that is compatible with the MVP, but it sounds like you've already considered this problem to no avail, which makes me all the more concerned.


I'm going to take a second to point out that JS runtimes aren't the only ones that rely on restricting where data races can occur in order to ensure safety correctness. Many runtimes fall into that category. One example is Java, where the very memory safety of the efficient implementation of interface tables relies on the interface table being race free. Go is not memory safe due to data races (see article here). In both these cases the issue is multiword values. For example, racy fields of type inref (i.e. (fat) interior references in the Post-MVP) are not memory safe. These issues with raciness and safety are one reason I was concerned when it was revealed in https://github.com/WebAssembly/gc/issues/189#issuecomment-772561163 that there is no plan for verifying the "build and freeze" initialization pattern, as that pattern is critical to ensuring the race-freedom that makes many language-runtime internals memory safe.

So it'd be nice to have a plan for guaranteeing (localized?) race-freedom that covers the needs of not just JS runtimes but language runtimes more broadly. Pondering that broader issue is how my channel example above came about.

conrad-watt commented 3 years ago

Fundamentally, if you express uses of JS objects/the DOM in your source language that aren't actually allowed on the Web, something weird is going to happen. What you're proposing is a relaxation to the restrictions on JS/the DOM to allow multithreaded access so long as the accesses can be verified to be race-free (and also thinking about Wasm mechanisms by which the race-free property can be established).

This is an interesting idea and worth pursuing, but I really don't think that it can be made into a criticism of the current line of GC work. This isn't a change that can be effected in Wasm unilaterally, but would require a lot of buy-in from various different corners of the ecosystem.

> I know how to fix this since the required extension is expressible in the framework I have for creating (inferable) typed assembly languages—nothing fancy like ownership types is necessary

I'm surprised that you wouldn't consider the extension to be a form of ownership types, since IIUC that's exactly the dimension that you'd be adding to the type system. It might be that the specific extension sketched above for channels is exactly sufficient to express the ownership of a channel, and not more nuanced forms?

> Of course, there might be another fix that is compatible with the MVP, but it sounds like you've already considered this problem to no avail, which makes me all the more concerned.

We have not considered adding any kind of ownership dimension in the type system for shared objects. That's not the same as having attempted to do it and failed, where an alternative MVP would have succeeded. I think there could be an interesting effort of research/evangelism here to convince people that it could be a viable approach, especially considering Rust, but you've phrased this issue in very urgent terms considering the effort and consensus-building (incl. outside the immediate Wasm community) that would be needed to go in this direction.

RossTate commented 3 years ago

Ah, I realized I mischaracterized the guarantee that JS/DOM engines need to maintain. Ensuring race-freedom is too weak. JS/DOM engines need JS/DOM values to only ever be accessed/mutated by the thread they were created in (for DOM, this is the event loop). I'm still ensuring that property.

The one thing that I'm relaxing is that I'm making it possible for threads to have references to JS/DOM values of other threads (even though the thread cannot access/mutate those JS/DOM values). Again, that relaxation is necessary if we want to be able to collect cycles. Without it, people will use tables to get around the restriction.

So, fundamentally, any design that ensures the restriction on JS/DOM values by limiting reachability will also make it impossible to detect cycles, as people will have to use tables to bypass the reachability restrictions. Fundamentally, the design needs to be able to distinguish references associated with different threads even if those references otherwise have the same structure.

Thus ownership types are not a suitable solution as they generally work by limiting reachability, e.g. ensuring that only the owner of an object has a reference to it.

> but you've phrased this issue in very urgent terms

The OP phrased the problem clearly in non-urgent terms, and there was absolutely no discussion for two weeks. When it did finally start getting discussed, within hours I was accused of suggesting that we should drop host-integrated GC altogether, despite the fact that I'd done nothing more than prompt people for more thoughts and encourage more discussion. I don't blame @ajklein for doing that, since he's just reacting to the hostile environment of this proposal, as am I and as are you. So please understand that I expressed my concern in order to be taken seriously, and then I asked you for suggestions on how to quell that concern, which I would be really interested to hear.

ajklein commented 3 years ago

It seems I was incorrect in my inference based on @RossTate's initial message, apologies for the mistake. My comment was in the context of @aardappel's response, which did start to use @RossTate's line of argument as a potential reason to reconsider host-integrated GC (@aardappel please correct me if I'm wrong!).

I don't think that an incorrect inference (I certainly didn't mean to "accuse" anyone of anything), or a two-week delay from posting to discussion, warrants the sort of escalation I see in this thread.

RossTate commented 3 years ago

I understand, @ajklein, and I appreciate you following up. I also agree that the escalation was unwarranted. But escalations tend to be a byproduct of hostile environments, and I for one find this proposal to be quite hostile. The reason I escalated things is because I feel quite insecure in this group and because I am extremely frustrated with the champion's repeated dismissiveness of my concerns (the group's ongoing toleration of which then reinforces the sense of insecurity). Your comment triggered that sense of insecurity, and then @conrad-watt's comment felt rather dismissive of the concern, seeming to focus on why the problem does not need to be solved now rather than how we might solve the problem, which also triggered my sense of insecurity. I understand that in both cases that might have been an overreaction on my part, but that is how emotionally distressed I am in this proposal. (Seeing that emotional distress exacerbates problems, especially with friendly colleagues, only makes the distress worse.)

dtig commented 3 years ago

Leaving a quick reminder here for folks to treat each other with respect and empathy regardless of technical disagreements that are sometimes inevitable.

@RossTate, I'm sorry your experience has been one of hostility - I'd like to help mediate, will be following up offline.

RossTate commented 3 years ago

Thanks, @dtig. But I want to be clear that no person in this thread has been hostile. To the contrary, I'd highly recommend people work with you, @conrad-watt, @aardappel, and @ajklein, as I have had great experiences with y'all. A hostile environment does not mean hostile people. It takes just one person to create a hostile environment, and for the group to tolerate that person's hostile behaviors. This group acts like that person is me, and so I'm going to take the cue and step out. I'm tired of feeling miserable, and I hate how that misery is affecting my own behavior. I don't even get paid to be here; I've just been volunteering my time and expertise because I saw an opportunity for the world to benefit from a new generation of solid software foundations.

aardappel commented 3 years ago

@ajklein I did not mean to imply we should re-consider this proposal entirely, just that we should be aware of the consequences and agree that we're ok with them.

I think fundamentally @conrad-watt's assertion that a language may need to bend to the needs of a platform is the right one, which means restrictions on sharing if you want to work closely with JS. That is fine.

That said, the consequences for Java that @RossTate outlines are somewhat serious. Java has concurrency deeply integrated into the language (every object can be a monitor) and it has a rather sizable standard library where (correct me if I am wrong) it will be common for a Java program that isn't itself directly concurrent to still be using concurrency primitives indirectly through the libraries it touches. Making a single-threaded or message-passing Java may be a significant undertaking that produces an incompatible standard library, and I agree with the prediction that it won't be a popular direction.

And of all the languages out there with concurrency features, shared objects seem more common than (by copy) message passing (saying that as someone who prefers the latter), so this kind of "pain" is not limited to Java, it is just a prominent example.

Go for example has a CSP style communication model, but afaik still has no restrictions on both threads involved accessing the same object, and thus counts as a "shared" model in my book.

conrad-watt commented 3 years ago

@aardappel here are some schemes off the top of my head that I could imagine toolchains attempting. I'm sticking to DOM objects for now, but you could imagine something analogous for JS objects:

  1. Require that Java compiled to Wasm can successfully partition its ref types between shared and unshared in a way that all objects which can transitively reference a DOM object are unshared. It's likely difficult/impossible to infer this as outlined above, but it could be an explicit configuration/annotation in the toolchain. Failure to correctly partition would result in either compilation failure, or the production of ill-typed Wasm.

  2. Only offer access to a "shadow DOM" API at the Java-level, together with a flush function which, when executed in compiled code, signals the UI thread to copy the contents of the shadow DOM into the "true DOM". This would allow all Java objects to be shared. As I mentioned above, this might be appropriate if updates are expected to be infrequent.

  3. Use the index indirection @RossTate outlined. This could possibly be made explicit in the Java-level API, so one never directly manipulates a DOM object, but instead only "DOM handles", which are implemented in JS/Wasm as indices to DOM objects which are kept alive in an array by the UI thread. The question of how to cause an underlying DOM object to be GC'd is a more focussed version of the "cycle" concern that Ross raised - in general the UI thread must keep the DOM object alive unless it's known for sure that no handles point to it (cycles may make this harder). Does this mean an explicit free instruction on handles would be appropriate?

  4. Don't allow Java to hold DOM references at all in regular code, only to post "jobs" to the UI thread which include DOM manipulation operations.
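As one illustration of the list above, scheme 2 might be sketched in TypeScript roughly as follows. This is an assumption-heavy sketch, not a toolchain design: `ShadowDom`, `setText`, and `flushInto` are hypothetical names, and a `Map` stands in for the true DOM, which only the UI thread would touch.

```typescript
// Stand-in for the real DOM; only the UI thread would touch this.
type TrueDom = Map<string, string>;

// Hypothetical shadow state that compiled code may freely share and mutate.
class ShadowDom {
  private pending = new Map<string, string>();

  // Callable from any thread in the scheme; records the write, touches no DOM.
  setText(elementId: string, text: string): void {
    this.pending.set(elementId, text);
  }

  // Run on the UI thread: copy accumulated writes into the true DOM.
  // Returns how many elements were updated.
  flushInto(dom: TrueDom): number {
    const count = this.pending.size;
    this.pending.forEach((text, id) => {
      dom.set(id, text);
    });
    this.pending.clear();
    return count;
  }
}
```

Because the shadow state never holds a real DOM reference, all compiled objects can stay shared; the cost is that updates become visible only at flush time.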

EDIT: not saying these are perfect solutions and I agree that the mismatch between Java/JavaScript/DOM models is unfortunate

lukewagner commented 3 years ago

One thing that would improve this situation is if Web APIs could start being marked as shared (on functions/methods and interfaces in Web IDL), such that they could be imported as shared wasm host functions (or, for interface objects, passed via (ref shared extern)) and thus called directly by N wasm threads. Then, the use of the variety of schemes Conrad mentions would be only necessary when dealing with DOM and JS.

conrad-watt commented 3 years ago

@lukewagner I'm maybe misunderstanding, but is this not the case even if there are no shared versions of Web APIs? The modules running in each Web worker could import that worker's unshared version of the host function/function reference (I'm assuming everything flowing in from a JS/Web host would be unshared by default).

This could be an issue if the original source code attempted to mix these imports in with source language objects, though.

EDIT: ah, the point is that if compiled Wasm function references must be shared, then one couldn't call an unshared API from within a shared function (without one of the above tricks)?

bvibber commented 3 years ago

Something that I thought of while reading this thread: in a JavaScript host context the problem with using an i32-indexed table of externrefs is that this can't be followed by the GC, and dead objects will live on unless manually freed. If instead of an i32 a shareable Wasm GC object were used as the index, it could be used as a key on a JavaScript WeakMap on the main thread.

Assuming the GC works right in this scenario, keeping the index object alive will keep the associated JS object alive via the weak map, and freeing them all should allow it to be collected.
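A minimal TypeScript sketch of this idea, under the explicit assumption that a shareable Wasm GC object could be used as a `WeakMap` key at all (which this thread leaves open). `HandleKey`, `attach`, and `resolve` are hypothetical names, and an ordinary JS object stands in for the shared Wasm object.

```typescript
// Hypothetical stand-in for a shareable Wasm GC object used as a handle.
class HandleKey {
  readonly label: string;

  constructor(label: string) {
    this.label = label;
  }
}

// Lives on the main thread. A WeakMap does not keep its keys alive, so an
// entry can be collected once nothing else references the key.
const domByHandle = new WeakMap<HandleKey, object>();

// Associate a JS/DOM reference with a fresh handle object.
function attach(ref: object, label: string): HandleKey {
  const key = new HandleKey(label);
  domByHandle.set(key, ref);
  return key;
}

// Main-thread lookup; returns undefined for a key that was never attached.
function resolve(key: HandleKey): object | undefined {
  return domByHandle.get(key);
}
```

Unlike an i32 index, the key is something the collector can trace, so dropping every reference to it lets the associated entry go without a manual free.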

However I may have misanalyzed something, or be misunderstanding the situation so please feel free to correct me!

conrad-watt commented 3 years ago

@brion IIUC that would make a lot of sense, although we haven't thought through the details of how shared Wasm objects would be exposed to JS. It may be that engines wouldn't be able to cope with allowing such objects to be WeakMap keys. If it could be allowed, that would make solution (3) a lot neater.

bvibber commented 3 years ago

@conrad-watt yeah, using shared keys is my primary concern, that may be a scenario that isn't planned for by the GC authors yet. :) If they use a wrapper object on the local heap for instance that might throw a spanner in the works.

conrad-watt commented 3 years ago

There could also be an angle on the JS side. There's an ongoing JS proposal to allow Symbols as keys for WeakMaps. If this could be combined with some kind of "shared Symbol" that could be postMessage'd, then even if shared Wasm objects need to be wrapped locally when exposed to JS, there'd be a mechanism to get the arrangement we'd want.

lukewagner commented 3 years ago

@conrad-watt There's two reasons, actually. The first is what you said: ideally we don't need to create N wasm instances in N threads; rather, we can create 1 instance that is shared by N threads, and thus you want that 1 instance to be able to import shared host funcrefs. (That 1 instance could also have unshared imports, and those could only be called from unshared functions which, transitively, could only be called from the original JS thread that created the instance.)

The second thing is: ideally we want not just shared Web API functions, but also shared Web API objects (say, a GPUBuffer) by having a shared extended attribute on Web IDL interfaces. This would be necessary if you want to, say, allocate a GPUBuffer on thread 1, initialize it on thread 2, and use it on thread 3.

Horcrux7 commented 3 years ago

As an implementer of a Java compiler I see no problem with single-threaded access to JavaScript and the DOM. This concept is no different from that of Swing and many other GUI frameworks. As there is currently no standard library for DOM access, this can be implemented with an event loop.

The more interesting question is how I can run a single instance of a WebAssembly module with multiple threads. How do I create shared GC objects? If I use Web Workers, then I have multiple instances.

aardappel commented 3 years ago

@Horcrux7 can you sketch how you would see a Java program refer to the DOM (and back)? What would be the outcome of a cycle between the DOM and Java objects? Would it be similar to any of the 4 methods @conrad-watt outlined above?

conrad-watt commented 3 years ago

For reference, ~I believe Swing's invokeLater is analogous to method 4~, although there might well be other features of Swing that aren't exactly analogous (I'm nowhere near an expert).

EDIT: IIUC, Swing's GC story is somewhat easier, because it allows UI references to be held cross-thread and dynamically protects against race conditions through ConcurrentModificationException. Doing something analogous here would require (at least) some relaxation for how JS/DOM references can be handled.

conrad-watt commented 3 years ago

@Horcrux7 wrt. your second para, you could instantiate a module in one worker, and then use postMessage to pass shared tables to other workers containing the resulting instance's shared (function and object) references.

rossberg commented 3 years ago

> @brion IIUC that would make a lot of sense, although we haven't thought through the details of how shared Wasm objects would be exposed to JS. It may be that engines wouldn't be able to cope with allowing such objects to be WeakMap keys.

Technically, this would be possible, but it might induce significant additional cost into Wasm heap objects in a JS embedding. The way objects as keys work in a JS engine like V8, for example, is by lazily adding a hidden object field to them that contains their hash value. The representational mechanisms for enabling this are already a sunk cost for those heavyweight JS objects and their semantics, but of course the hope is that Wasm heap objects would be more lightweight.

Horcrux7 commented 3 years ago

@aardappel

> can you sketch how you would see a Java program refer to the DOM (and back)? What would be the outcome of a cycle between the DOM and Java objects? Would it be similar to any of the 4 methods @conrad-watt outlined above?

I think that it is not similar to 4. Or I do not understand 4.

The idea is that every GC object can hold a reference to a DOM or JS script object. There is no difference between shared and unshared GC objects. But Wasm fires a trap if you access the DOM/JS object from another thread. The implementer of the DOM access library has to make sure that this trap never occurs.

This required:

@conrad-watt wrt. your second para, you could instantiate a module in one worker, and then use postMessage to pass shared tables to other workers containing the resulting instance's shared (function and object) references.

You mean that all globals should be replaced with values in a shared table? The problem is that the first instance must allocate this shared table. Also, the start() function should run only in the first instance. This would also require extending postMessage. Currently it is not possible to post shared objects; only SharedArrayBuffer is supported. This would open up a lot for JavaScript as well. The handling is also very complicated. I also expect resource consumption to be larger if every worker creates its own instance instead of sharing one. Creating a thread that shares the instance seems much simpler to me.

conrad-watt commented 3 years ago

The idea is that every GC object can hold a reference to a DOM or JS object. There is no difference between shared and unshared GC objects. But Wasm fires a trap if you access the DOM/JS object from another thread. The implementer of the DOM access library has to ensure that this trap never occurs.

The compiled code still has to be able to partition the Wasm objects into shared and unshared, so whatever mechanism the DOM library uses to guarantee the absence of such a trap still needs to translate into Wasm types. If you were to naively use this strategy, I'd expect the compiled Wasm code to give a type error.

This connects to @RossTate's point that even if the source language "knows" that no out-of-thread accesses happen, there's still the question of how to successfully translate it into the current Web restrictions on the DOM/JS.

EDIT: to be clear, I don't think that obeying these restrictions is impossible, but it's less neat than the Swing arrangement. I also don't think it would be impossible to relax these restrictions, but it would probably require conversations with other standards bodies.

EDIT2: just to be completely explicit, the current Web restriction is that JS/DOM references cannot be held cross-thread under any circumstances. My interpretation of the post above is that it's assuming that such references can be held cross-thread, but accessing them outside their origin thread is a dynamic error.

You mean that all globals should be replaced with values in a shared table?

You could think of it as though the whole instance is shared: every export of the module will be shared with other threads, including the code inside the instantiated functions. So only an initial instantiation is necessary, then every other thread gets access to the resulting functions.

This would also require extending postMessage.

If we pursue the current idea of a shared attribute, postMessage will definitely be extended in this way.

Horcrux7 commented 3 years ago

The compiled code still has to be able to partition the Wasm objects into shared and unshared, so whatever mechanism the DOM library uses to guarantee the absence of such a trap still needs to translate into Wasm types. If you were to naively use this strategy, I'd expect the compiled Wasm code to give a type error.

Why a type error? It can only be a runtime error. The Wasm compiler can not decide which thread will access an object.

This connects to @RossTate's point that even if the source language "knows" that no out-of-thread accesses happen, there's still the question of how to successfully translate it into the current Web restrictions on the DOM/JS.

My current plan is to have a wrapper around every DOM object, like this: https://github.com/i-net-software/JWebAssembly-API/blob/master/src/de/inetsoftware/jwebassembly/web/dom/HTMLCanvasElement.java The peer object in the wrapper will be the only unshared object. It will be of type externref. Of course, I only see this from the point of view of Java, where everything is shared.

My interpretation of the post above is that it's assuming that such references can be held cross-thread, but accessing them outside their origin thread is a dynamic error.

This is a requirement of a multithreaded language. Even if you hold only an index position into a list of references, these are de facto cross-thread references. Method 3 requires a finalization mechanism to free the references, a concept that GC languages need anyway.

conrad-watt commented 3 years ago

Why a type error? It can only be a runtime error. The Wasm compiler can not decide which thread will access an object.

The peer object in the wrapper will be the only unshared object.

IIUC, the wrapper itself will be shared? If so, this would be a type error since a shared object can't contain unshared elements (unless using an index indirection).

Even if you hold only an index position into a list of references, these are de facto cross-thread references.

From the point of view of the compiled code, I wouldn't consider the index position to be a reference (e.g. no GC, can't directly invoke functions on it). That's the reason it's allowed when "true" cross-thread references are not.
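A minimal sketch of the index indirection, assuming a per-thread table that stands in for an unshared table of externref. `HandleTable` and its methods are hypothetical names for illustration; nothing here is an existing Wasm or JS API.

```python
_EMPTY = object()  # sentinel marking a released slot


class HandleTable:
    """Hypothetical unshared table owned by the UI thread.

    Shared objects store only the integer index, never the host reference.
    """

    def __init__(self):
        self._slots = []  # host references, indexed by handle
        self._free = []   # released indices available for reuse

    def register(self, host_ref):
        """Store a host reference and return its integer handle."""
        if self._free:
            index = self._free.pop()
            self._slots[index] = host_ref
        else:
            index = len(self._slots)
            self._slots.append(host_ref)
        return index

    def lookup(self, index):
        """Resolve a handle; only meaningful on the thread that owns the table."""
        host_ref = self._slots[index]
        if host_ref is _EMPTY:
            raise LookupError(f"handle {index} was already released")
        return host_ref

    def release(self, index):
        """Explicitly free a slot; the GC cannot do this on its own."""
        if self._slots[index] is _EMPTY:
            raise LookupError(f"handle {index} was already released")
        self._slots[index] = _EMPTY
        self._free.append(index)
```

Because the table holds a strong reference until `release` is called, a cycle that passes through the table is invisible to the collector, which is why this scheme needs some finalization mechanism.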

aardappel commented 3 years ago

The idea is that every GC object can hold a reference to a DOM or JS object. There is no difference between shared and unshared GC objects. But Wasm fires a trap if you access the DOM/JS object from another thread. The implementer of the DOM access library has to ensure that this trap never occurs.

That may be a way to make it work, but that can't work with the current design. The DOM ref cannot sit in a shared GC object. If it sits in an unshared one then apparently all of your objects need to be unshared and you wouldn't have multi-threading.

But this brings up a good point that while statically bisecting objects into shared and unshared is highly desirable, it may also be unrealistic in some cases. So maybe there needs to be a 3rd type (thread_ref ?) that carries its thread id along with the ref, and traps upon deref if in the wrong thread. It can be owned by a shared parent, and also by an unshared parent (useful for storing opaque pointers even if they belong to other threads), or can be de-referenced before being written to a regular ref (and thus lose its dynamic thread id).
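A rough sketch of how such a dynamically checked reference could behave, modelled with ordinary threads. `ThreadRef`, `WrongThreadTrap`, and `deref_on_other_thread` are hypothetical names; the exception merely stands in for a Wasm trap, and none of this is part of any proposal text.

```python
import threading


class WrongThreadTrap(RuntimeError):
    """Stands in for a Wasm trap (hypothetical name)."""


class ThreadRef:
    """Reference that remembers its owning thread and checks it on deref."""

    def __init__(self, target):
        self._target = target
        # Record the creating thread; the id is only compared while that thread is alive.
        self._owner = threading.get_ident()

    def deref(self):
        """Return the plain reference, or trap when called from another thread."""
        if threading.get_ident() != self._owner:
            raise WrongThreadTrap("thread_ref dereferenced outside its owning thread")
        return self._target


def deref_on_other_thread(ref):
    """Attempt the deref on a fresh thread and report what happened."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = ref.deref()
        except WrongThreadTrap as trap:
            outcome["trap"] = trap

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome
```

In this model any thread may hold or pass along a `ThreadRef`, but only the owning thread can unwrap it, and the unwrapped value no longer carries the dynamic thread id.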

conrad-watt commented 3 years ago

But this brings up a good point that while statically bisecting objects into shared and unshared is highly desirable, it may also be unrealistic in some cases. So maybe there needs to be a 3rd type (thread_ref ?) that carries its thread id along with the ref, and traps upon deref if in the wrong thread

This would be a genuine change to the Web model, but maybe we could get people to agree that it would be a harmless one. We'd have to work through the details of how such an object would escape into JS, etc, and whether there would be any engine GC implications (e.g. because formerly single-thread objects could now be kept alive by cross-thread references).

Horcrux7 commented 3 years ago

If unshared objects are not assignable to a field, a local, or a global that is shared, then this must be checked on every ref assignment. Because 99.9% of all objects are shared in a multithreaded language, this sounds like a large overhead.

Or should shared and unshared ref types be incompatible? Is there no common supertype like anyref? Then the compiler could already check this.

bvibber commented 3 years ago

But this brings up a good point that while statically bisecting objects into shared and unshared is highly desirable, it may also be unrealistic in some cases. So maybe there needs to be a 3rd type (thread_ref ?) that carries its thread id along with the ref, and traps upon deref if in the wrong thread

This would be a genuine change to the Web model, but maybe we could get people to agree that it would be a harmless one. We'd have to work through the details of how such an object would escape into JS, etc, and whether there would be any engine GC implications (e.g. because formerly single-thread objects could now be kept alive by cross-thread references).

I like this model, I think it neatly solves the basic problem of passing around a foreign heap pointer in a way that can be sensibly unwrapped.

Important things to note:

bvibber commented 3 years ago

Note that the GC being able to actually follow those references is the key thing, and might be particularly complex if it's necessary to cross heap space boundaries that make pointers incompatible.

aardappel commented 3 years ago

do thread ids and heaps map 1:1, or are these two separate dimensions?

That was my idea, yes. It's not even a thread id, maybe heap id would be more accurate: "this pointer belongs to heap H and may only be dereferenced by code / a thread that has access to heap H, or trap otherwise". But yeah, that should generally be the same thing.

tlively commented 1 year ago

Closing this as non-actionable for the MVP, though the shared types discussed here and described in the post-MVP doc remain the plan of record for allowing multithreaded access post-MVP.