Requirements for phase 3

tlively commented 4 years ago

This repo has not had a lot of activity lately, so I'm wondering if this proposal is ready to move to phase 3.

The phase 3 entry requirements are:

Test suite has been updated to cover the feature in its forked repo.
The test suite should run against some implementation, though it need not be the reference interpreter.

We certainly have a working implementation, and it looks like a test suite has been added. Is there anything else we previously decided to wait on before moving to phase 3?

lars-t-hansen commented 4 years ago

I thought the process required the spec text but it does not, so we don't have to wait on the memory model for phase 3, for one thing.

(FF, like Chrome, has shipped this feature.)

binji commented 4 years ago

Good point, I had assumed we were waiting on some additional spec work to advance, but this looks like it can move to phase 3. The testsuite is on concern, however -- we have surface-level tests of the new instructions, but we'll really want something more comprehensive. Maybe it's sufficient for phase 3 if we can move some of Chrome's and FF's JS tests into this repo.

lars-t-hansen commented 4 years ago

We will run into the same problem that test262 struggles with - no good mechanism for "workers" across the JS shells. I promised this group (about three years ago it looks like, see #52 and also https://bugzilla.mozilla.org/show_bug.cgi?id=1414823) to design such a worker API for the shell but that work stalled, to say the least; and then we'd want something for the reference interpreter too.

Maybe the spec tests offer a way out, with their high-level test predicates? We could imagine a test comprising some number of expressions that are run concurrently and have to coordinate a test result and a signal that they are done; when translating for JS embeddings the translator could hide complexity that test262 exposes.

Constellation commented 4 years ago

Note that JSC implemented this behind the flag. https://trac.webkit.org/changeset/270208/webkit

dtig commented 3 years ago

Looks like this discussion has stalled, was this discussed in a meeting with additional follow up? If not, I'm wondering if in addition to the existing spec tests, would WPT be a good place to start to make sure that everything works as it should with workers?

My concern with JS Shell tests is that while these will work, they are not reflective of real world cases i.e. at least in d8, the worker API is somewhat barebones and depending on test cases it may be possible to get different results when running in the shell vs. the browser, and while we can fix the API, that seems to me to be the wrong thing to fix in this case. I'm probably missing quite a bit of context w.r.t. testing, but I'm wondering what we can do in the near term to hit phase 3 requirements. Concretely I'd like to propose that we decouple comprehensive testing and figuring out which high level test predicate should be used - i.e. use WPT in the near term for comprehensive testing for phase 3, and work on the larger problem of implementing/using the API linked above, or a different strategy for higher level testing separately.

tlively commented 3 years ago

Looks like this discussion has stalled, was this discussed in a meeting with additional follow up?

It was not.

rossberg commented 3 years ago

The test suite requirement should be read as tests integrated into the wast-based spec test suite. A JS-specific test suite is of little use to non-JS engines.

As @lars-t-hansen suggested, we should be able to design a suitable extension to the wast format. Would some form of "spawn" command, perhaps paired with a wait command, be sufficient?

lars-t-hansen commented 3 years ago

Some test cases for inspiration here:

in addition to whatever's in WPT and test262 of course. Apart from book-keeping it looks like spawning a function in a different thread and a wait or sleep command to give the spawned thread some breathing room to start up and do its thing may be sufficient for something that can claim to at least test all functions in the API. (The above tests are known to be brittle and have intermittent failures on (very) overloaded test systems; we can live with that, but I don't know if the CG can.)

(If we add something to the wast format, though, the JS generator will need to be able to target the different shells with their idiosyncratic thread mechanisms, or I'll be back on the hook for designing a common system.)

rossberg commented 3 years ago

Proper wasm threads would be nice to have. :)

I see that the first of your links already abstracts out spawn and send functions. Do you think it would be feasible for the generator to just produce calls to a small set of such functions and let custom runners define them appropriately?

lars-t-hansen commented 3 years ago

I see that the first of your links already abstracts out spawn and send functions. Do you think it would be feasible for the generator to just produce calls to a small set of such functions and let custom runners define them appropriately?

That's what the agent abstraction in test262 does, and my understanding is that it more or less works OK for them. ISTR the agent abstraction is fairly complicated because it works for both shells and browsers. Could we get away with something for shells only?

chicoxyzzy commented 3 years ago

According to https://webassembly.org/roadmap/, all major browsers have threads feature implemented already

conrad-watt commented 3 years ago

@chicoxyzzy the issue for this phase of standardisation is still the test suite. Thanks for the ping though, we probably want to un-stall this.

@rossberg, you mention above that the test suite requirement should be satisfied by .wast-style tests. So I believe as a first stage, we need to bikeshed a syntax for this.

EDIT: I've been iterating on the design of the instantiate_and_register_on_threads operation below, so apologies if the edits have been disorienting.

Currently, the .wast test suite leaves instantiation pretty implicit. I think to write multithreaded tests, we need the following bare-minimum functionality:

explicitly instantiate a module in a worker thread
explicitly transfer shareable exports to a worker thread
invoke functions across multiple workers simultaneously

Other logic such as number of iterations, blocking/coordination between threads, etc can be expressed in pure Wasm inside the called functions (e.g. with wait/notify).

EDIT: to be explicit, I don't think we need individual core tests that heavily exercise the worker/thread creation mechanism, such as forking a dynamic number of threads during runtime, since thread creation is currently outside our scope.

So I propose three new test primitives (below). At a high level, worker threads will be indexed by an immediate n. We can either explicitly require max n to be declared upfront, or assume more will be spawned as necessary. Each worker will be treated as having its own namespace.

instantiate_and_register_on_threads (n "instancename")* module

This primitive takes a module definition module (analogous to the module parameter of assert_invalid), instantiates it in worker threads n*, and registers the resulting instance in each associated worker's namespace as "instancename".

transfer_and_register_on_threads n* "instancename"

Transfers the shareable parts of main-thread-registered instance "instancename" to the worker threads n*, registering it in each thread's namespace with the same name. Currently this would only transfer shared memory exports, if any exist.

assert_on_threads (n (thread-assert))*

where thread-assert ::=
  assert_return_thread (thread-op) v_res*
| assert_return_one_of_thread (thread-op) (v_res*)*
| assert_trap_thread (thread-op)
| assert_timeout_thread (thread-op)
| ... others?

where thread-op ::=
  invoke_thread "instancename" "funcname" v_args*
| ... others?

Simultaneously execute the referenced functions, passing iff all returned results match. It might make sense to allow some configurable timeout after which the assert_on_threads fails - currently I'm following the single-thread assert_return which has no explicit timeout if the invoked function doesn't return.

Example .wast test:

;; MP_atomic.wast

(module
  (memory (export "memory") shared 1 1)
)

;; make the shared memory available for subsequent instantiations in the worker threads
(register "testmemory")
(transfer_and_register_on_threads 0 1 "testmemory")

(instantiate_and_register_on_threads
  (0 "t0")
  (1 "t1")
  (module $test
    (memory (import "testmemory" "memory") shared 1 1)

    (func (export "MP0") (param i32 i32 i32 i32)
      (i32.atomic.store (local.get 0) (local.get 2))
      (i32.atomic.store (local.get 1) (local.get 3))
    )

    (func (export "MP1") (param i32 i32) (result i32 i32)
      (i32.atomic.load (local.get 1))
      (i32.atomic.load (local.get 0))
    )
  )
)

(assert_on_threads
  (0
    (assert_return_thread
      (invoke_thread "t0" "MP0" (i32.const 0) (i32.const 8) (i32.const 42) (i32.const 1))
    )
   )
  (1
    (assert_return_one_of_thread
      (invoke_thread "t1" "MP1" (i32.const 0) (i32.const 8))
      ((i32.const 0) (i32.const 0))
      ((i32.const 1) (i32.const 42))
    )
   )
)

rossberg commented 3 years ago

@conrad-watt, that scope makes sense. However, I would suggest a somewhat more orthogonal extension to .wast scripts.

For example, I could imagine a small set of new commands like

cmd ::= ... | (thread <name>? <cmd>*) | (wait <name>)
assertion ::= ... | (assert_timeout <name>)

The commands inside a thread can access every module or instance from the parent thread, but it will be an error if they try to access an instance that is not transferable -- where an instance is deemed "transferable" if it only has transferable exports, i.e., using instances as proxies for the actual things.

Wouldn't that be enough? I believe everything else can be expressed in this framework (perhaps with a couple more additions). In particular, threads can contain assertions, and by waiting on a thread we can ensure they got executed.

Edit: one more required addition would be something akin to assert_return_one_of like you suggest, but that would be more nicely expressed by adding a (one_of pat*) result pattern.

conrad-watt commented 3 years ago

I like the idea of waiting on a whole thread instead of on an individual result. My only concern with that design would be the precise semantics of a child thread accessing its parent's namespace, especially mapping this onto operations on Web workers so that engines can run the tests.

Could we make it syntactically scoped, so that a child thread gets a deep copy of (only) the (references to) shared parts of its parent's namespace at the moment it's created? Then it could be modelled as the parent postMessaging all the current in-scope shared things created by instantiation to the child at the moment of its creation.

rossberg commented 3 years ago

Sounds plausible. In fact, I think it's equivalent: if you copy only the sharable parts, then an attempt to use something else will likewise result in an error (assuming we only make the other names inaccessible, not have them vanish from scope, which would be bad because it could unshadow something earlier).

conrad-watt commented 3 years ago

(assuming we only make the other names inaccessible, not have them vanish from scope, which would be bad because it could unshadow something earlier).

The way I'm thinking of things, the child thread never looks in a parent's namespace, it just starts with a copy and works from that. So the unshared elements do vanish from scope, but they don't unshadow anything. In fact, if a child thread were to shadow/replace a shared thing copied from the parent with an unshared thing, a child-of-child could never access the parent's shared thing.

EDIT: keeping the entries but making them give some "inaccessible" error would be perfectly fine too - but probably a little more bookkeeping to replicate gracefully with Web workers.

EDIT2: the interesting case is where the parent thread starts with an unshared thing in scope, creates a child thread, and then shadows that unshared thing with a shared thing, or creates a new shared thing. Whatever else, we don't want a semantics where that shared thing becomes visible to the child.

rossberg commented 3 years ago

In fact, if a child thread were to shadow/replace a shared thing copied from the parent with an unshared thing, a child-of-child could never access the parent's shared thing.

Ah, okay. But with that clarification I think that's also equivalent to what I described. Unless I'm missing something, the only difference would be the error message for a bogus script, like, "instance not shared" vs "inaccessible instance" vs "unbound instance".

conrad-watt commented 3 years ago

@rossberg I may have been going too far down the rabbit hole of worrying about a semantics where the parent scope can be updated after the child thread is created in a way that's visible to the child - on reflection it's not clear that this is what you were ever describing :P

So I think your proposed design is the right one - especially with the one_of pattern addition. So I guess a good next step would be to implement the parsing of these in the reference interpreter, stubbing out their actual semantic behaviours for now? I can also write a few representative tests.

rossberg commented 3 years ago

Actually, on reflection, I was only considering lexically scoped names. But there also is the "registry" used for resolving imports, which is more like a side table. For that I would indeed suggest creating a separate one per thread -- but simply starting empty, not even copying the parent one.

conrad-watt commented 3 years ago

For that I would indeed suggest creating a separate one per thread -- but simply starting empty, not even copying the parent one.

So the mechanism to instantiate two modules in different threads sharing the same memory would be that each thread must first perform register "instancename" $modulename, where $modulename is defined by a mutual parent and exports the shared memory? Makes sense!

rossberg commented 3 years ago

Yes, that's the idea.

mattgenious commented 2 years ago

Hi, just trying to figure out if this has stalled. I apologize if my comment turns out to be noise because of context or knowledge I am missing.

What is currently missing for this proposal to move to phase 3?

I am having a really hard time trying to figure out what is missing and which meeting notes mentioning threads actually have anything to do with this proposal.

tlively commented 2 years ago

@mattgenious, the only thing missing for phase 3 is the test suite. The reason that is tricky is that the current test suite has no concept of threads, so there is some design work necessary to figure out how to extend it properly. That's what the conversation above is about.

Unfortunately (or fortunately?) there hasn't been much motivation to push on this because threads have already been shipped in all the major browsers.

conrad-watt commented 2 years ago

FWIW, now that https://github.com/WebAssembly/threads/pull/179 is merged, the only thing missing that would allow us to write a test suite is the implementation of wait and notify in the reference interpreter.

mattgenious commented 2 years ago

@tlively Thanks for the info, makes a little more sense to me now. The reason I am interested is because as far as I know the .NET runtime Mono is waiting to implement threads until a spec has been "officially" released and I have a professional interest in that happening because the startup of the mono runtime is blocking the UI thread essentially causing ~2 seconds of unnecessary delay during load.

So I am tracing the path to find where work is missing :-)

@conrad-watt is it correctly understood that the tests need to be implemented in https://github.com/WebAssembly/spec repo?

And that the semantics are already present in the wasm-module-builder.js as kExprAtomicNotify, kExprI32AtomicWait and kExprI64AtomicWait?

I feel like I am making a lot of noise here, is there a better place where I can ask questions without causing trouble?

I joined the W3 community group but haven't seen any activity there.

tlively commented 2 years ago

I'll let @conrad-watt fill in the technical details, but you're definitely asking these questions in the right place, @mattgenious.

conrad-watt commented 2 years ago

@mattgenious I think the "least effort" path to be phase 3-ready would be as follows:

update the forked interpreter in this repository to implement the currently stubbed cases of wait (see here) and notify (see here).
write a suite of .wast tests using the syntax already implemented in https://github.com/WebAssembly/threads/pull/179

@rossberg, did you have a vision in your head for how these stubbed cases should be implemented? I'm willing to try hacking around on them.

rossberg commented 2 years ago

@conrad-watt, none I would remember. I believe these stubs were left by binji. Feel free to give it a shot.

mattgenious commented 2 years ago

Thanks @conrad-watt, @rossberg and @tlively for kindly including me in the discussion despite my obvious lack of knowledge on the subject.

Alright, I haven't ever written a line of OCaml before and I am also pretty sure I do not have sufficient understanding of the subject matter to venture into that domain. While I had a course in uni about writing interpreters in F# I think I have to start from scratch to make a dent in anything related to this.

I assume someone will have written the missing code before I learn OCaml and interpreter design, but if not I am willing to put in some effort.

ntindle commented 2 years ago

Based on the changes in #192 is this ready to move to the next stage?

conrad-watt commented 2 years ago

We're currently planning to vote on phase 3 at the in-person meeting later this month! Before the vote, I'm also planning to add a suite of concurrent tests based on some well-known "litmus" tests for concurrent behaviour.

conrad-watt commented 2 years ago

Threads are now phase 3!

WebAssembly / threads

Requirements for phase 3 #163