WebAssembly / shared-everything-threads

A draft proposal for spawning threads in WebAssembly
Other
38 stars 1 forks source link

Ideas and resources for fuzzing shared-everything threads #84

Open fitzgen opened 1 week ago

fitzgen commented 1 week ago

This issue is intended as a place for discussion about approaches we can all use to improve our assurances that we correctly implement this proposal. A place to share relevant papers, moonshot ideas, etc...

Unlike many other Wasm proposals,[^0] we can't effectively exercise this proposal's additions to the language by just simply generating pseudo-random Wasm modules that contain this proposal's new instruction and feeding those into the system under test.

[^0]: To some degree everything here applies equally to the original threads proposal, but shared-everything threads expands the scope much further.

Off the top of my head, we need to additionally do things like

fitzgen commented 1 week ago

FWIW, @abrown has a pull request implementing support for shared-everything threads in wasm-smith: https://github.com/bytecodealliance/wasm-tools/pull/1756

This is good for exercising validators, but won't give us much as far as engine-implementation correctness is concerned.

tlively commented 1 week ago

Ideas I've had include:

I'd be very interested in reading any available literature in this area.

fitzgen commented 1 week ago

Some things that seem vageuly related, but which I haven't actually read yet:

cfallin commented 1 week ago

The hardest part by far seems, to me right now at least, to be the generation of programs that terminate (as Nick mentions above "ideally don't deadlock").

I wonder if one could hit both goals -- actually use the primitives, and terminate -- with a grammar-based random testcase generator that builds out of primitives such that the testcase must terminate and must produce a deterministic result, by construction. Then test with the composition of two levels of randomness: (i) building test cases, (ii) running each test case many times, perhaps with artificial perturbation (token-passing in the style of deterministic execution proposals like DMP but with token order changed by some random seed). The oracle condition is: results match across perturbed executions.

The grammar could include primitives like: spawn into two subthreads, (recursively generate subprograms), join; lock/unlock a spinlock, carefully constructed so the spinlock is scoped to a level in the spawn hierarchy and locks/unlocks are well-paired; similar, with the atomic wakeup primitives, well-paired between subthreads; atomic-add on one single atomic, with the returned values summed into a checksum and not used otherwise. No other shared data; add other random computations as per normal fuzz testcase generation, perhaps with normal loads/stores to disjoint (per-subtask) addresses and to TLS. (Importantly, no memory accesses to memory "protected by a lock" -- not racy but would introduce nondeterminism as critical sections commute.)

This seems to possibly have an advantage of avoiding more complex oracles (building happens-before graphs and verifying dynamically or somesuch), with the usual downside of grammar-based fuzzing that one only gets the coverage one plans to generate (but maybe this gives enough coverage by exercising all the building blocks? empirical question for sure).

fitzgen commented 6 days ago

Thought: If the engine supports interruption and we are checking dynamic traces for validity, then we don’t need to generate terminating and non-deadlocking programs. We can just check as much of the trace as we got up to the interruption point.

fitzgen commented 6 days ago

Another source of possible inspiration would be Jensen: https://github.com/jepsen-io/jepsen

Jepsen is a Clojure library. A test is a Clojure program which uses the Jepsen library to set up a distributed system, run a bunch of operations against that system, and verify that the history of those operations makes sense.