Open programmerjake opened 5 years ago
I believe, based on the discussion in https://github.com/WebAssembly/threads/issues/7, that it's possible to implement i64.atomic.wait
as a 64-bit atomic read (to confirm the expected value), conditionally followed by a linux 32-bit futex
operation on the same address, expecting the low 32-bits of the value read (modulo endianness).
that doesn't actually work, there can be a lost wakeup:
assuming little endian:
thread 1 | thread 2 |
---|---|
write 0x100000000 | |
start i64.atomic.wait 0x100000000 | |
read & compare | |
write 0x0 | |
futex wake | |
futex wait |
Ah, true. Sorry for the rookie mistake.
I don't think you can use raw futexes even for the 32-bit wait. The wait/notify mechanism has some guarantees, such as ordered wakeup, that raw futexes do not seem to provide.
Right, I don't think it was ever the intention that you could directly replace these instructions with raw futex syscalls.
Another modest plea to drop memory.atomic.wait64
(f.k.a. i64.atomic.wait
) for the sake of allowing a straightforward futex()
-based implementation. An additional modest plea to ease the requirements of memory.atomic.wait32
to allow a futex()
-based implementation, in particular allowing spurious wakeups. Some rationale:
futex()
-based non-web embeddings
futex()
-based non-web embeddingsAdditionally, allowing spurious wakeups eases low-level implementations of memory.atomic.wait32
in general, since the sleep condition is only on the initial check of the atomic. Disallowing spurious wakeups makes more sense when the semantics of the wait call are continuously dependent on the value of the atomic, e.g. in C++20 atomic_wait()
. Low-level implementations that have no way to prevent spurious wakeups are unable to decide if they should continue to sleep or not after one occurs (even if they weren't notified).
I believe the ordering issue can be resolved by placing manual seqcst fences around the futex calls, though syscalls already provide a memory barrier so that would be redundant. @lars-t-hansen could you elaborate what you mean by "ordered wakeup"?
I'm also curious about why we decided to disallow spurious wakeups. https://github.com/WebAssembly/design/pull/1019#discussion_r112070603 suggests it is because JS doesn't allow them, but I don't know why that is.
Interop with JS Atomics doesn't require 64-bit wait
JS supports 64-bit TypedArray in wait and notify: https://tc39.es/ecma262/#sec-atomics.wait handles BigInt64Array, and notify is size-agnostic but allows that type of view.
It was never (except maybe the very early days, when we were starting with something similar to what I believe NaCl had) intended for futexes to be able to directly express the semantics of wait and notify.
The prohibition on spurious wakeups and other design artifacts are there to push complexity into the implementation and away from the user (after all this is/was JS). Sticking to seq_cst for the first version is similarly motivated. Simply adding shared memory and atomics to JS was highly controversial, and compromises that reduced nondeterminism at the cost of some performance were necessary.
The intent of the ordered wakeup rule is fairness and (again) managing nondeterminism: if an observer can determine that one thread blocked in a wait before another, then waking one thread on that location will for sure wake the first one.
I should add: Compatibility with JS is a big deal for wasm on the web and since JS is still much more important than wasm, JS controls the agenda to some extent. But that fact does not prevent there from being additional instructions that work better in other environments. IMO we should be very open to proposals that add instructions that allow for better performance at the expense of nondeterminism across systems and implementations. But I do think those should be framed as new proposals.
The situation is similar to SIMD: At the moment, wasm SIMD has no nondeterminism and SIMD instructions work the same way (with some perf compromises) on different hardware platforms. But it is expected that the next version of wasm SIMD will have instructions that work (and are known to work) well on some hardware and less well on other hardware, or instructions that have variable amount of imprecision and therefore add nondeterminism, all in the interest of performance.
@lars-t-hansen your prompt response is appreciated and well understood. While it's unfortunate that the semantics of JS atomics wait/notify diverges from kernel-level wait/notify primitives for the sake of eliminating non-determinism, I understand the design constraint of maintaining semantic compatibility with JS.
@tlively Here's one example (of which you may or may not be aware) where disallowing spurious wakeups in the primitive comes in handy:
// thread 1
val = atomic_load_seqcst(&var);
if (!cond(val))
atomic_wait(&var, val);
assert(cond(val));
// thread 2
set_condition_seqcst(&var);
atomic_notify(&var);
This avoids a recheck of val
after the atomic_wait()
in thread 1, which is slightly more efficient if the atomic_wait()
primitive also efficiently avoids spurious wakeups. I think the tradeoff here is slightly dubious but that's only because I can't think of other examples where disallowing spurious wakeups simplifies user-level code. Would love to see more examples of algorithms that are much expressed much more succintly and/or efficiently with a deterministic/non-spurious wait, if only for my own edification.
@rianhunter I also depend on the absence of spurious wakes in the code that initializes memory when threads are enabled: https://github.com/llvm/llvm-project/blob/adcd02683856c30ba6f349279509acecd90063df/lld/wasm/Writer.cpp#L728-L768. That's not user-level code, but it would be expressible in C. However, that code could be trivially made to handle spurious wakeups by just waiting in a loop.
In general I think that there are some warts in the current JS design. Disallowing spurious wakeups at the JS API level just pushes that checking into the implementation, and it's not obvious that it really simplifies user code all that much -- user code has to be aware of the complicated semantics of waking up in any case. (Though it's probable that fairness is a useful property and it's possible fairness and no-spurious-wakeup are connected.) It's been pointed out that the return value from notify
is of questionable utility, if any (https://github.com/tc39/ecma262/issues/1492). And isLockFree()
should almost certainly never have been part of the proposal (and is not in wasm).
Disallowing spurious wakeups at the JS API level just pushes that checking into the implementation, and it's not obvious that it really simplifies user code all that much
Just for clarity's sake, the reason to allow spurious wakeups in the primitive is that not all systems have an efficient way of preventing spurious wakeups. This is less applicable to big systems and more applicable to small low-level / embedded systems. The impact consideration on user code isn't that allowing spurious wakeups simplifies user code but rather that it doesn't significantly complicate user code (e.g. @tlively's example).
Another modest plea to drop
memory.atomic.wait64
(f.k.a.i64.atomic.wait
) for the sake of allowing a straightforwardfutex()
-based implementation. An additional modest plea to ease the requirements ofmemory.atomic.wait32
to allow afutex()
-based implementation, in particular allowing spurious wakeups. Some rationale:
This will ease implementation for
futex()
-based non-web embeddings
- OpenBSD also has a native futex() calls: https://man.openbsd.org/futex
- Other systems may follow suit, after the addition of C++20 atomic_wait https://en.cppreference.com/w/cpp/atomic/atomic_wait
I am the implementer of libstdc++'s atomic
The standard allows for spurious wakeups and there are no guarantees on ordering.
- This will potentially improve performance for
futex()
-based non-web embeddings- 64-bit wait doesn't seem to be widespread in existing native code (C/C++)
- Interop with JS Atomics doesn't require 64-bit wait
Additionally, allowing spurious wakeups eases low-level implementations of
memory.atomic.wait32
in general, since the sleep condition is only on the initial check of the atomic. Disallowing spurious wakeups makes more sense when the semantics of the wait call are continuously dependent on the value of the atomic, e.g. in C++20atomic_wait()
. Low-level implementations that have no way to prevent spurious wakeups are unable to decide if they should continue to sleep or not after one occurs (even if they weren't notified).I believe the ordering issue can be resolved by placing manual seqcst fences around the futex calls, though syscalls already provide a memory barrier so that would be redundant. @lars-t-hansen could you elaborate what you mean by "ordered wakeup"?
I noticed that the
i64.atomic.wait
instruction can't be translated directly to thefutex
syscall on Linux, since Linux only supports 32-bitfutex
operations. I'm concerned that non-web embeddings on Linux will need to implement their own wait/notify functionality instead of being able to use Linux's directly.Web embeddings will have to implement their own wait/notify functionality anyway to support primitives like
Atomics.waitAsync
Related:
7
StackOverflow: Linux futex for 64-bits