EventLoop(Future): clarify memory ordering / memory consistency

Now that Swift has a memory consistency model, we should clarify how EventLoops and EventLoopFutures tie into that.

For example, is the following program correct?

let group = MultiThreadedEventLoopGroup(numberOfThreads: 3)
defer { try! group.syncShutdownGracefully() }

let el1 = group.next()
let el2 = group.next()
let el3 = group.next()

// on main thread
var counter = 0
counter += 1

el1.makeSucceededFuture(()).flatMap {
    // on EL1's thread
    counter += 1

    return el2.makeSucceededFuture(()).flatMap {
        // on EL2's thread
        counter += 1
        return el3.makeSucceededFuture(())
    }
}.flatMap {
    // on EL1's thread
    counter += 1
}.wait()

// on main thread
precondition(counter == 4)

Clearly it reads and writes counter from about 3 different threads without any explicit synchronisation. However, the EventLoopFutures establish a clear before/after memory ordering and in the current implementations of SelectableEventLoop (because if the locks which are full mem barriers) and NIOTSEventLoop (because of DispatchQueue's guarantees) this is actually totally safe. But it's never stated anywhere that this is in fact safe. So should the programmer assume this is unsafe and resort to a NIOAtomic<Int> or a Lock? I don't think so.

I think we should add to the docs of EventLoop and EventLoopFuture a few guarantees regarding memory orders. Most importantly:

futures: call the callbacks one after the other, with full barriers
ELs: calls to .execute/.schedule/.submit/.wait()... have full barriers and are ordered w.r.t. other such calls (and ofc future callbacks which use these underlying methods)

Hmm, so while I agree with you that we should document things, I think EventLoopFuture does not actually express any "happens before" relationship w.r.t its callbacks. EventLoopFuture's happens-before relationship is derived solely from the EventLoop. This can be observed with EmbeddedEventLoop, and indeed if any of the above event loops were replaced with an EmbeddedEventLoop then the safety guarantees of the program evaporate.

To this end, I think we need to explore two related concepts:

What the memory order is for a thread safe EventLoop implementation: that is, the appropriate barriers and edges that an implementation must provide. Concretely, I believe there are only three actual guarantees required here:
- a call to EventLoop.{submit,execute,scheduleTask} happens before the closure passed to that function begins executing (this implies a full barrier associated with these operations).
- a call to any closure passed to any of the above submission functions returns before_ any call to a closure submitted by the same thread to the same loop later in program execution begins, if the deadlines are the same
- a call to any closure A passed to any of the above submission functions returns before_ any call to a closure B submitted by any part of the program begins, if closure B was submitted after the beginning of the execution of closure A.
From these guarantees the EventLoopFuture guarantee is built:
- each call to an EventLoopFuture callback happens before (i.e. returns before) any call to an EventLoopFuture callback enqueued later by the same thread. This is achieved by the fact that EventLoopFuture callbacks are added by way of submitting work to the owning event loop.

I believe that this also derives an important property of event loop task execution, which is that it is linearizable. Enqueuing a task on an event loop happens before the execution of the task body. The execution of a task happens before the execution of any other task submitted to the loop. As a result, tasks submitted to an event loop form a history that meets the requirements of strict serializability: each task submitted to an EventLoop task queue completes before the next one begins, and submissions that have a happens-before relationship to each other persist that happens-before relationship into the task queue (assuming deadlines are not a factor).

There is some more detail here to flesh out around the role deadlines play (concretely, that tasks submitted with equal deadlines but without a "happens before" relationship between them are unordered, that execute/submit both take a deadline of now(), etc.), but this is roughly the right shape.

I suggest noting things this way because it allows us to more concretely talk about EmbeddedEventLoop. EmbeddedEventLoop explicitly breaks the rules of event loops: it does not provide our happens-before relationships at all. execute/submit and friends meet part 1 and part 2, but they miserably fail at part 3: calling execute from multiple threads allows the closure bodies to execute simultaneously.

We allow this for EEL because its purpose is as a verified mock, not as a complete loop, but it's important that we acknowledge the constraint. We should also consider whether it's worth considering making EEL thread-safe but controllable, to avoid this risk, though that's a harder thing to do.

A quick note: the above sketch is the guarantee we should be requiring loops to make, but currently it's not the one they actually make. In particular, we have a slightly weaker version of rule 2 for our existing loops, as two tasks submitted with the same deadline are unordered with respect to each other (i.e. they may not execute in program order). That's covered by #1541, and we should fix it as part of this work, or @aphyr will laugh at us.

An ELF does guarantee ordering and therefore happens before:

A given ELF runs one callback after the other (it doesn't actually currently ask the EL to do, it just loops itself. But even if it did that, we have happens-before because any EL worthy its name will only run one thing at a time)
A given ELF always runs on the very same EL
A future chain executes one combinator after the next

The above example would work just fine with EmbeddedEventLoop it'd just never thread-switch because it's all embedded into the main thread.

Of course, for everything to make sense, we also require the EL to actually be thread-safe (which EmbeddedEL isn't so you mustn't thread-switch if you use it).

So yeah, your first guarantee (about ELs) sounds about right. We should use the same language as the Swift memory model/Concurrency to not confuse problems.

For the second guarantee, we should remove the word "thread". We should require in the EventLoop protocol that any given EL can obviously only run one thing at a time although for me that's kinda implied already. Nevertheless, explicit over implicit.

You're right that the ELF does provide a happens-before relationship to its callbacks that is conceptually somewhat separate from the happens-before relationship provided by event loop submission, good spot. That's probably rule 4 then.

Agree that the second guarantee shouldn't use the word "thread". Here's a proposed rewording, which also subsumes rule 3:

a call to any closure A passed to any of the above submission functions returns before any call to a closure B submitted later in program execution (that is, where the submission of closure A happens before submission of closure B) begins executing, if the deadlines are the same

We should require in the EventLoop protocol that any given EL can obviously only run one thing at a time although for me that's kinda implied already.

I believe this is what rule 3 tried to encapsulate, but as I re-read it I realise that it didn't. So here's a new version of rule 3

all calls to work items enqueued on an event loop (including I/O operations) must form a strict serializable history: that is, work items must execute atomically (only one work item may execute at a time, and runs until completion), and they execute in deadline order, where tasks that have the same deadline must execute in the order they were submitted to the event loop if such an order exists (as defined by the Swift memory model and the SwiftNIO memory model)

Not as concise, but probably more effective.

Sounds good.

Although

where tasks that have the same deadline must execute in the order they were submitted to the event loop

we don't do that right now. If you make a NIODeadline and you schedule multiple things to that deadline, they'll run in random order. Two calls to execute will run in the correct order but this isn't actually programmed, it relies on the TSC always advancing and therefore producing a new deadline.

that is, work items must execute atomically (only one work item may execute at a time, and runs until completion)

I agree with this. But we should make sure that it's not confusing. For example if you do this

el.execute {
    // BEGIN: work item 1

    let p = el.makePromise(of: Void.self)
    p.futureResult.whenComplete { _ in
         // this will within work item 1. Is this clear to the user that this is _not_ its own work item?
    }
    p.succeed(())

   // END: work item 1
}

@weissi Yup, I already linked to the relevant issue we have to fix above.

As to your second point, I think we need to write the rules in such a way that they can be meaningfully applied to the program. In this case, users can only know whether things are within the same work item based on examining their documentation. Arbitrary closures may or may not be part of the same work item, and it's not possible to know that for all types ahead of time without studying their documentation. In this case, EventLoopFuture may need clearer docs about when exactly future callbacks execute. I don't think it impacts the memory ordering documentation though.

apple / swift-nio

EventLoop(Future): clarify memory ordering / memory consistency #1909