EventLoop: task list: investigate what's missing in Swift in 2021 to switch to a MPSC queue

apple / swift-nio

Event-driven network application framework for high performance protocol servers & clients, non-blocking.

https://swiftpackageindex.com/apple/swift-nio/documentation

Apache License 2.0

7.97k stars 652 forks source link

EventLoop: task list: investigate what's missing in Swift in 2021 to switch to a MPSC queue #1829

Open weissi opened 3 years ago

weissi commented 3 years ago

Currently, our SelectableEventLoop task list is backed by a priority queue, synchronised with locks. In many cases we should be able to dramatically speed enqueuing tasks from other threads up when it matters the most: when all EventLoops are constantly spinning and almost never parked.

Switching the task list to a lockless MPSC queue (multi producer, single consumer queue) should help a lot here.

The swift-atomics package may actually be all we need for this. (but a dependency on swift-atomics may require a major NIO version).

On the other hand, maybe we could trial an implementation in a separate repository for a Channel/EventLoop implementation that's backed by io_uring I/O which is fully asynchronous.

hassila commented 3 years ago

If looking at implementing a MPSC, just wanted to share this paper that I had on the radar. It seems to scale very nicely on the producer side due to its design.

hassila commented 3 years ago

(just an implementation note if it is being considered, I would probably relax the strict FIFO requirement for the queue overall and only do a single scan - if we believe per-thread ordering is good enough - that would speed up the dequeueing performance by half or so)

Lukasa commented 3 years ago

Per-thread ordering is absolutely sufficient.

Lukasa commented 3 years ago

So, a note: the above proposed paper is unlikely to be a good fit. In particular, it requires one queue per enqueueing thread, but NIO allows arbitrary numbers of threads to enqueue work onto its event loops. Fixing this would require an additional concurrent data structure that allows us to add new thread lists to the queue and risks really nasty performance penalties in cases where a large number of threads come into existence and die.