apple / swift-nio

Event-driven network application framework for high performance protocol servers & clients, non-blocking.
https://swiftpackageindex.com/apple/swift-nio/documentation
Apache License 2.0

Refactor I/O memory ownership for asynchronous backends #1805

Status: Open. hassila opened this issue 3 years ago.

hassila commented 3 years ago

Trying to start a discussion and capture requirements / next steps. Based on the experience from #1761 and related discussions, fully supporting truly asynchronous backends like io_uring (i.e. using async send/recv) requires a refactoring.

To recap, the root issue is that the memory used for I/O operations is currently owned by the channels. This does not fit the model of e.g. io_uring where (simplified) we would register a set of memory buffers with the kernel, issue a bunch of read I/O directly for (e.g.) sockets, and, when the read result eventually arrives, the kernel will tell us which buffer it put the result in. We would then re-register this buffer for reuse after we have processed the result.

This means that ownership of the read memory buffers would preferably be managed per event loop (EL) rather than per channel.
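To make that concrete, here is a minimal sketch of what an EL-owned read buffer pool might look like; ReadBufferPool and all of its methods are hypothetical illustrations of the idea, not NIO API:

```swift
import NIOCore

// Hypothetical sketch of an EL-owned read buffer pool; not NIO API.
final class ReadBufferPool {
    private var buffers: [ByteBuffer]
    private var free: [Int]  // indices of idle buffers, ready to re-register

    init(count: Int, capacity: Int, allocator: ByteBufferAllocator = ByteBufferAllocator()) {
        self.buffers = (0..<count).map { _ in allocator.buffer(capacity: capacity) }
        self.free = Array(0..<count)
    }

    /// The index of a buffer that can be submitted for the next read, if any.
    func claimFreeBuffer() -> Int? {
        self.free.popLast()
    }

    /// Called when the kernel reports that a read completed into buffer `index`.
    func withCompletedRead<T>(index: Int, bytesRead: Int, _ body: (ByteBuffer) -> T) -> T {
        var buffer = self.buffers[index]
        // Expose the bytes the kernel wrote as readable bytes.
        buffer.moveWriterIndex(to: bytesRead)
        let result = body(buffer)
        // Once the channel has processed the result, recycle the buffer.
        self.recycle(index: index)
        return result
    }

    private func recycle(index: Int) {
        self.buffers[index].clear()
        self.free.append(index)
        // A real backend would re-register the buffer with the kernel here.
    }
}
```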

Similarly, the lifespan and ownership of write buffers need to be adjusted, as the kernel won't have copied the user payload until we get an asynchronous completion event (unlike now, when we know synchronously that the write has completed), so we need to decouple the write operation from the reclaiming of that memory buffer.

Ideally, we would then end up with a new, truly async EL which schedules all I/O directly for all channels depending on expressed interest (no more epoll/kqueue, no more changing of registrations externally towards the kernel; we can just choose when to schedule the actual read/write/accept). The EL would also need to notify channels when their I/O has completed, e.g. for write buffer management.
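As a rough illustration of that control flow, here is a sketch of a completion-driven EL core; every type here (Completion, CompletionHandling, AsyncEventLoopSketch) is an invented stand-in for real io_uring bindings, purely to show the shape:

```swift
// Hypothetical completion-driven event loop core; invented types, not NIO API.
enum Completion {
    case read(channelID: Int, bufferIndex: Int, bytesRead: Int)
    case write(channelID: Int, bytesWritten: Int)
}

protocol CompletionHandling: AnyObject {
    func didRead(bufferIndex: Int, bytesRead: Int)
    func didCompleteWrite(bytesWritten: Int)
}

final class AsyncEventLoopSketch {
    private var channels: [Int: CompletionHandling] = [:]

    func register(id: Int, channel: CompletionHandling) {
        self.channels[id] = channel
    }

    /// Drain one batch of kernel completions and notify the owning channels.
    func processCompletions(_ completions: [Completion]) {
        for completion in completions {
            switch completion {
            case .read(let id, let bufferIndex, let bytesRead):
                // The kernel tells us which registered buffer holds the data.
                self.channels[id]?.didRead(bufferIndex: bufferIndex, bytesRead: bytesRead)
            case .write(let id, let bytesWritten):
                // Only now is it safe for the channel to reclaim the write buffer.
                self.channels[id]?.didCompleteWrite(bytesWritten: bytesWritten)
            }
        }
    }
}
```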

Lukasa commented 3 years ago

This seems like an appropriate design, and in some cases we already do something similar today. For example, we have a number of buffers for vector operations stored on the selectable event loop.

The biggest limiting factor here is that channels currently have some degree of control over how they allocate memory for reads by way of the RecvByteBufferAllocator. Working out whether we can preserve that flexibility is going to be important, as it would be valuable to keep.
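For reference, a channel can already swap in its own strategy via the recvAllocator channel option; a minimal example using real NIO types (the fixed size and thread count here are arbitrary choices):

```swift
import NIO

// Opt a server's child channels out of the adaptive default and use a
// fixed-size read buffer instead; a .bind(...) call would normally follow.
let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)
let bootstrap = ServerBootstrap(group: group)
    .childChannelOption(ChannelOptions.recvAllocator,
                        value: FixedSizeRecvByteBufferAllocator(capacity: 8192))
```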

As for writes, we can thankfully do that already. ByteBuffer has withVeryUnsafeBytesWithStorageManagement, which allows us to hold an opaque object that manages the lifetime of the buffer. This lets us "pin" the buffer in memory until the write completes. It interacts nicely with CoW and every other property of ByteBuffer, so we can use it effectively.
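A minimal sketch of that pinning pattern, assuming an asynchronous write path; PendingWrite, beginAsyncWrite, and completeAsyncWrite are hypothetical names, while withVeryUnsafeBytesWithStorageManagement is the real ByteBuffer API named above:

```swift
import NIO

// Hypothetical pinning of a ByteBuffer's storage across an async write.
struct PendingWrite {
    let bytes: UnsafeRawBufferPointer
    let storageRef: Unmanaged<AnyObject>  // retained until the write completes
}

func beginAsyncWrite(_ buffer: ByteBuffer) -> PendingWrite {
    buffer.withVeryUnsafeBytesWithStorageManagement { pointer, storageRef in
        // Retain the backing storage so the pointer stays valid after this
        // closure returns; the kernel may read from it until completion.
        _ = storageRef.retain()
        return PendingWrite(bytes: pointer, storageRef: storageRef)
    }
}

func completeAsyncWrite(_ pending: PendingWrite) {
    // Completion event arrived: the kernel is done with the bytes, so drop
    // the pin and let normal CoW lifetime management take over again.
    pending.storageRef.release()
}
```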

hassila commented 3 years ago

I think that channels could get functionality similar to RecvByteBufferAllocator automatically: the EL would see whether the amount of data read filled the buffer and, if so, could choose a larger buffer pool for the next read.

If there is a reason to delegate this to the channel rather than fully automating it, a simple solution would be for the channel to signal the desired read size and update it as required; the EL would then use that value to pick the proper buffer pool. Fundamentally, instead of specifying an actual buffer allocator, one would specify a "preferred buffer size provider" that just returns the desired minimum size for the next read (see the sketch below).
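A sketch of what that shape could look like; the name and both methods are hypothetical, not NIO API:

```swift
// Hypothetical "preferred buffer size provider"; not NIO API.
protocol PreferredReadSizeProvider {
    /// The desired minimum size, in bytes, for the next read. The EL is free
    /// to round this up to fit one of its buffer pools.
    func preferredNextReadSize() -> Int

    /// Feedback after each read so the provider can adapt its answer.
    mutating func record(actualReadBytes: Int)
}
```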

Lukasa commented 3 years ago

> I think that channels could get functionality similar to RecvByteBufferAllocator automatically: the EL would see whether the amount of data read filled the buffer and, if so, could choose a larger buffer pool for the next read.

I want to distinguish between RecvByteBufferAllocator (a protocol that allows customisation) and AdaptiveRecvByteBufferAllocator (the default implementation for TCP channels that does automatic resizing). What's specifically important here is that TCP channels are not forced to use AdaptiveRecvByteBufferAllocator but may choose any allocation strategy they wish.

hassila commented 3 years ago

That's fine, then we could just do the "preferred buffer size provider" (better name please :-). It would provide the same amount of control as the current buffer allocators; it's just that the actual allocation and ownership of the memory would belong to the EL, while the channel picks a "buffer size provider" with fundamentally the same logic as the current buffer allocators. Then, of course, the EL might make minor adjustments to fit its buffer pool sizes (analogous to how a malloc implementation might return a slightly "too big" allocation to fit its internal implementation).

hassila commented 3 years ago

Something like RecvByteBufferSizer (protocol) and AdaptiveRecvByteBufferSizer (adaptive implementation).
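Sketching those two under the proposed names; the grow/shrink rule below is an assumption modelled loosely on AdaptiveRecvByteBufferAllocator, not an actual implementation:

```swift
// Hypothetical shapes for the proposed pair; not NIO API.
protocol RecvByteBufferSizer {
    /// Desired minimum size, in bytes, for the next read.
    func nextBufferSize() -> Int
    /// Feedback after each read so the sizer can adapt.
    mutating func record(actualReadBytes: Int)
}

struct AdaptiveRecvByteBufferSizer: RecvByteBufferSizer {
    private let minimum: Int
    private let maximum: Int
    private var current: Int

    init(minimum: Int = 64, initial: Int = 2048, maximum: Int = 65536) {
        precondition(minimum <= initial && initial <= maximum)
        self.minimum = minimum
        self.maximum = maximum
        self.current = initial
    }

    func nextBufferSize() -> Int {
        self.current
    }

    mutating func record(actualReadBytes: Int) {
        if actualReadBytes >= self.current {
            // The buffer was filled completely: grow for the next read.
            self.current = min(self.current * 2, self.maximum)
        } else if actualReadBytes < self.current / 2 {
            // The buffer was mostly empty: shrink back toward the minimum.
            self.current = max(self.current / 2, self.minimum)
        }
    }
}
```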

Lukasa commented 3 years ago

Yup, seems like a reasonable design change to me.