dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Use io_uring instead of epoll when supported #753

Open lpereira opened 4 years ago

lpereira commented 4 years ago

io_uring is a new method to perform efficient I/O on Linux systems. It provides a completion model (rather than a readiness model), similar to what IOCP on Windows provides, and unlike the standard poll-like interfaces, it can be used to request I/O from regular files as well (and, unlike the old/broken AIO in Linux, it doesn't require files to be opened in O_DIRECT mode).

It is a recent development, but reports of it being used by servers are very promising, often yielding throughput gains exceeding 2x or even 4x. Here's a talk by its main author with details, including benchmarks.

In addition to I/O (read/write/poll), it's also possible to handle connections (accept/connect) and a bunch of other things.

It should be possible to enable this and have both io_uring and epoll (as a fallback) in pal_networking.
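As a rough sketch of what that fallback could look like: probe for io_uring once at startup and create an epoll instance if the probe fails for any reason. The names `io_engine`, `io_engine_open` and `try_io_uring_setup` are made up for illustration and are not taken from pal_networking; the only kernel interfaces assumed are the `io_uring_setup` syscall and `epoll_create1`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Only pull in the io_uring UAPI header when the toolchain ships it. */
#ifdef __has_include
#  if __has_include(<linux/io_uring.h>)
#    include <linux/io_uring.h>
#    define HAVE_IO_URING_H 1
#  endif
#endif

/* Hypothetical types, for illustration only. */
typedef enum { IO_BACKEND_NONE, IO_BACKEND_EPOLL, IO_BACKEND_IO_URING } io_backend;

typedef struct {
    io_backend kind;
    int fd; /* io_uring fd or epoll fd; -1 if neither could be created */
} io_engine;

/* Returns a ring fd, or -1 with errno set. ENOSYS is reported when the
 * build headers (or the running kernel) lack io_uring. */
static int try_io_uring_setup(unsigned entries)
{
#if defined(HAVE_IO_URING_H) && defined(__NR_io_uring_setup)
    struct io_uring_params params;
    memset(&params, 0, sizeof(params));
    return (int)syscall(__NR_io_uring_setup, entries, &params);
#else
    (void)entries;
    errno = ENOSYS;
    return -1;
#endif
}

/* Prefer io_uring; fall back to epoll on any failure (ENOSYS on old
 * kernels, EPERM under a seccomp filter, resource limits, ...). */
static io_engine io_engine_open(unsigned entries)
{
    io_engine engine = { IO_BACKEND_NONE, -1 };
    assert(entries > 0);

    int fd = try_io_uring_setup(entries);
    if (fd >= 0) {
        engine.kind = IO_BACKEND_IO_URING;
        engine.fd = fd;
        return engine;
    }

    fd = epoll_create1(EPOLL_CLOEXEC);
    if (fd >= 0) {
        engine.kind = IO_BACKEND_EPOLL;
        engine.fd = fd;
    }
    return engine;
}
```

A caller would do `io_engine e = io_engine_open(64);` once and then branch on `e.kind`. Probing the syscall directly, rather than trusting a version number, also copes with kernels where io_uring is compiled out or blocked by a sandbox.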

am11 commented 4 years ago

Going by the PDF, it seems that polled I/O might be the best-suited option for PAL networking, because it is efficient, closer to the epoll implementation, and does not require elevated privileges (unlike the 'kernel side polling' option). A few questions:

lpereira commented 4 years ago

I'd say depend on liburing. Doing stuff by hand is possible but we would be essentially replicating it inside the runtime; better to stick with something that's been debugged and tested already. I don't know how much they care about API and ABI compatibility at this point, so using it as a shim might not be a good idea; maybe using a git submodule?

As for the minimum kernel requirement: for io_uring, we should support 5.4+ only, falling back to epoll on older kernel versions. There were many improvements in the 5.5 series too, so eventually we might even bump the requirements if we end up taking advantage of these features, just to simplify how we implement stuff -- for instance, async file I/O and not only sockets. (This kernel is still not common in most distributions, but it would be nice if the performance just appeared out of the blue after a kernel upgrade.)
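To make the version gate concrete, here is a minimal sketch that compares the `uname()` release string against a required major.minor pair. The helper names are hypothetical, and a version check is only a heuristic (distributions backport features, and io_uring can be disabled on a new kernel), so it would complement rather than replace probing the syscall itself.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Extracts "major.minor" from a release string such as "5.4.0-42-generic".
 * Returns 1 on success, 0 if the string does not start with two numbers. */
static int parse_kernel_release(const char *release, int *major, int *minor)
{
    assert(release != NULL && major != NULL && minor != NULL);
    return sscanf(release, "%d.%d", major, minor) == 2;
}

/* Returns 1 if `release` is at least want_major.want_minor, else 0.
 * Unparseable strings are treated as "too old". */
static int release_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0, minor = 0;
    if (!parse_kernel_release(release, &major, &minor))
        return 0;
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}

/* Same check against the running kernel; 0 if uname() fails. */
static int running_kernel_at_least(int want_major, int want_minor)
{
    struct utsname info;
    if (uname(&info) != 0)
        return 0;
    return release_at_least(info.release, want_major, want_minor);
}
```

With the 5.4 floor suggested above, the gate would read `if (running_kernel_at_least(5, 4)) { /* try io_uring */ } else { /* epoll */ }`.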

damageboy commented 4 years ago

Possible dupe of:

https://github.com/dotnet/coreclr/issues/24441

This situation with the issues not yet ported is starting to generate noise...

lpereira commented 4 years ago

Indeed it's a dupe, @damageboy. (I'll keep this issue open here as it might be easier to reference it and it's unlikely a lot of folks will keep a close eye on the coreclr repo after the consolidation.)

damageboy commented 4 years ago

@lpereira Aren't the issues moving? Has anything changed?

lpereira commented 4 years ago

> @lpereira Aren't the issues moving? Has anything changed?

They're moving, but it should take a month or so. I can close this one once the move is complete (can't easily mark as dupe in different repos.)

am11 commented 4 years ago

> It should be possible to enable this and have both io_uring and epoll (as a fallback) in pal_networking.

I think pal_networking, coming from corefx, deserves a separate issue, as there is a defined, finite surface area currently using epoll where io_uring can be incorporated. It can be tracked here.

The coreclr issue is a broader discussion on how to make use of io_uring in a variety of scenarios, which coreclr's PAL currently handles without epoll and friends, in a kernel-agnostic manner, afaict.

lpereira commented 4 years ago

Another thing I think we can use io_uring for -- maybe not right now, but we could contribute a patch to the Linux kernel -- is to implement WaitForMultipleObjectsEx() using futexes directly, and have a command in io_uring to perform operations on multiple futexes at the same time.

isilence commented 4 years ago

> Another thing I think we can use io_uring for -- maybe not right now, but we could contribute a patch to the Linux kernel -- is to implement WaitForMultipleObjectsEx() using futexes directly, and have a command in io_uring to perform operations on multiple futexes at the same time.

@lpereira, I'm speculating, but would a new futex opcode, combined with the already implemented linked commands and timeouts, suffice for you? Someone already mentioned supporting futex(2): axboe/liburing#39

benaadams commented 4 years ago

epoll bare minimum echo server

50 clients, running 512 bytes, 60 sec.

Speed: 189185 request/sec, 189185 response/sec
Requests: 11351122
Responses: 11351122

io_uring bare minimum echo server (Linux 5.4 needed, lower versions don't return the right amount of bytes read from io_uring_prep_readv in cqe->res.) https://github.com/frevib/io_uring-echo-server

Benchmarking: localhost:5555
50 clients, running 512 bytes, 60 sec.

Speed: 368368 request/sec, 368368 response/sec
Requests: 22102112
Responses: 22102110

isilence commented 4 years ago

The difference looks good, even though it can do even better. E.g. io_uring allows registered buffers and fds, supports IORING_OP_ACCEPT, etc. (or get rid of callocs in the loop...)

benaadams commented 4 years ago

Edit: removed links, as the author has decided on GPL v3.0.

frevib commented 4 years ago

@benaadams changed it to MIT, sorry for the inconvenience. @isilence it definitely needs some optimizations and I think there are some tiny bugs. If you want/like/have time to issue a PR, I’m happy to merge.

benaadams commented 4 years ago

Edit: the author changed to MIT, so I put the link back: https://github.com/frevib/io_uring-echo-server :)

It's a networking example using liburing, which is LGPL, so it can be linked to (though MIT code can't be derived from it; so don't look at the liburing source in case we do our own implementation on io_uring, which must be clean and not derived from LGPL code).

Though I don't know the dotnet policy on linking to LGPL and whether it's allowed? /cc @jkotas

There's a very detailed document from the author of liburing @axboe who is also one of the authors of io_uring https://kernel.dk/io_uring.pdf on the motivation for io_uring and what it achieves, as well as how to use it (including considerations around memory barriers).

That then leads to the motivations for liburing and how to use it (it simplifies all the boilerplate setup and teardown for io_uring and handles all the memory barriers, etc.).

To quote:

With the inner details of the io_uring out of the way, you'll now be relieved to learn that there's a simpler way to do much of the above. The liburing library serves two purposes:

  • Remove the need for boiler plate code for setup of an io_uring instance.
  • Provide a simplified API for basic use cases.

Also, an LWN.net article about io_uring

am11 commented 4 years ago

As noted above, I think that at least the use case in pal_networking.c in this repository, where the implementation currently uses epoll, does not require linking to liburing (a convenience library). It is more work, yes, but IMO worth it for the dotnet runtime. Taking a dependency on another runtime library also comes with a packaging cost. For example, liburing is not readily available as an Alpine Linux package or in many other package management systems; see Absent in repositories.

lpereira commented 4 years ago

Notwithstanding library availability -- because we could use git submodules, for instance, and statically link with liburing -- there's a bigger issue: linking with LGPL would require us to also distribute .o files in addition to the binaries for .NET.

So I agree that it would be better to reimplement what liburing does; it's a thin wrapper around the kernel API. It mostly reduces a lot of the boilerplate necessary to map the queues and provides a bunch of auxiliary functions and whatnot.
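To give an idea of how much boilerplate that is, here is a minimal sketch of the queue-mapping step done with raw syscalls: call `io_uring_setup`, then `mmap` the submission ring, the completion ring and the SQE array using the offsets the kernel returns. The `ring_maps` type and function names are hypothetical; it assumes the kernel's `<linux/io_uring.h>` UAPI header is available and deliberately leaves out submission, completion reaping and the memory barriers on the head/tail indices.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifdef __has_include
#  if __has_include(<linux/io_uring.h>)
#    include <linux/io_uring.h>
#    define HAVE_IO_URING_H 1
#  endif
#endif

/* Hypothetical holder for the three shared mappings of one ring. */
typedef struct {
    int ring_fd;
    unsigned sq_entries, cq_entries;
    void *sq_ring; size_t sq_ring_len; /* SQ head/tail/mask + index array */
    void *cq_ring; size_t cq_ring_len; /* CQ head/tail/mask + CQE array   */
    void *sqes;    size_t sqes_len;    /* submission queue entries        */
} ring_maps;

static void ring_maps_close(ring_maps *m)
{
    int saved = errno; /* keep the caller's errno across cleanup */
    if (m->sqes)    munmap(m->sqes, m->sqes_len);
    if (m->cq_ring) munmap(m->cq_ring, m->cq_ring_len);
    if (m->sq_ring) munmap(m->sq_ring, m->sq_ring_len);
    if (m->ring_fd >= 0) close(m->ring_fd);
    memset(m, 0, sizeof(*m));
    m->ring_fd = -1;
    errno = saved;
}

/* Maps one region of the ring fd; returns NULL instead of MAP_FAILED. */
static void *map_region(int fd, size_t len, off_t offset)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_POPULATE, fd, offset);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns 0 on success, -1 with errno set on failure. */
static int ring_maps_open(ring_maps *m, unsigned entries)
{
    assert(m != NULL);
    memset(m, 0, sizeof(*m));
    m->ring_fd = -1;
#if defined(HAVE_IO_URING_H) && defined(__NR_io_uring_setup)
    struct io_uring_params p;
    memset(&p, 0, sizeof(p));
    m->ring_fd = (int)syscall(__NR_io_uring_setup, entries, &p);
    if (m->ring_fd < 0)
        return -1;

    /* The kernel may round the requested sizes up; use what it reports. */
    m->sq_entries = p.sq_entries;
    m->cq_entries = p.cq_entries;
    m->sq_ring_len = p.sq_off.array + p.sq_entries * sizeof(unsigned);
    m->cq_ring_len = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);
    m->sqes_len = p.sq_entries * sizeof(struct io_uring_sqe);

    m->sq_ring = map_region(m->ring_fd, m->sq_ring_len, IORING_OFF_SQ_RING);
    m->cq_ring = map_region(m->ring_fd, m->cq_ring_len, IORING_OFF_CQ_RING);
    m->sqes = map_region(m->ring_fd, m->sqes_len, IORING_OFF_SQES);
    if (!m->sq_ring || !m->cq_ring || !m->sqes) {
        ring_maps_close(m);
        return -1;
    }
    return 0;
#else
    (void)entries;
    errno = ENOSYS;
    return -1;
#endif
}
```

Everything beyond this point (filling SQEs, publishing the tail, calling `io_uring_enter`, reading CQEs) is where the barrier handling mentioned earlier comes in, which is the part most worth borrowing from a tested implementation.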

If we're unsure how to use the API, though, it's possible to read from other implementations; for instance, there's a dual-licensed Apache 2/MIT library for Rust that could be used for studying purposes.

benaadams commented 4 years ago

Also, the libuv PR for io_uring could be something to look at: https://github.com/libuv/libuv/pull/2322 (libuv uses a Joyent attribution licence); they also state there that they can't look at the source for liburing, as it's LGPL: https://github.com/libuv/libuv/pull/2322#issuecomment-500455185

axboe commented 4 years ago

FWIW, I'd be willing to change the liburing license to dual MIT/GPL. There's really nothing fancy in the library, it's mostly just helpers, and a simplified interface should the application wish to use that. But it'd be a shame to have some of this code duplicated just because of licensing constraints.

lpereira commented 4 years ago

@axboe That would be appreciated; it would indeed help a lot with io_uring adoption, given that the GPL family of licenses isn't, unfortunately (in my personal opinion), that popular these days.

axboe commented 4 years ago

I like GPL for applications, and I still use it, but it makes less sense for libraries. And in particular for something like liburing, which isn't really a lot of smarts, it's mostly just setup and helper code. I'm doing some due diligence by emailing folks that have more than a few commits in liburing, then I'll change it provided nobody objects (can't see why they would).

axboe commented 4 years ago

> I'm doing some due diligence by emailing folks that have more than a few commits in liburing, then I'll change it provided nobody objects (can't see why they would).

This has now been done.

lpereira commented 4 years ago

For the record, here's an ASP.NET transport by @tkp1n that reimplements liburing in C#: https://github.com/tkp1n/IoUring

isilence commented 4 years ago

> @lpereira, I'm speculating, but would a new futex opcode, combined with the already implemented linked commands and timeouts, suffice for you? Someone already mentioned supporting futex(2): axboe/liburing#39

Going back to the ignored question... Guys, what's your use case and what would you need to integrate io_uring? Support for futex(2)? Something else?

lpereira commented 4 years ago

> > @lpereira, I'm speculating, but would a new futex opcode, combined with the already implemented linked commands and timeouts, suffice for you? Someone already mentioned supporting futex(2): axboe/liburing#39
>
> Going back to the ignored question... Guys, what's your use case and what would you need to integrate io_uring? Support for futex(2)? Something else?

Yeah, futex support for io_uring would be very welcome, especially if it had the FUTEX_WAIT_MULTIPLE command that was proposed a while ago (the use case is for Wine's implementation of WaitForMultipleObjects(), which is currently using polled eventfds, but we also have an implementation in our PAL that could benefit from this.)
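For context, the "polled eventfds" approach can be sketched roughly as follows: each waitable object is backed by an eventfd, and a wait-any call polls all of them and consumes the first one that is signaled. This is a simplified illustration with hypothetical names (`wait_any`, `signal_event`), not Wine's or the PAL's actual code; it only covers auto-reset, wait-any semantics and does not retry on `EINTR`.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define MAX_WAIT_OBJECTS 64 /* mirrors the Windows MAXIMUM_WAIT_OBJECTS limit */

/* Signals the event by adding 1 to the eventfd counter. Returns 0 or -1. */
static int signal_event(int fd)
{
    uint64_t one = 1;
    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Waits until any of the `count` eventfds is signaled or the timeout
 * (in milliseconds, -1 for infinite) expires. Returns the index of the
 * first signaled object after resetting it, or -1 on timeout or error. */
static int wait_any(const int *fds, int count, int timeout_ms)
{
    struct pollfd pfds[MAX_WAIT_OBJECTS];
    assert(fds != NULL && count > 0 && count <= MAX_WAIT_OBJECTS);

    for (int i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    int ready = poll(pfds, (nfds_t)count, timeout_ms);
    if (ready <= 0)
        return -1; /* timeout (0) or error such as EINTR (-1) */

    for (int i = 0; i < count; i++) {
        if (pfds[i].revents & POLLIN) {
            uint64_t value;
            /* Reading resets the counter to zero (auto-reset behaviour). */
            if (read(fds[i], &value, sizeof(value)) != (ssize_t)sizeof(value))
                return -1;
            return i;
        }
    }
    return -1;
}
```

Every wait costs at least one `poll` and one `read` syscall, even when an object is already signaled, which is the overhead a futex-based wait with a userspace fast path would aim to avoid.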

isilence commented 4 years ago

> Yeah, futex support for io_uring would be very welcome, especially if it had the FUTEX_WAIT_MULTIPLE command that was proposed a while ago (the use case is for Wine's implementation of WaitForMultipleObjects(), which is currently using polled eventfds, but we also have an implementation in our PAL that could benefit from this.)

Great, I'll try to take a look. I'm concerned about not having fast-path in-userspace locking, but it should still be better than eventfd + epoll. I haven't seen FUTEX_WAIT_MULTIPLE, but it will need to be merged first.

lpereira commented 4 years ago

This article about using io_uring in modern C++ (with coroutines et al) is a pretty good read and gives some API insights, too: https://cor3ntin.github.io/posts/iouring/

benaadams commented 4 years ago

LWN article: The rapid growth of io_uring

antonfirsov commented 4 years ago

A general update:

All prototyping is being done on https://github.com/tmds/Tmds.LinuxAsync, together with other experiments from #14304. We hope to see some numbers soon. After that we can think about the productization of the changes.

ericsampson commented 4 years ago

Is it possible to dupe-close one of these two issues, so that there is one main tracking issue? https://github.com/dotnet/runtime/issues/12650

ShreyasJejurkar commented 1 year ago

Hopefully this will be considered for 9.0

ReubenBond commented 9 months ago

Nice docs on io_uring for anyone interested in this: https://nick-black.com/dankwiki/index.php/Io_uring