compio-rs / compio

A thread-per-core Rust runtime with IOCP/io_uring/polling.
MIT License

Support of registered file descriptors #28

Closed DXist closed 1 year ago

DXist commented 1 year ago

io_uring supports registering file descriptors. Instead of taking a reference count on the file for each I/O operation, the kernel resolves descriptors through a user-provided indirection array attached to each thread's ring.

Registration is beneficial for long-lived descriptors. Refcounting overhead is relatively significant for small submission batches.

I'm personally interested in driver-level support of registered file descriptors.

I see the following approaches:

  1. Define driver-specific operations with registered fds. The user either targets a single platform or writes two I/O data paths, one for regular and one for registered file descriptors. The io_uring driver could expose a tokio-uring-style Fd or FixedFd, and platform-independent op code would wrap RawFd into Fd.
  2. Define a common registration interface. On platforms that don't support registration, the driver could maintain the indirection array in userspace.
    1. Separate operations for regular and registered fds: no runtime overhead (branching) at operation construction time, but more duplicated code.
    2. A generic interface with an enum and two newtype wrappers, as in tokio-uring: a regular/registered enum, macro-based enum handling, and a macro parameter with an implicit type that wraps either a regular or a registered fd. Less duplication, but more complex macro-based code and runtime branching overhead.
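
To make the trade-off between 2.i and 2.ii concrete, here is a minimal sketch. Every name in it (OsFd, RegisteredFd, AnyFd, SqeTarget, and the target_* functions) is hypothetical and belongs to neither compio nor tokio-uring. The only io_uring fact it relies on is that a registered file is addressed by its index in the registered array, with a fixed-file flag set on the submission entry.

```rust
/// Stand-in for the platform descriptor type (`RawFd` on Unix). Assumption for this sketch.
type OsFd = i32;

/// Hypothetical newtype: an index into the ring's registered-file array.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RegisteredFd(pub u32);

/// Approach 2.ii: a single enum that covers both kinds of descriptor.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AnyFd {
    Regular(OsFd),
    Registered(RegisteredFd),
}

impl From<OsFd> for AnyFd {
    fn from(fd: OsFd) -> Self {
        AnyFd::Regular(fd)
    }
}

impl From<RegisteredFd> for AnyFd {
    fn from(fd: RegisteredFd) -> Self {
        AnyFd::Registered(fd)
    }
}

/// What an operation would put into a submission entry: the fd field and
/// whether the fixed-file flag has to be set.
#[derive(Debug, PartialEq, Eq)]
pub struct SqeTarget {
    pub fd: i32,
    pub fixed: bool,
}

/// Approach 2.i: one constructor per kind, no branch at construction time.
pub fn target_regular(fd: OsFd) -> SqeTarget {
    SqeTarget { fd, fixed: false }
}

pub fn target_registered(fd: RegisteredFd) -> SqeTarget {
    SqeTarget { fd: fd.0 as i32, fixed: true }
}

/// Approach 2.ii: one generic constructor that branches on the enum.
pub fn target_any(fd: impl Into<AnyFd>) -> SqeTarget {
    match fd.into() {
        AnyFd::Regular(fd) => target_regular(fd),
        AnyFd::Registered(fd) => target_registered(fd),
    }
}
```

With 2.i every operation needs two constructors, which is where the duplication comes from. With 2.ii callers write target_any(fd) for either kind and pay for the match on each construction.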

Berrysoft commented 1 year ago

There's already an attach method, which currently does nothing. We can use it to register the fd, maybe with a small change.

Berrysoft commented 1 year ago

I can't give this feature much attention right now, and AFAIK neither tokio-uring nor monoio uses it, am I right? PRs implementing this are welcome. If you implement it, I'll review the PR and make sure the IOCP side does not break.

DXist commented 1 year ago

There's already an attach method, which currently does nothing. We can use it to register the fd, maybe with a small change.

For efficiency we need registration/unregistration/update of fixed fd array. The user calculates the maximum number of fds at compile time, for example LISTEN_SOCKET + MAX_NUMBER_OF_CLIENTS + JOURNAL_FILE_FD.
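
A minimal sketch of such a fixed-size table, kept purely in userspace. The type FixedFdTable, its methods, and the constant values are assumptions made for illustration and are not compio API. With a real ring, the same three steps would go through io_uring_register (IORING_REGISTER_FILES, IORING_REGISTER_FILES_UPDATE, IORING_UNREGISTER_FILES), where -1 likewise marks an empty slot.

```rust
/// Stand-in for the platform descriptor type (`RawFd` on Unix). Assumption for this sketch.
type OsFd = i32;

// Illustrative fd budget using the names from the comment above; the values are made up.
const LISTEN_SOCKET: usize = 1;
const MAX_NUMBER_OF_CLIENTS: usize = 64;
const JOURNAL_FILE_FD: usize = 1;
const FIXED_FDS: usize = LISTEN_SOCKET + MAX_NUMBER_OF_CLIENTS + JOURNAL_FILE_FD;

/// Marker for an unused slot.
const EMPTY: OsFd = -1;

/// Hypothetical fixed-capacity indirection array; the size is known at compile time.
pub struct FixedFdTable<const N: usize> {
    slots: [OsFd; N],
}

impl<const N: usize> FixedFdTable<N> {
    /// Creates a table in which every slot is empty.
    pub fn new() -> Self {
        Self { slots: [EMPTY; N] }
    }

    /// Stores `fd` in the first free slot and returns its index, or `None` if the table is full.
    pub fn register(&mut self, fd: OsFd) -> Option<u32> {
        let idx = self.slots.iter().position(|&s| s == EMPTY)?;
        self.slots[idx] = fd;
        Some(idx as u32)
    }

    /// Overwrites the slot at `idx` and returns the previous value, or `None` if out of range.
    pub fn update(&mut self, idx: u32, fd: OsFd) -> Option<OsFd> {
        let slot = self.slots.get_mut(idx as usize)?;
        Some(std::mem::replace(slot, fd))
    }

    /// Empties the slot at `idx` and returns the fd it held, if any.
    pub fn unregister(&mut self, idx: u32) -> Option<OsFd> {
        match self.update(idx, EMPTY)? {
            EMPTY => None,
            fd => Some(fd),
        }
    }

    /// Looks up the fd stored at `idx`, if the slot is in range and occupied.
    pub fn get(&self, idx: u32) -> Option<OsFd> {
        self.slots.get(idx as usize).copied().filter(|&fd| fd != EMPTY)
    }
}
```

A table for the budget above would be declared as FixedFdTable<FIXED_FDS>. The linear scan in register is the simplest choice; a free list would avoid it if registration turned out to be hot.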

Berrysoft commented 1 year ago

For efficiency we need registration/unregistration/update of fixed fd array.

Yes, you're right.

If you want to work on this, I suggest making this feature mandatory. In other words, all high-level File and Socket types should use registered fds, and all low-level OpCode types should accept only registered fds. That will simplify the APIs. We can provide a map between registered index and raw fd, or simply define our own RawFd as the registered index.

DXist commented 1 year ago

I think I'll focus on building a project-specific callback-based runtime using regular fds. If I find time for optimization, I could try registered fds and registered buffers and measure the gains, if any.

Anyway, which approach would you prefer? I'm personally biased towards 2.i, implemented only for io_uring.

Berrysoft commented 1 year ago

My preference is, TL;DR:

Redefining RawFd as the registered index. Don't use regular fds at all.

However, it's up to the implementation author to choose the approach. 2.i is also OK, I think, but it will make the API of Poller more complex. I would like to keep a simple API.

DXist commented 1 year ago

For efficiency we need registration/unregistration/update of fixed fd array.

Yes, you're right.

If you want to work on this, I suggest making this feature mandatory. In other words, all high-level File and Socket types should use registered fds, and all low-level OpCode types should accept only registered fds. That will simplify the APIs. We can provide a map between registered index and raw fd, or simply define our own RawFd as the registered index.

For a thread-per-core runtime that makes sense. Newer kernels allow passing data across thread-local rings, including registered descriptors or sets of descriptors, in case the user wants coordinated cross-thread I/O work.

DXist commented 1 year ago

I could try to implement the indirection array to establish the API and continue to use regular fds internally. Later it's possible to implement actual registration/deregistration for io_uring, potentially as a pluggable feature.

I think we can use 1024 fds as the default array size; it's the usual default soft limit on open fds per process on Linux.

Berrysoft commented 1 year ago

compio aims to be a thread-per-core runtime. I'd prefer not to consider multi-threading and multiple rings for now. However, feel free to open a PR if you would like a thread-safe driver.

For multiple drivers working together, I prefer that users don't register an fd everywhere, or move them quite often :). One reason is that on Windows, a file handle can be attached to only one IOCP. (It's tricky to detach it because that requires an undocumented API.)

I would like a Driver API that is (nearly) consistent on every supported platform, without requiring users to write #[cfg(target_os = "linux")] here and there. If we provide both regular and registered fd APIs, the best practice would be to use regular fds on Windows (because registration needs more space and lookup time) and registered fds on Linux (for ref-counting reasons). That would make the driver painful to use.

DXist commented 1 year ago

Yes, a common interface is more user-friendly. I'm not interested in cross-thread fds either, but I would like to offload blocking RocksDB I/O work to a separate thread. I'd prefer to use a notification mechanism such as eventfd on Linux to wake up the I/O thread waiting for uring completions.
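
A minimal, Linux-only sketch of the eventfd side of such a wake-up. The Notifier type is hypothetical and not part of compio. The eventfd symbol is declared by hand only to keep the example dependency-free (this form needs Rust 1.82 or newer); real code would more likely take it from the libc crate. The io_uring part is left out: the I/O thread would keep a read or poll on this fd queued in its ring, so that a write from the worker thread surfaces as a completion.

```rust
use std::fs::File;
use std::io::{self, Read, Write};
use std::os::fd::FromRawFd;
use std::os::raw::{c_int, c_uint};

unsafe extern "C" {
    // int eventfd(unsigned int initval, int flags); provided by the C library on Linux.
    fn eventfd(initval: c_uint, flags: c_int) -> c_int;
}

/// Hypothetical cross-thread notifier backed by an eventfd counter.
pub struct Notifier {
    file: File,
}

impl Notifier {
    /// Creates an eventfd with an initial count of 0 and default (blocking) flags.
    pub fn new() -> io::Result<Self> {
        // SAFETY: eventfd takes plain integers and returns a new fd or -1.
        let fd = unsafe { eventfd(0, 0) };
        if fd < 0 {
            return Err(io::Error::last_os_error());
        }
        // SAFETY: fd was just created and is owned by nobody else.
        Ok(Self { file: unsafe { File::from_raw_fd(fd) } })
    }

    /// Adds 1 to the counter; called from the worker thread.
    pub fn notify(&self) -> io::Result<()> {
        (&self.file).write_all(&1u64.to_ne_bytes())
    }

    /// Blocks until the counter is non-zero, then returns it and resets it to 0.
    pub fn wait(&self) -> io::Result<u64> {
        let mut buf = [0u8; 8];
        (&self.file).read_exact(&mut buf)?;
        Ok(u64::from_ne_bytes(buf))
    }
}
```

Because the counter accumulates, several notify calls that arrive before the I/O thread gets around to reading are collapsed into a single wake-up that reports how many were pending.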

Berrysoft commented 1 year ago

Well, if you want notifications, the current solution might be signals. Channels may not work because the driver might be waiting forever. And thank you for the reminder, we still need a compatible channel for compio :)

Berrysoft commented 1 year ago

So far, Poller::post for io-uring cannot wake up the I/O thread. I think we need a simple solution, maybe eventfd.

DXist commented 1 year ago

What do you think about changing signal implementation to eventfd for Linux?

In this case sync_queue won't be needed for the Linux target.

DXist commented 1 year ago

Well, if you want notifications, the current solution might be signals. Channels may not work because the driver might be waiting forever. And thank you for the reminder, we still need a compatible channel for compio :)

This multi-thread channel is fast according to the benchmark.

It uses async-event for notifications. I'd prefer to make it configurable to support non-async runtimes, for example by notifying via eventfd.

Berrysoft commented 1 year ago

What do you think about changing signal implementation to eventfd for Linux?

Yes, that's possible. Actually, a signal itself will interrupt io-uring and make it return EINTR.

However, I'm thinking about Poller::post. It is complicated if we want it to also interrupt the thread.

Channel is another problem. I think there will be occasions when the runtime is waiting on io-uring forever and a message comes from the channel. However, the thread is still stuck.

DXist commented 1 year ago

What do you think about changing signal implementation to eventfd for Linux?

Yes, that's possible. Actually, a signal itself will interrupt io-uring and make it return EINTR.

However, I'm thinking about Poller::post. It is complicated if we want it to also interrupt the thread.

Channel is another problem. I think there will be occasions when the runtime is waiting on io-uring forever and a message comes from the channel. However, the thread is still stuck.

It's possible to mask signals in the I/O thread and have a dedicated thread for background work. In this case it's possible to handle signals only when I/O is driven.

DXist commented 1 year ago

I've pushed initial interface for registered fds - https://github.com/Berrysoft/compio/pull/30/files

George-Miao commented 1 year ago

Closing due to lack of activity. Re-open if further progress is made.