compio-rs / compio

A thread-per-core Rust runtime with IOCP/io_uring/polling.
MIT License

Blocking accept on linux and submit/complete operations batching #41

Closed by DXist 1 year ago

DXist commented 1 year ago

Can the Windows driver use a VecDeque buffer for completed entries to unify the ring buffer semantics across all platforms?

Then poll could simply wait until at least one entry has completed or the timeout has passed. IOCP could use an internal preallocated buffer instead of allocating on each poll. The runtime could update the op_runtime with all completed entries instead of being limited to a fixed 16.
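(For illustration only: a minimal sketch of the unified shape being proposed, assuming hypothetical `Driver` and `Entry` types rather than compio's actual API.)

```rust
use std::collections::VecDeque;
use std::io;
use std::time::Duration;

// Hypothetical completion entry: an operation key plus its result code.
struct Entry {
    user_data: u64,
    result: i32,
}

// Hypothetical driver with one preallocated completion buffer, giving the
// same semantics across IOCP, io_uring, and the polling backend.
struct Driver {
    completed: VecDeque<Entry>,
}

impl Driver {
    /// Block until at least one completion is buffered or the timeout
    /// elapses; the backend appends into `self.completed` instead of
    /// allocating a fresh buffer on every call.
    fn poll(&mut self, timeout: Option<Duration>) -> io::Result<()> {
        let _ = timeout; // backend-specific wait elided
        Ok(())
    }
}

fn run(driver: &mut Driver) -> io::Result<()> {
    driver.poll(Some(Duration::from_millis(10)))?;
    // The runtime updates its operation state from *all* buffered
    // entries, not a fixed window of 16.
    while let Some(entry) = driver.completed.pop_front() {
        println!("op {} -> {}", entry.user_data, entry.result);
    }
    Ok(())
}
```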

Berrysoft commented 1 year ago

If you are not satisfied with the size of 16, we can try extending it to 1024:)

DXist commented 1 year ago

> If you are not satisfied with the size of 16, we can try extending it to 1024:)

I'm not satisfied with the entries parameter in the poll method, not the size itself.

Berrysoft commented 1 year ago

"Blocking accept on Linux" and batching are two features. Please split them into two PRs.

About the blocking accept, @George-Miao gave an idea about adding the flags.

About batching: when we talk about batching, I think we mean the ability to submit and get multiple entries in one call. Poller::poll is such a method: it can submit all operations and get multiple completed entries in one call. However, in poll for io-uring, the submission queue is copied to the submission ring one by one, which is inefficient. We do have a more efficient way to copy them. That's the part I agree with.

However, I think the completed entries are already returned in a batched way. A user allocates an uninitialized array (maybe on the stack), and the driver tries to fill it. The entries were in a VecDeque before, and I also didn't implement the most efficient way to copy them to the user-provided slice. However, you're talking about replacing that with a VecDeque iterator. I don't think that would be faster.
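(For reference, the caller-provided array style described here looks roughly like the sketch below; `Entry` and `poll` are simplified placeholders, not the actual driver API.)

```rust
use std::mem::MaybeUninit;

// Simplified stand-in for a completion entry.
struct Entry {
    user_data: u64,
    result: i32,
}

/// The driver writes completed entries into the caller's buffer and
/// returns how many slots it filled.
fn poll(completed: &mut [MaybeUninit<Entry>]) -> usize {
    let _ = completed; // driver-specific copy elided
    0
}

fn process_completions() {
    // The caller allocates the (possibly stack-based) uninitialized array.
    let mut buf: [MaybeUninit<Entry>; 16] = std::array::from_fn(|_| MaybeUninit::uninit());
    let filled = poll(&mut buf);
    // Only the filled prefix may be treated as initialized.
    for slot in &buf[..filled] {
        let entry = unsafe { slot.assume_init_ref() };
        println!("op {} -> {}", entry.user_data, entry.result);
    }
}
```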

DXist commented 1 year ago

I've changed the implementation to return a draining iterator because io::Result is not Clone.
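(A small illustrative sketch of that point, with hypothetical types: io::Error is not Clone, so io::Result entries have to be moved out of the buffer by draining rather than copied.)

```rust
use std::collections::VecDeque;
use std::io;

// Hypothetical driver storing completion results directly.
struct Driver {
    completed: VecDeque<io::Result<usize>>,
}

impl Driver {
    /// io::Result is not Clone (io::Error isn't), so entries are moved
    /// out of the buffer by draining instead of being copied.
    fn completions(&mut self) -> impl Iterator<Item = io::Result<usize>> + '_ {
        self.completed.drain(..)
    }
}
```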

DXist commented 1 year ago

"Blocking accept on Linux" and batching are two features. Please split them into two PRs.

About the blocking accept, @George-Miao gives an idea about adding the flags.

About batching: when we're talking batching, I think we are talking about the ability submitting/getting multiple entries in one call. Poller::poll is such a method that could submit all operations, and get multiple completed entries in one call. However, in poll for io-uring, the submission queue is copied to the submission ring one-by-one, which is inefficient. We do have more efficient way to copy them. That's the part I agree.

Yes, on the submission side we can use push_multiple. On the completion side I changed the implementation to use extend instead of individual pushes.
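(Assuming the io-uring crate's `SubmissionQueue::push_multiple` and VecDeque's `Extend` implementation, the two changes might look roughly like this; the `Driver` struct and its fields are hypothetical, not compio's actual driver.)

```rust
use std::collections::VecDeque;
use std::io;

use io_uring::{cqueue, squeue, IoUring};

// Hypothetical driver state: a batch of prepared submissions and a
// buffer of completions.
struct Driver {
    ring: IoUring,
    pending: Vec<squeue::Entry>,
    completed: VecDeque<cqueue::Entry>,
}

impl Driver {
    fn flush(&mut self) -> io::Result<()> {
        // Copy the whole pending batch into the submission ring in one
        // call instead of pushing entries one by one.
        // Safety: the operations are assumed to stay valid until completion.
        unsafe {
            self.ring
                .submission()
                .push_multiple(&self.pending)
                .expect("submission ring full");
        }
        self.pending.clear();

        // Submit and wait for at least one completion.
        self.ring.submit_and_wait(1)?;

        // Drain the completion ring into the buffer with a single
        // `extend` rather than per-entry pushes.
        self.completed.extend(self.ring.completion());
        Ok(())
    }
}
```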

> However, I think the completed entries are already returned in a batched way. A user allocates an uninitialized array (maybe on the stack), and the driver tries to fill it. The entries were in a VecDeque before, and I also didn't implement the most efficient way to copy them to the user-provided slice. However, you're talking about replacing that with a VecDeque iterator. I don't think that would be faster.

With the draining iterator we also make an implicit copy of one element at a time.

Berrysoft commented 1 year ago

The optimization with push_multiple and extend is fine, but you haven't given a fair reason to change the signature of poll. And by the way, I also don't agree with adding another cqueue to IOCP.

DXist commented 1 year ago

I think it's more convenient to process all completed items than to get a partially filled entry array and use unsafe to assume the array prefix is initialized.

Berrysoft commented 1 year ago

Now you're starting to claim convenience. I think it would be OK if this PR were just about wrapping the poll method with an iterator, but the current one isn't fine by me.

DXist commented 1 year ago

What if we remove the completion queues from the drivers and require the runtime or user to provide a mutable VecDeque on poll?

Then we would eliminate an extra intermediate copy for all drivers. The queue could even contain previously unprocessed entries; poll could simply extend into it.
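(A hypothetical sketch of such a signature, not the actual compio interface.)

```rust
use std::collections::VecDeque;
use std::io;
use std::time::Duration;

// Simplified completion entry.
struct Entry {
    user_data: u64,
    result: i32,
}

// Hypothetical driver with no internal completion queue of its own.
struct Driver;

impl Driver {
    /// The runtime (or user) owns the queue; the driver only extends it,
    /// so there is no intermediate copy, and previously unprocessed
    /// entries simply stay at the front.
    fn poll(
        &mut self,
        timeout: Option<Duration>,
        completed: &mut VecDeque<Entry>,
    ) -> io::Result<()> {
        let _ = timeout;
        // Backend-specific wait elided; completions would be appended here.
        completed.extend(std::iter::empty());
        Ok(())
    }
}
```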

Berrysoft commented 1 year ago

I'm thinking about a possible prettier way to improve this. Let me have some time to implement that...

DXist commented 1 year ago

> I'm thinking about a possible prettier way to improve this. Let me have some time to implement that...

Thank you!

Does it make sense to use the term poll for a completion-based IO driver?

I think poll applies to the async runtime and the Rust futures interface.

But at the driver level it makes sense to expose submit and completions methods, because a client prepares operations to submit as a batch and then optionally waits for them so it can process a batch of completed operations.
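(In other words, the driver-level naming being suggested might look like the sketch below; `CompletionDriver`, `Op`, and `Entry` are placeholders, not proposed compio items.)

```rust
use std::io;
use std::time::Duration;

// Placeholder types for a prepared operation and a completion entry.
struct Op;
struct Entry;

// Hypothetical driver-level interface: `submit` pushes a prepared batch
// and optionally waits, `completions` hands back whatever has finished.
// `poll` stays a runtime/futures-level term.
trait CompletionDriver {
    /// Queue a prepared operation for the next submission batch.
    fn push(&mut self, op: Op);

    /// Submit the queued batch and optionally wait for at least one
    /// completion, up to `timeout`.
    fn submit(&mut self, timeout: Option<Duration>) -> io::Result<()>;

    /// Drain the batch of completed operations.
    fn completions(&mut self) -> Box<dyn Iterator<Item = Entry> + '_>;
}
```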

Berrysoft commented 1 year ago

poll is only a name:)

Berrysoft commented 1 year ago

Continue in #43