Closed by pyhd 8 months ago
Can you please be a bit more specific? What's being run, and what do you mean by "bug functions"? It's easier if we don't have to guess at what you mean; just be explicit about what you think is wrong (e.g. "expected behavior").
As always, a reproducer is worth a thousand words as it covers pretty much everything.
Sorry for the unclear wording. Here is the pseudo code:
io_uring_prep_recv_multishot(main_ring);
while (1) {
    io_uring_wait_cqe(main_ring);
    while (io_uring_peek_cqe() == 0) {
        reap & io_uring_prep_sendmsg_zc(send_ring);
        io_uring_cqe_seen(main_ring);
    }
    nr = io_uring_sq_ready(send_ring);
    io_uring_submit_and_wait(send_ring, nr);
    io_uring_cq_advance(send_ring, nr);
}
The problem is lower throughput with sendmsg_zc. Is it expected to see __io_cqring_overflow_flush before (together with) io_submit_sqes, and io_req_cqe_overflow, reached via __io_submit_flush_completions, before (together with) io_issue_sqe? I did not see these functions with regular sendmsg, so I suspect them.
__io_cqring_overflow_flush is indeed about overflowed CQEs, and it's expensive. So if you see it, the solution would be to size the CQ appropriately.
In terms of overflows, multishot receives are usually more of a hazard because sends are more predictable, i.e. 2 CQEs per send. And I have no clue how GSO is at play here, as it should be reducing the total number of CQEs.
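To make the "2 CQEs per send" sizing concrete, here is a minimal sketch of how the CQ size for a dedicated zero-copy send ring might be estimated. The helper names (`round_up_pow2`, `zc_cq_entries`) and the `headroom` parameter are hypothetical and are not part of liburing. The sketch assumes each zero-copy send posts at most two CQEs, and that the kernel rounds a requested CQ size up to a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a zero-copy send posts at most two CQEs
 * (the send completion plus the buffer notification). */
enum { CQES_PER_ZC_SEND = 2 };

/* Round v up to the next power of two, since a requested CQ size
 * is rounded up to a power of two anyway. */
static uint32_t round_up_pow2(uint32_t v)
{
    uint32_t p = 1;

    assert(v > 0 && v <= (1u << 31));
    while (p < v)
        p <<= 1;
    return p;
}

/* Hypothetical helper: CQ entries needed so that max_inflight_sends
 * zero-copy sends, plus some spare room, cannot overflow the CQ. */
static uint32_t zc_cq_entries(uint32_t max_inflight_sends, uint32_t headroom)
{
    /* Keep the arithmetic well inside 32 bits. */
    assert(max_inflight_sends <= (1u << 29) && headroom <= (1u << 29));
    return round_up_pow2(max_inflight_sends * CQES_PER_ZC_SEND + headroom);
}
```

The result could then be placed in `cq_entries` of `struct io_uring_params`, with `IORING_SETUP_CQSIZE` set in `flags`, before calling `io_uring_queue_init_params`. This only helps if the CQEs are also reaped in full; it does not replace counting the notifications.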
I found the culprit:
After io_uring_submit_and_wait, io_uring_cq_ready was larger than the number of submitted SQEs (i.e. io_uring_sq_ready). In addition, none of the CQEs had a negative cqe->res.
Edit: As I mentioned in the pseudo code, the sendmsg_zc was submitted to a dedicated ring.
if (!(cqe->flags & IORING_CQE_F_NOTIF)) {
    if (cqe->flags & IORING_CQE_F_MORE)
        nr_cqes++;
}
Fixed. I just missed the man page about IORING_OP_SEND_ZC.
Anyway, I really appreciate your help.
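For anyone hitting the same thing, the accounting above can be sketched in isolation. This is only an illustration. `struct mock_cqe` is a hypothetical stand-in for the two `struct io_uring_cqe` fields involved, and the two flag macros are local copies assumed to mirror `IORING_CQE_F_MORE` and `IORING_CQE_F_NOTIF` from `<linux/io_uring.h>`. A zero-copy send normally posts a first CQE carrying F_MORE, followed later by a second CQE carrying F_NOTIF once the buffer can be reused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies, assumed to mirror the values in <linux/io_uring.h>. */
#define CQE_F_MORE  (1U << 1)   /* IORING_CQE_F_MORE  */
#define CQE_F_NOTIF (1U << 3)   /* IORING_CQE_F_NOTIF */

/* Hypothetical mock of the CQE fields used for the accounting. */
struct mock_cqe {
    int32_t  res;
    uint32_t flags;
};

/* Count send completions that announce a follow-up notification CQE.
 * Same test as the snippet above: not a notification, and F_MORE set. */
static size_t count_expected_notifs(const struct mock_cqe *cqes, size_t n)
{
    size_t expected = 0;

    assert(cqes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!(cqes[i].flags & CQE_F_NOTIF) && (cqes[i].flags & CQE_F_MORE))
            expected++;
    }
    return expected;
}

/* Count notification CQEs already present in the batch. */
static size_t count_seen_notifs(const struct mock_cqe *cqes, size_t n)
{
    size_t seen = 0;

    assert(cqes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cqes[i].flags & CQE_F_NOTIF)
            seen++;
    }
    return seen;
}
```

The difference between the two counts is the number of notifications still outstanding, which is why advancing the CQ by only the number of submitted SQEs leaves entries behind on the send ring.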
I experimented with different CQ sizes; even when the CQ was large enough, overflows still flooded in. I guess that's why the throughput was lower than with the regular copy method.
Setup:
IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_COOP_TASKRUN | IORING_SETUP_DEFER_TASKRUN
io_uring_register_ring_fd and io_uring_register_files
4 UDP GSO buffers (64K each)

Bug functions:
__io_cqring_overflow_flush before io_submit_sqes
__io_submit_flush_completions -> io_req_cqe_overflow before io_issue_sqe