axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License

liburing giving similar throughput as select/epoll #414

Closed MarcoF1 closed 3 weeks ago

MarcoF1 commented 3 years ago

I am currently working on this branch, adding io_uring as an option to receive data for a Linux networking benchmark tool running on Ubuntu with Linux kernel 5.13.

When I run the benchmark test using liburing or select, I get similar throughput levels and am unsure why that would be since liburing is asynchronous. Here is the file where I am using liburing to receive data. Not sure what I am missing to get the performance boost that io_uring offers.

MarcoF1 commented 3 years ago

I have tried using registered buffers and registered files, but they don't seem to help much with performance. I also tried enabling polling mode, but that only seems to increase CPU usage.

HuanjunXie commented 3 years ago

I have tried using registered buffers and registered files, but they don't seem to help much with performance. I also tried enabling polling mode, but that only seems to increase CPU usage.

Your code in tcpstream.c does not use the SQPOLL or IOPOLL features; you could try them.

mohsenomidi commented 3 years ago

It seems the performance is not guaranteed yet.

Similar issue here #189

performance issue

isilence commented 3 years ago

@mohsenomidi, I always had problems with that test. Apart from not testing anything useful, the numbers are at least strange... I just ran it on my laptop with 5.14. The number of iterations is increased x100 for pipes=100; otherwise it finishes in under 1s.

# iter count = 1000000,
> taskset -c 3 ./io_uring 100
Pipes: 100
Time: 42.052680
> taskset -c 3 ./epoll 100
Pipes: 100
Time: 86.737568

# iter count = 10000,
> taskset -c 3 ./epoll 500
Pipes: 500
Time: 5.386944 # edited from 48.026645
> taskset -c 3 ./io_uring 500
Pipes: 500
Time: 2.106056

edit: for pipes=500, that's me screwing the test, so it's 5.3 vs 2.1

isilence commented 3 years ago

P.S. The second result (42 vs 2) looks weird; maybe the test is buggy.

isilence commented 3 years ago

I have tried using registered buffers and registered files, but they don't seem to help much with performance. I also tried enabling polling mode, but that only seems to increase CPU usage.

Your code in tcpstream.c does not use the SQPOLL or IOPOLL features; you could try them.

Sockets don't support IOPOLL. SQPOLL shouldn't be needed either, and it may complicate the code.

isilence commented 3 years ago

I have tried using registered buffers and registered files, but they don't seem to help much with performance. I also tried enabling polling mode, but that only seems to increase CPU usage.

@MarcoF1, good work. Note that registered files are not actually used unless IOSQE_FIXED_FILE is set on the request, i.e. sqe->flags |= IOSQE_FIXED_FILE.

isilence commented 3 years ago

@MarcoF1, it needs to be investigated, but my guess is that you don't do enough batching, i.e. submitting several requests at once. If I'm reading your code correctly, each io_uring_submit() ends up doing a syscall per request, so io_uring won't make much of a difference.

What is the magnitude of the difference? Can you share some numbers? Also, I guess there are N parallel clients running, each using its own io_uring instance. Right?