YoSTEALTH / Liburing

Liburing is a Python + Cython wrapper around the C liburing library, which is a helper to set up and tear down io_uring instances.
https://pypi.org/project/liburing/
Creative Commons Zero v1.0 Universal

Polling mode #8

Closed qweeze closed 3 years ago

qweeze commented 3 years ago

Hello! I've been trying to make example code work with IORING_SETUP_SQPOLL but something goes wrong. Could you have a look pls?

import os

from liburing import *

ring = io_uring()
cqes = io_uring_cqes()
io_uring_queue_init(8, ring, IORING_SETUP_SQPOLL)

path = os.path.abspath('/tmp/liburing-test-file.txt').encode()
sqe = io_uring_get_sqe(ring)
io_uring_prep_openat(sqe, -1, path, os.O_CREAT | os.O_RDWR, 0o660)
io_uring_submit(ring)

io_uring_wait_cqe(ring, cqes)
cqe = cqes[0]
result = trap_error(cqe.res)  # fails with OSError: [Errno 22] Invalid argument

YoSTEALTH commented 3 years ago

I have tried this before and gotten the same errors. I know the IORING_SETUP_SQPOLL flag does work, since it initializes properly here: https://github.com/YoSTEALTH/Liburing/blob/master/test/setup_test.py#L32
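For context on the error itself: io_uring reports a failure as a negated errno in cqe.res, so the OSError: [Errno 22] in the snippet above corresponds to cqe.res == -22 (EINVAL). A minimal sketch of that mapping follows. Note that raise_for_result is a hypothetical stand-in written for illustration, not the library's own trap_error.

```python
import errno
import os


def raise_for_result(res: int) -> int:
    """Hypothetical stand-in: turn a negative completion result into OSError."""
    if res < 0:
        # The kernel stores -errno in the result, so negate it back.
        raise OSError(-res, os.strerror(-res))
    # Non-negative results (e.g. a byte count or a new fd) pass through.
    return res


try:
    raise_for_result(-errno.EINVAL)
except OSError as error:
    caught = error

print(caught.errno == errno.EINVAL)  # True
```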

I would say using IORING_SETUP_SQPOLL is considered "advanced" usage, where you are ready to tweak sq_thread_idle in io_uring_params and meet a few other requirements. Unfortunately I haven't had the time to look further into this. For now I have stuck to using io_uring_queue_init(8, ring, 0), which works the best!

You can read more about it here: https://unixism.net/loti/search.html?q=IORING_SETUP_SQPOLL

qweeze commented 3 years ago

Thanks! I managed to get this to work eventually but it was super unintuitive :)

import os

from liburing import *

ring = io_uring()
cqes = io_uring_cqes()
params = io_uring_params()
params.flags |= IORING_SETUP_SQPOLL
params.sq_thread_idle = 2000
io_uring_queue_init_params(8, ring, params)

# check that kernel thread is running
assert os.system('ps --ppid 2 | grep io_uring-sq > /dev/null') == 0

path = os.path.abspath('hello.txt').encode()
fd = os.open(path, os.O_CREAT | os.O_RDWR | os.O_TRUNC, 0o644)

# we need to register files first
files_ = files(fd)
io_uring_register_files(ring, files_, len(files_))

sqe = io_uring_get_sqe(ring)
iovecs = iovec(b'hello world')

# here we need to pass not the file descriptor, but its index in the registered files vector
file_idx = 0
io_uring_prep_write(sqe, file_idx, iovecs[0].iov_base, iovecs[0].iov_len, 0)
sqe.flags |= IOSQE_FIXED_FILE
io_uring_submit(ring)

io_uring_wait_cqe(ring, cqes)
cqe = cqes[0]
trap_error(cqe.res)

YoSTEALTH commented 3 years ago

I managed to get this to work eventually but it was super unintuitive :)

Nice, sure is.

Do you have a specific use-case where IORING_SETUP_SQPOLL is needed?

qweeze commented 3 years ago

No, I'm just playing around with the library. I wrote a simple webserver using your wrapper and asyncio and was curious whether sq-polling could increase performance. But now (if I got it right) it doesn't seem to be a good use-case for IORING_SETUP_SQPOLL, since it requires adding the client socket descriptor via io_uring_register_files_update each time a new client connects.

YoSTEALTH commented 3 years ago

Yes, having to use io_uring_register_files_update every time you receive a new client seems slow. As far as I can tell, the goal of io_uring_register_files* is mainly to register a bunch of fds ahead of time and keep reusing them over and over.
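The "register ahead of time and reuse" idea can be pictured as a fixed-size table of slots, where each new connection borrows a free slot index and returns it on close. Below is a rough sketch of only that bookkeeping in plain Python. FixedFileTable is a hypothetical helper that makes no liburing calls, so the point where the kernel-side table would be updated is marked only by a comment.

```python
class FixedFileTable:
    """Hypothetical bookkeeping for a fixed-size registered-file table."""

    def __init__(self, size: int) -> None:
        self.slots = [-1] * size  # -1 marks an empty slot
        # Stack of free indexes, arranged so the lowest index is popped first.
        self.free = list(range(size - 1, -1, -1))

    def acquire(self, fd: int) -> int:
        """Place fd into a free slot and return the slot index."""
        if not self.free:
            raise RuntimeError("no free registered-file slots")
        index = self.free.pop()
        self.slots[index] = fd
        # A real server would update this single slot in the kernel here.
        return index

    def release(self, index: int) -> None:
        """Mark a slot as empty so a later connection can reuse it."""
        self.slots[index] = -1
        self.free.append(index)


table = FixedFileTable(4)
first = table.acquire(10)
second = table.acquire(11)
table.release(first)
reused = table.acquire(12)

print(first, second, reused)  # 0 1 0
```

The slot index, not the descriptor, is what a request with IOSQE_FIXED_FILE refers to, as in the write example earlier in the thread.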

For a simple echo benchmark I tried:

asyncio (not using liburing)

>>> siege -b -c100 -r100 http://0.0.0.0:9000/
Transactions:               10000 hits
Availability:               100.00 %
Elapsed time:               3.64 secs
Data transferred:           1.25 MB
Response time:              0.04 secs
Transaction rate:           2747.25 trans/sec
Throughput:                 0.34 MB/sec
Concurrency:                99.46
Successful transactions:    10000
Failed transactions:        0
Longest transaction:        0.06
Shortest transaction:       0.03

liburing based async server

>>> siege -b -c100 -r100 http://0.0.0.0:9009/
Transactions:               10000 hits
Availability:               100.00 %
Elapsed time:               1.87 secs
Data transferred:           1.25 MB
Response time:              0.02 secs
Transaction rate:           5347.59 trans/sec
Throughput:                 0.67 MB/sec
Concurrency:                98.06
Successful transactions:    10000
Failed transactions:        0
Longest transaction:        0.03
Shortest transaction:       0.00

Of course it could be that I didn't use the best code for asyncio :)

This is just to show you that you can get better speed out of liburing.
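As a quick sanity check on the two siege reports above, the transaction rate is transactions divided by elapsed time, and the ratio of the two rates gives the speedup. A small sketch using only the numbers printed by siege (the variable names are just for illustration):

```python
# Figures copied from the two siege runs above.
asyncio_run = {"transactions": 10000, "elapsed_secs": 3.64}
liburing_run = {"transactions": 10000, "elapsed_secs": 1.87}


def transaction_rate(run: dict) -> float:
    """Transactions per second, as siege reports it."""
    return run["transactions"] / run["elapsed_secs"]


asyncio_rate = transaction_rate(asyncio_run)
liburing_rate = transaction_rate(liburing_run)
speedup = liburing_rate / asyncio_rate

print(f"{asyncio_rate:.2f} {liburing_rate:.2f} {speedup:.2f}x")
# 2747.25 5347.59 1.95x
```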

qweeze commented 3 years ago

I also did some benchmarks and got comparable results for asyncio vs liburing - rps was almost 2 times higher. But still worse than uvloop :) https://github.com/qweeze/py_uring_sandbox/tree/master/benchmarks I'm pretty sure my code for asyncio is ok, but I could have misused something in liburing. Also, it seems that integrating liburing with asyncio's event loop via eventfd adds very little overhead.

YoSTEALTH commented 3 years ago

I also did some benchmarks and got comparable results for asyncio vs liburing - rps was almost 2 times higher.

Nice, really exciting to see what others have done with liburing.

But still worse than uvloop :)

uvloop is written in Cython, thus a lot faster. Maybe this can be done with liburing as well once it's out of the prototyping stage.

I'm pretty sure my code for asyncio is ok, but I could have misused something in liburing.

The asyncio code you are using is a bit slower; you could try raw asyncio echo_server https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py (to keep the benchmark fairer).

I tried to run the uring_server.py but it gives me ERROR:uring_server:Request too large

qweeze commented 3 years ago

Maybe this can be done with liburing

That'd be cool, too bad I have no experience with cython :)

try raw asyncio echo_server

Thanks, will try it as well

ERROR:uring_server:Request too large

Oh, I haven't implemented proper reading yet and hardcoded bytearray size here, increasing it should help

YoSTEALTH commented 3 years ago

increasing it should help

It was one of the things I quickly tried with no luck.

qweeze commented 3 years ago

Hmm, that's odd, I couldn't reproduce that. On my machine, if I add

print(len(buffer), cqe.res, buffer[:cqe.res])

and do curl localhost:8000 I get

256 78 bytearray(b'GET / HTTP/1.1\r\nHost: localhost:8000\r\nUser-Agent: curl/7.68.0\r\nAccept: */*\r\n\r\n')

Tried with both the version installed from pypi and the latest master branch. Could you send your output of the same print statement?

If you're getting ERROR:uring_server:Request too large and the request is smaller than the hardcoded size, then cqe.res == len(buffer) somehow evaluates to True in your case. Probably something went wrong with either cqe or buffer (race condition? cqes got mixed up?)

YoSTEALTH commented 3 years ago

I just used a browser to visit 0.0.0.0:8000. It's not a big deal, I know you are trying it out; there are a lot of bugs/weird cases that need to be accounted for.
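A browser often sends many more headers than curl, so its request may be longer than a 256-byte buffer, and a single read that fills the buffer would then trip the cqe.res == len(buffer) check. This is one possible explanation, not a confirmed diagnosis. Below is a minimal sketch of reading in chunks until the header terminator. It uses plain blocking sockets instead of liburing to stay self-contained, and read_request, chunk_size and limit are hypothetical names, not code from uring_server.py.

```python
import socket


def read_request(conn: socket.socket, chunk_size: int = 256, limit: int = 8192) -> bytes:
    """Read until the blank line that ends the HTTP headers, or until limit."""
    data = bytearray()
    chunk = bytearray(chunk_size)
    while b"\r\n\r\n" not in data:
        received = conn.recv_into(chunk)
        if received == 0:  # peer closed the connection
            break
        data += chunk[:received]
        if len(data) > limit:
            raise ValueError("Request too large")
    return bytes(data)


# The curl request printed earlier in the thread.
curl_request = (b"GET / HTTP/1.1\r\nHost: localhost:8000\r\n"
                b"User-Agent: curl/7.68.0\r\nAccept: */*\r\n\r\n")
# A made-up longer request, standing in for what a browser might send.
long_request = (b"GET / HTTP/1.1\r\nHost: localhost:8000\r\n"
                b"X-Padding: " + b"x" * 400 + b"\r\n\r\n")

client, server = socket.socketpair()
with client, server:
    client.sendall(curl_request)
    short = read_request(server)
    client.sendall(long_request)
    full = read_request(server)

print(len(short), full == long_request)  # 78 True
```

Here a full chunk only means "keep reading", and the size error is raised against an explicit overall limit instead.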