alibaba / PhotonLibOS

Probably the fastest coroutine lib in the world!
https://PhotonLibOS.github.io
Apache License 2.0

IOURING and EchoServer fails : free() Invalid Pointer #132

Closed aubi1kenobi closed 1 year ago

aubi1kenobi commented 1 year ago

System: HP, ubuntu 22, kernel 6.2, photon: latest git clone.

Running your echo_server.cpp coroutine sample, and replacing 'INIT_EVENT_EPOLL' with 'INIT_EVENT_IOURING', fails with the following error:

```
[INFO] .../io/iouring-wrapper.cpp:486|check_register_file_support:iouring:register_files is enabled
free(): invalid pointer
```

I need this quite urgently; I thought this code was production ready, as per your website.

General comments

Thank you, I need to get to production.

Obi

lihuiba commented 1 year ago

The log "iouring:register_files is enabled" indicates that io_uring is probably initialized ok, and it supports registered files. But the following "free(): invalid pointer" doesn't give us a clue. Can you show us its call stack?

> Maybe a simple doc with all the 'gotchas' of using iouring (or better, a specific iouring sample: a client socket please, not files and not a server).

Using io_uring is simply a matter of selecting it.

> iouring 'settings' facilities/methods (sqes, poll, callbacks/eventfd, etc.)

They are all encapsulated as implementation details.

> More importantly: low-latency setting possibilities (logging per default hurts latency; not everyone is throughput oriented).

What do you mean by 「logging per default」?

And we do use Photon for low-latency scenarios already.

beef9999 commented 1 year ago

@aubi1kenobi There are some minor changes in the echo server example (examples/perf/net-perf.cpp). It looks like you are not using the latest one. Would you update the code once more? I re-ran it on Ubuntu 22 and it was OK.

To use the io_uring socket, you can uncomment the lines `auto cli = photon::net::new_iouring_tcp_client();` and `auto server = photon::net::new_iouring_tcp_server();` in examples/perf/net-perf.cpp.

The default code will use a non-blocking fd + io_uring poll + libc send/recv. However, the iouring client/server you switched on will use io_uring native send/recv + a blocking fd.

The event engine should always be io_uring.

You may have noticed that there are two types of networking tests, i.e. streaming and ping-pong. The io_uring server performs well in ping-pong mode, but the default type of server has better results in streaming mode. That is a known issue we have reported to the io_uring and kernel communities.
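Roughly, the io_uring-native setup boils down to something like this (a sketch only, not the actual net-perf.cpp code; the headers and init flags shown here may differ from what the example actually uses, only the factory names are as quoted above):

```cpp
// Sketch only, not the actual net-perf.cpp code.
#include <photon/photon.h>      // assumed header for photon::init/fini
#include <photon/net/socket.h>  // assumed header for the socket factories

int main() {
    // The event engine should always be io_uring for these sockets.
    if (photon::init(photon::INIT_EVENT_IOURING, photon::INIT_IO_NONE) < 0)
        return -1;

    // io_uring-native flavor: blocking fd + io_uring send/recv.
    auto cli = photon::net::new_iouring_tcp_client();
    auto server = photon::net::new_iouring_tcp_server();

    // Default flavor, for comparison: non-blocking fd + io_uring poll + libc send/recv.
    // auto cli = photon::net::new_tcp_socket_client();

    // ... run the echo / perf logic here, as net-perf.cpp does ...

    delete server;
    delete cli;
    photon::fini();
    return 0;
}
```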

aubi1kenobi commented 1 year ago

Hi,

No logs generated other than these console msgs:

```
[100%] Linking CXX executable xPhotonos
[100%] Built target xPhotonos
~/xPhotonos/build$ sudo ./xPhotonos
2023/04/24 21:27:05|INFO |th=00005617C4AE75F0|/home/s200330/arc6/shops/PHV/1/photon/io/iouring-wrapper.cpp:653|new_iouring:Init event engine: iouring [is_master=1]
2023/04/24 21:27:05|INFO |th=00005617C4AE75F0|/home/s200330/arc6/shops/PHV/1/photon/io/iouring-wrapper.cpp:486|check_register_file_support:iouring: register_files is enabled
free(): invalid pointer
Aborted
```

- Register files: I'm using sockets, so I was expecting to register buffers, not files.
- Settings: there are several things that can be done to configure the io_urings, just like you provide the `setsockopt` method to configure the sockets.
- Logging: my bad, you are not logging per default.
- io_uring settings: it's good to provide a simplified facade over the API, as you have done. However, just like with `setsockopt`, one can request multishot CQEs, register buffers instead of files, set IOPOLL, etc., if desired. There is no one-size-fits-all way to abstract that away. If you do, then at least have a Chinese-wall implementation for sockets, low latency, throughput, etc. As you know, throughput and low latency are often mutually exclusive.

I like your product, and it's probably the best API for me to use to get to production in a low-latency environment. The network is a latency killer; that's why io_uring is key.

The not-so-good part is that your own sample did not run with io_uring, and that causes concern, as you can imagine. If the site said 'experimental', I wouldn't even post about this and would just look elsewhere. The selling point for me was/is (if it works) io_uring and the comparison you did with libunifex and the like.

Thanks. And sorry for the long post.

Obi
aubi1kenobi commented 1 year ago

Just reposting part of the previous message, as the settings got a bit shambled.

There is no one-size-fits-all way to abstract that away. If you do, then at least have a Chinese-wall implementation for sockets, low latency, throughput, etc. As you know, throughput and low latency are often mutually exclusive.

I like your product, and it's probably the best API for me to use to get to production in a low-latency environment (at least, I hope). The network is a latency killer; that's why io_uring is key.

The not-so-good part is that your own sample did not run with io_uring, and that causes concern, as you can imagine. If the site said 'experimental', I wouldn't even post about this and would just look elsewhere. The selling point for me was/is (if it works) io_uring and the comparison you did with libunifex and the like.

Thanks. And sorry for the long post.

Obi

beef9999 commented 1 year ago

Of course it's not experimental. Alibaba is one of the core contributors to the io_uring community. Even though we don't belong to the kernel team (we are in storage), we have had long-term cooperation with the io_uring-related kernel teams.

We have also reported many bugs to the kernel, and some of the fixes have been merged into the mainline. In order to take them into production earlier, we back-port these fixes into our own kernel. We are quite confident in the project's quality.

beef9999 commented 1 year ago
1. Registering buffers has no performance improvement. This is a common misunderstanding.

https://github.com/axboe/liburing/issues/825#issuecomment-1468527870

2. About multi-shot:

You may use the `add_interest` and `rm_interest` methods of the `CascadingEventEngine`; see the test code in test-iouring.cpp, TEST(event_engine, cascading_one_shot). A rough sketch follows below.
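Roughly, the usage pattern looks like this (a sketch only; the factory name, the `Event` fields, the `EVENT_READ` flag and the `wait_for_events` signature may differ from the current headers, so treat test-iouring.cpp as the authoritative reference):

```cpp
// Rough sketch, not verified against the current headers.
#include <cstdint>
#include <sys/types.h>
#include <photon/photon.h>
#include <photon/io/fd-events.h>   // assumed location of CascadingEventEngine

void watch_fd(int fd) {
    // Assumed factory name for an io_uring-backed cascading engine.
    auto engine = photon::new_iouring_cascading_engine();
    if (engine == nullptr) return;

    // Register read interest; `data` is handed back when the fd becomes ready.
    photon::CascadingEventEngine::Event ev{fd, photon::EVENT_READ, (void*)(uintptr_t)fd};
    engine->add_interest(ev);

    void* ready[16];
    ssize_t n = engine->wait_for_events(ready, 16);   // blocks until something is ready
    for (ssize_t i = 0; i < n; i++) {
        int ready_fd = (int)(uintptr_t)ready[i];
        (void)ready_fd;  // ... handle readiness, re-arm with add_interest() if one-shot ...
    }

    engine->rm_interest(ev);
    delete engine;
}
```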

aubi1kenobi commented 1 year ago

> @aubi1kenobi There are some minor changes in the echo server example (examples/perf/net-perf.cpp). It looks like you are not using the latest one. Would you update the code once more? I re-ran it on Ubuntu 22 and it was OK.

@beef9999 Hi, thanks. I believe I cloned the repo a week or two ago, and I ran the C++ coroutine echo server, not net-perf. Does that make a difference? Anyway, I will try it and get back to you.

> The default code will use a non-blocking fd + io_uring poll + libc send/recv. However, the iouring client/server you switched on will use io_uring native send/recv + a blocking fd.
>
> The event engine should always be io_uring.
>
> You may have noticed that there are two types of networking tests, i.e. streaming and ping-pong. The io_uring server performs well in ping-pong mode, but the default type of server has better results in streaming mode. That is a known issue we have reported to the io_uring and kernel communities.

I didn't get this.

beef9999 commented 1 year ago

You may go to the front page and click this button (see the screenshot below). It describes the differences between streaming and ping-pong.

[screenshot of the front-page button]


The event engine allows the coroutine to sleep, block, schedule, or poll file descriptors. As long as your kernel supports it, you should always use the io_uring event engine, which is set by photon::init. The alternative is epoll, which can poll any fd as well.

In terms of io_uring, there are two types of TCP socket implementations. Take the client, for instance: the first is photon::net::new_tcp_socket_client (the default one), and the other is photon::net::new_iouring_tcp_client.

The former amounts to non-blocking fd + io_uring/epoll poll + libc send/recv. The latter amounts to io_uring send/recv + blocking fd. Because io_uring in newer kernels has the FAST_POLL feature enabled by default, we don't need to poll any more.

The former works well for streaming workloads. The latter is the best choice for ping-pong workloads and huge numbers of connections.
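In code, the choice comes down to which factory you call, roughly (a sketch; the return type and header are guesses, only the factory names are as used in this thread):

```cpp
#include <photon/net/socket.h>   // assumed header for the socket factories

// Pick the client flavor per workload, as described above.
photon::net::ISocketClient* make_client(bool ping_pong_with_many_connections) {
    return ping_pong_with_many_connections
        ? photon::net::new_iouring_tcp_client()   // io_uring send/recv + blocking fd
        : photon::net::new_tcp_socket_client();   // non-blocking fd + poll + libc send/recv
}
```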

beef9999 commented 1 year ago

Remember to add `-D ENABLE_URING=1` when building with CMake: https://github.com/alibaba/PhotonLibOS#3-examples--testing

lihuiba commented 1 year ago

When using new_tcp_socket_client (the default one), io_uring is used as an event poller (similar to epoll).

When using new_iouring_tcp_client, io_uring is used to perform async sending and receiving.

aubi1kenobi commented 1 year ago

Hi guys,

I re-cloned photon and liburing yesterday. Every io_uring 'test-x' reported that I did not have enough mlock resources and should run 'ulimit -l unlimited', but that doesn't work for a normal user. Running the tests as root fails with: **test-iouring/client/server/x : liburing.so.2: version LIBURING_2.2 not found**

Being a fresh install, my liburing is now at 2.4, and I can't imagine that downgrading is the solution. Could you please advise? I seem to be stuck on getting io_uring to work.

Thanks. Obi

beef9999 commented 1 year ago

The current CMake logic will not download the liburing source code if you have installed it system-wide. You can try to delete that installation temporarily.

aubi1kenobi commented 1 year ago

Hi guys, you seem to have lots of experience in the TCP area. In your opinion, will the latency gains, if any, from using io_uring be considerable in comparison to libaio, or marginal? I know the Linux kernel (or any other software kernel) will always be the bottleneck. Any suggestions, with your package, for achieving the lowest possible latency? I'm stuck with software on this project (no FPGA), and io_uring seems difficult to get right, as all the packages I've used show, and now, unfortunately, no luck yet with yours either.

Obi

beef9999 commented 1 year ago

Could you please quantify your problem, and specify the code?

We don't know how to proceed with this conversation without them.

lihuiba commented 1 year ago

> any suggestions with your package for achieving the lowest possible latency?

You may want to check out SMC-R. It is RDMA wrapped as a stream socket with a set of APIs similar to TCP. We can release a wrapper for it very soon, if you are interested.

aubi1kenobi commented 1 year ago

@beef9999 Cheers, uninstalling liburing and recompiling your package seems to have solved all the issues I've had so far with io_uring. All tests ran OK.

@lihuiba Cheers, had a quick look at the provided link and it really does sound good. Yes I'm interested, quite so indeed. What do you reckon the ETA is for your wrapper?

beef9999 commented 1 year ago

@aubi1kenobi

Photon 0.6 is still under evaluation. This is a pre-release Pull Request and you may check out its code.

As long as your kernel version is greater than 4.x and you set up the Photon SMC-R socket wrapper via new_smc_socket_client and new_smc_socket_server, your code will simply work.

Of course, an RDMA NIC is required.
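Roughly, the usage would look like this (a sketch only; the header locations and the exact pre-release API may change before release, only the two factory names are as given above):

```cpp
// Sketch only: the new_smc_socket_* factories come from the pre-release branch
// mentioned above; header paths and the final API may change before release.
#include <photon/photon.h>
#include <photon/net/socket.h>

int main() {
    if (photon::init() < 0) return -1;

    // Intended as drop-in replacements for the TCP factories discussed earlier.
    auto cli = photon::net::new_smc_socket_client();
    auto server = photon::net::new_smc_socket_server();

    // ... use them exactly like the default / io_uring TCP client and server ...

    delete server;
    delete cli;
    photon::fini();
    return 0;
}
```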

aubi1kenobi commented 1 year ago

@beef9999 Is the Solarflare/Xilinx/AMD X255 an RDMA NIC? Or do you have a suggestion? Btw, once I solved the initial issue, thanks to you and @lihuiba, your package has been satisfactory. I'm leaving the latency improvements for last; I need to get to the end first. I have one or two more questions, but I will close this issue and open a new one for them. Thanks a bunch for your support.