TheLudlows opened this issue 8 months ago (status: Open)
Likely the implementation. For example, I made the explicit decision to keep a hash of the completion entries instead of just casting their addresses in unsafe code.
That's a decision I don't regret, since early io_uring code was full of issues that essentially led to wrong addresses being added there, completions disappearing, etc. I am sure it's less of an issue now, but I have never done the work to methodically go chase performance issues.
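To make the trade-off above concrete, here is a minimal sketch of the "hash of completion entries" idea. It is not glommio's actual code: `PendingOp`, `CompletionRegistry`, and their methods are hypothetical names made up for illustration. The only io_uring fact it relies on is that each submission carries a 64-bit `user_data` value that comes back with the completion. Instead of storing a raw pointer in that field and dereferencing it on completion, the sketch stores an opaque id and looks it up in a map, so a stale or bogus completion yields `None` rather than a wild pointer access. The cost is one hash lookup per completion, which may be part of the overhead discussed in this thread.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the state a pending operation carries
/// (waker, result slot, buffers, ...). Not a glommio type.
#[derive(Debug, PartialEq)]
pub struct PendingOp {
    pub description: &'static str,
}

/// Hypothetical registry mapping the u64 placed in a submission's
/// `user_data` field to the operation that is waiting on it.
#[derive(Default)]
pub struct CompletionRegistry {
    next_id: u64,
    pending: HashMap<u64, PendingOp>,
}

impl CompletionRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record a pending operation and return the id that would be
    /// stored in `user_data` when the request is submitted.
    pub fn register(&mut self, op: PendingOp) -> u64 {
        let id = self.next_id;
        self.next_id = self.next_id.wrapping_add(1);
        self.pending.insert(id, op);
        id
    }

    /// Handle a completion. An unknown or already-consumed id returns
    /// `None` instead of touching memory that may no longer be valid.
    pub fn complete(&mut self, user_data: u64) -> Option<PendingOp> {
        self.pending.remove(&user_data)
    }

    /// Number of operations still waiting for a completion.
    pub fn in_flight(&self) -> usize {
        self.pending.len()
    }
}
```

The alternative of casting `user_data` straight back to a pointer avoids the lookup, but it trusts every completion to carry a still-valid address, which is exactly the class of bug described above.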
It looks like it's almost 30% worse. Is there any way to improve it?
Discussed previously in #554. Probably just needs someone to run things under a profiler to figure it out. I know that monoio relies on nightly features (e.g. fast thread local) and that could also be contributing to the performance difference (it doesn't require it anymore but the benchmarks are run against nightly with that feature on).
Would be interesting to hear from @ihciah if he can give a high level guess if he intentionally did something differently with monoio to get higher perf.
Their implementation also uses fastpoll, which is much better than what we do (poll+read), if I am remembering all this correctly. Honestly, the way to resolve this now is probably to redo sockets with all the new io_uring features that have become available, among which I would include buffer select, buffer rings, fastpoll, and zc where possible. It is also worth checking the Semaphore implementation, as they mention in #554.
@bryandmc do you know if this impacts disk I/O performance at all, or is this just an issue in the net stack?
@vlovich from an io_uring perspective, we already do most (if not all?) of the things that ensure fast disk reads. Additional performance could be obtained through profiling, etc., but unlike the net stack I don't think there are features we have "left on the table" that we aren't currently using. Because of that, I would defer to @glommer's explanation, which is just that it hasn't been optimized at all. There are probably some easy performance wins for anyone with a little time and a profiler.
https://github.com/bytedance/monoio/blob/master/docs/en/benchmark.md
As described in the test.