Open adamsitnik opened 3 years ago
| Author: | adamsitnik |
|---|---|
| Assignees: | - |
| Labels: | `area-System.IO`, `tenet-performance`, `up-for-grabs` |
| Milestone: | Future |
@tmds @damageboy @benaadams would any of you be interested?
Many of the io_uring benchmarks are performed on a single thread that needs no synchronization. We won't be able to achieve the gains measured there because we need to synchronize and hop between threads. That will definitely cost us something.
Functionally, using io_uring allows cancelling ongoing operations. This is not supported with the current sync-on-ThreadPool implementation. So this is a functional gain.
I'll let you know if I find time to work on this. I'd need help from you and others to optimize the thread/synchronization stuff.
it could detect the kernel version and just use the new strategy for newer kernels (5.5+)
I thought in our last conversation we decided that we should gate this feature on 5.10, since the support between 5.5 and 5.7 is patchy. It seems like 5.10 would be great. As context, .NET 6 container images use Debian 11 by default, and the second most popular is Alpine, which for .NET 6 will be 3.13+.
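For illustration, a minimal sketch of what a 5.10 gate could look like at the C level, assuming the check is done by parsing the release string reported by `uname(2)`. The function names and the `MIN_MAJOR`/`MIN_MINOR` constants are hypothetical and not taken from the runtime; the thread does not say how .NET would actually perform the check.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical threshold, taken from the 5.10 suggestion in this thread. */
#define MIN_MAJOR 5
#define MIN_MINOR 10

/* Parses the leading "major.minor" of a release string such as
   "5.4.72-microsoft-standard-WSL2" and compares it with the threshold.
   Returns false when the string does not start with two numbers. */
bool release_allows_io_uring(const char *release)
{
    int major, minor;
    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return false; /* unknown format: stay on the existing strategy */
    return major > MIN_MAJOR || (major == MIN_MAJOR && minor >= MIN_MINOR);
}

/* Applies the same check to the running kernel via uname(2). */
bool running_kernel_allows_io_uring(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return false;
    return release_allows_io_uring(u.release);
}
```

With this sketch the WSL release string quoted in this thread (`5.4.72-microsoft-standard-WSL2`) would stay on the existing strategy, while a `5.10.x` kernel would pass the gate. A version number alone does not prove that io_uring is usable in a given environment, so a real implementation would probably also want to fall back when setting up the ring fails.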
Here's what I found on kernel versions.
Interesting context: https://news.ycombinator.com/item?id=27382299
WSL is already 5.4.72-microsoft-standard-WSL2
@adamsitnik What is `io_uring`? Is it a new algorithm for IO?
What is io_uring? Is it a new algorithm for IO?
io_uring is a pretty sweet, modern I/O API in Linux kernel 5.1+: https://kernel.dk/io_uring.pdf
Uses producer-consumer ring buffers to achieve lock-free asynchrony with low-latency, high throughput, and minimal memory copies, like other notable recent architectures: https://lmax-exchange.github.io/disruptor/files/Disruptor-1.0.pdf https://youtu.be/Qho1QNbXBso?t=1267
Windows also should receive IO Rings API soon: https://windows-internals.com/i-o-rings-when-one-i-o-operation-is-not-enough
https://www.phoronix.com/scan.php?page=news_item&px=8M-IOPS-Per-Core-Linux
An engineer from Facebook is pushing the performance of io_uring in Linux quite aggressively. So I imagine there would be significant performance gains to be had if it were utilized in dotnet on Linux.
I wonder what it would take to achieve performance gains with io_uring on non-benchmark workloads. If IOs are issued just like before except using a new call mechanism, I do not see why this would be much faster.
Achieving batching benefits would take new APIs that are not currently available with `FileStream`. Registering buffers might be difficult to achieve without application cooperation. I understand that io_uring supports polling which helps with super-low latency devices. That can't be done by default so it must be opt-in.
On the web, there are various reports by people who couldn't reproduce performance gains. This is further evidence that the gains might accrue only when the application is structured suitably.
So maybe it takes new, specialized APIs for applications to harness this fully. Since Windows appears to have similar mechanisms now, there could be a common abstraction for both.
Low latency IO has been a trend for the last couple of years. We have SSDs now that are insanely fast. Networks have become much lower latency as well (e.g. RDMA). So maybe there's value in addressing such devices with a new API.
`io_uring` gains come from being able to batch operations, and to batch retrieving results. The benchmarks/apps that benefit from it most will be written so they inherently batch.
None of the existing .NET APIs batch. For example, they deal with each `Socket` separately. Making them use `io_uring` means adding an additional layer that causes the operations to be batched. The cost of that layer will be significant compared to the benchmarks/apps that inherently batch.
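To make the batching argument concrete, here is a toy model (not real io_uring code) that only counts simulated kernel transitions. The `toy_ring` type, `QUEUE_DEPTH`, and all function names are made up for this sketch; it performs no I/O and says nothing about the cost of the extra layer itself.

```c
#include <stddef.h>

#define QUEUE_DEPTH 8 /* hypothetical ring size */

/* Toy stand-in for a submission ring: it only counts entries and
   simulated kernel transitions. */
struct toy_ring {
    size_t pending;      /* entries queued but not yet submitted */
    size_t submit_calls; /* simulated kernel transitions */
    size_t completed;    /* operations handed to the "kernel" */
};

/* Hands all pending entries to the "kernel" in one transition. */
void toy_submit(struct toy_ring *r)
{
    if (r->pending == 0)
        return;
    r->submit_calls++;
    r->completed += r->pending;
    r->pending = 0;
}

/* Queues one operation; flushes first when the ring is full. */
void toy_queue(struct toy_ring *r)
{
    if (r->pending == QUEUE_DEPTH)
        toy_submit(r);
    r->pending++;
}

/* Caller that submits after every operation, like a per-call API. */
size_t run_unbatched(size_t ops)
{
    struct toy_ring r = {0};
    for (size_t i = 0; i < ops; i++) {
        toy_queue(&r);
        toy_submit(&r);
    }
    return r.submit_calls;
}

/* Caller that queues everything and submits only when it has to. */
size_t run_batched(size_t ops)
{
    struct toy_ring r = {0};
    for (size_t i = 0; i < ops; i++)
        toy_queue(&r);
    toy_submit(&r);
    return r.submit_calls;
}
```

In this model `run_unbatched(32)` makes 32 transitions and `run_batched(32)` makes 4. An API that issues one operation per call sits in the first pattern unless something in between collects operations, and that collecting layer is the extra cost described above.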
Another improvement:
~500K IOPS/core improvement or around a 5~6% efficiency upgrade
https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.17-Will-Continue-IO
I imagine a whole new set of APIs might be needed, or maybe `FileStream` uses it under the hood in high load scenarios. I think the idea is that if you implement the base somewhere, then it will slowly be added to the rest of .NET and optimized.
Is there any news on this topic? I think `FileStream` (at least without substantial refactorings) doesn't seem to be the right API for this as it does not support batching.
Isn't that what `RandomAccess` is supposed to give?
Is there any news on this topic?
We are not planning to add io_uring support for .NET 7. The main reason is that in the most common scenarios we would currently observe a perf regression. Currently in io_uring the producer and the consumer (the thread that adds work items to the ring and the thread that removes them) need to be the same thread. That just does not work well with our current Thread Pool model.
Isn't that what RandomAccess is supposed to give?
@ayende is right, `RandomAccess` supports passing multiple buffers:
https://devblogs.microsoft.com/dotnet/file-io-improvements-in-dotnet-6/#scatter-gather-io
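As a rough illustration of what "passing multiple buffers" means at the syscall level on Linux, here is a small sketch of a scatter read using `preadv(2)`. The `scatter_read` helper is hypothetical, and this thread does not state how `RandomAccess` maps onto the OS, so treat it as an analogy rather than a description of the .NET implementation.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes preadv on glibc */
#endif
#include <sys/types.h>
#include <sys/uio.h>

/* Reads from fd at the given offset into two caller-provided buffers
   with a single call. Buffer a is filled first, then buffer b.
   Returns the total number of bytes read, or -1 on error. */
ssize_t scatter_read(int fd, off_t offset,
                     void *a, size_t a_len,
                     void *b, size_t b_len)
{
    struct iovec iov[2] = {
        { .iov_base = a, .iov_len = a_len },
        { .iov_base = b, .iov_len = b_len },
    };
    return preadv(fd, iov, 2, offset);
}
```

One call fills several buffers, which saves calls for a single file position, but it is still one operation on one file. That is a narrower kind of batching than queuing many independent operations on a ring.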
Maybe FileStream is a bad place for io_uring? Queue rings can be implemented at the software level with System.Threading.Tasks.Dataflow primitives, and I think io_uring's place in .NET should be a separate, async-only primitive.
We have recently invested a lot of time in rewriting `FileStream` on Windows. We have kept `io_uring` in mind and after the recent refactoring, it should now be much easier to implement the support:
- […] `FileStream` API.
- `FileStream` can choose the strategy at runtime. In the case of `Linux`, it could detect the kernel version and just use the new strategy for newer kernels (5.5+). It means that the day our customers update their kernel version, .NET could start using `io_uring` without a .NET update.
- […] `IoUringStrategy`) don't need to worry about buffering at all https://github.com/dotnet/runtime/blob/2223babdd49118787c675e04aff711f936a10b26/src/libraries/System.Private.CoreLib/src/System/IO/Strategies/FileStreamHelpers.Windows.cs#L59-L60
- `IoUringStrategy` would only need to implement `ReadAsync` and `WriteAsync` support.

We (owners of System.IO) have a lot of other high-priority things on our schedule for .NET 6 (like full symbolic links support) and since most of our customers are not using the latest Linux kernels, we most probably won't be able to implement it on our own for .NET 6. But we would love to provide any help necessary (code reviews, testing) for a contributor that would be willing to implement it. Having said that, I am marking this issue as "up-for-grabs".
If we don't find a contributor for .NET 6, we are going to include this in .NET 7 planning and deliver it in .NET 7.