dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Implement io_uring support for FileStream #51985

Open adamsitnik opened 3 years ago

adamsitnik commented 3 years ago

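We have recently invested a lot of time in rewriting FileStream on Windows. We kept io_uring in mind, and after the recent refactoring it should now be much easier to implement support for it:

- We have introduced a new internal abstraction called FileStreamStrategy. It is more or less the FileStream API.
- FileStream can choose the strategy at runtime. In the case of Linux, it could detect the kernel version and just use the new strategy for newer kernels (5.5+). It means that the day our customers update their kernel version, .NET could start using io_uring without a .NET update.
- The entire buffering logic has been moved to a new strategy called BufferedFileStreamStrategy, which can be used as a wrapper over another strategy. It means that new strategies (like IoUringStrategy) don't need to worry about buffering at all.
- We can use the existing Unix strategy for sync file IO, so the new IoUringStrategy would only need to implement ReadAsync and WriteAsync support.

We (owners of System.IO) have a lot of other high-priority things on our schedule for .NET 6 (like full symbolic links support), and since most of our customers are not using the latest Linux kernels, we most probably won't be able to implement it on our own for .NET 6. But we would love to provide any help necessary (code reviews, testing) to a contributor who is willing to implement it. Having said that, I am marking this issue as "up-for-grabs".

If we don't find a contributor for .NET 6, we are going to include this in .NET 7 planning and deliver it in .NET 7.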
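
The layering described in this issue (a raw strategy wrapped by a buffering strategy, so that a new strategy never has to implement buffering itself) can be illustrated with a small sketch. This is not .NET code: it is written in Python purely for illustration, and `FileStrategy`, `RawStrategy`, and `BufferedStrategy` are hypothetical stand-ins for `FileStreamStrategy`, a platform strategy, and `BufferedFileStreamStrategy`.

```python
class FileStrategy:
    """Hypothetical stand-in for the FileStreamStrategy abstraction."""

    def read(self, size: int) -> bytes:
        raise NotImplementedError


class RawStrategy(FileStrategy):
    """Unbuffered strategy: every read() models one call into the OS."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0
        self.calls = 0  # counts simulated OS-level reads

    def read(self, size: int) -> bytes:
        self.calls += 1
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


class BufferedStrategy(FileStrategy):
    """Wrapper that adds buffering on top of any other strategy."""

    def __init__(self, inner: FileStrategy, buffer_size: int = 4096) -> None:
        self._inner = inner
        self._buffer_size = buffer_size
        self._buffer = b""

    def read(self, size: int) -> bytes:
        # Refill from the inner strategy only when the buffer is empty.
        if not self._buffer:
            self._buffer = self._inner.read(max(size, self._buffer_size))
        chunk, self._buffer = self._buffer[:size], self._buffer[size:]
        return chunk


raw = RawStrategy(b"abcdefgh")
buffered = BufferedStrategy(raw, buffer_size=8)
chunks = [buffered.read(2) for _ in range(4)]
```

In this sketch, four small reads through the wrapper result in a single call on the inner strategy, which is the property that would let a hypothetical io_uring strategy stay focused on issuing the I/O.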

ghost commented 3 years ago

Tagging subscribers to this area: @carlossanlop See info in area-owners.md if you want to be subscribed.

Issue Details
We have recently invested a lot of time in rewriting `FileStream` on Windows. We have kept `io_uring` in mind and after recent refactoring, it should be now much easier to implement the support:

- we have introduced a new internal abstraction called [FileStreamStrategy](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO/Strategies/FileStreamStrategy.cs). It's more or less `FileStream` API.
- `FileStream` can choose the strategy at runtime. In the case of `Linux`, it could detect the kernel version and just use the new strategy for newer kernels (5.5+). It means that the day our customers update their kernel version, .NET could start using `io_uring` without a .NET update.
- Entire buffering logic has been moved to a new strategy called [BufferedFileStreamStrategy](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO/Strategies/BufferedFileStreamStrategy.cs) which can be used as a wrapper over another strategy. It means that new strategies (like `IoUringStrategy`) don't need to worry about buffering at all: https://github.com/dotnet/runtime/blob/2223babdd49118787c675e04aff711f936a10b26/src/libraries/System.Private.CoreLib/src/System/IO/Strategies/FileStreamHelpers.Windows.cs#L59-L60
- We can use the existing Unix strategy for sync file IO, so the new `IoUringStrategy` would only need to implement `ReadAsync` and `WriteAsync` support.

We (owners of System.IO) have a lot of other high-priority things on our schedule for .NET 6 (like full symbolic links support), and since most of our customers are not using the latest Linux kernels, we most probably won't be able to implement it on our own for .NET 6. But we would love to provide any help necessary (code reviews, testing) to a contributor who is willing to implement it. Having said that, I am marking this issue as "up-for-grabs".

If we don't find a contributor for .NET 6, we are going to include this in .NET 7 planning and deliver it in .NET 7.
Author: adamsitnik
Assignees: -
Labels: `area-System.IO`, `tenet-performance`, `up-for-grabs`
Milestone: Future

adamsitnik commented 3 years ago

@tmds @damageboy @benaadams would any of you be interested?

tmds commented 3 years ago

Many of the io_uring benchmarks are performed on a single thread that needs no synchronization. We won't be able to achieve the gains measured there because we need to synchronize and hop between threads. That will definitely cost us something.

Functionally, io_uring makes it possible to cancel ongoing operations. This is not supported by the current sync-on-ThreadPool implementation, so it would be a functional gain.

I'll let you know if I find time to work on this. I'd need help from you and others to optimize the threading and synchronization parts.

richlander commented 3 years ago

> it could detect the kernel version and just use the new strategy for newer kernels (5.5+)

I thought that in our last conversation we decided to gate this feature on 5.10, since the support between 5.5 and 5.7 is patchy. It seems like 5.10 would be great. As context, .NET 6 container images use Debian 11 by default, and the second most popular is Alpine, which for .NET 6 will be 3.13+.
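
A gate on a minimum kernel version, as discussed here, could look roughly like the following. This is a sketch in Python rather than .NET code: `MIN_IO_URING_KERNEL`, `parse_kernel_release`, and `choose_strategy` are hypothetical names, and the 5.10 threshold is only the value proposed in this comment. A version check alone may also be insufficient in practice, for example when a sandbox blocks the io_uring syscalls, so a real implementation would probably need a runtime probe as well.

```python
import platform
import re
from typing import Tuple

# Threshold proposed in this thread; not a value taken from any shipped runtime.
MIN_IO_URING_KERNEL = (5, 10)


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string like '5.4.72-microsoft-standard-WSL2'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def choose_strategy(system: str, release: str) -> str:
    """Return 'io_uring' only on Linux kernels at or above the threshold."""
    if system != "Linux":
        return "default"
    try:
        version = parse_kernel_release(release)
    except ValueError:
        return "default"  # unknown format: fall back to the existing strategy
    # Tuple comparison is numeric per component, so (5, 10) > (5, 4).
    return "io_uring" if version >= MIN_IO_URING_KERNEL else "default"


if __name__ == "__main__":
    print(choose_strategy(platform.system(), platform.release()))
```

Comparing parsed integer tuples instead of raw strings matters here, because a plain string comparison would rank "5.4" above "5.10".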

Here's what I found on kernel versions.

Interesting context: https://news.ycombinator.com/item?id=27382299

omariom commented 3 years ago

WSL is already 5.4.72-microsoft-standard-WSL2

ayousuf23 commented 3 years ago

@adamsitnik What is io_uring? Is it a new algorithm for IO?

stephentoub commented 3 years ago

> What is io_uring? Is it a new algorithm for IO?

https://en.wikipedia.org/wiki/Io_uring

davidvmckay commented 3 years ago

> @adamsitnik What is io_uring? Is it a new algorithm for IO?

io_uring is a pretty sweet, modern I/O API in Linux kernel 5.1+: https://kernel.dk/io_uring.pdf

Uses producer-consumer ring buffers to achieve lock-free asynchrony with low-latency, high throughput, and minimal memory copies, like other notable recent architectures: https://lmax-exchange.github.io/disruptor/files/Disruptor-1.0.pdf https://youtu.be/Qho1QNbXBso?t=1267

dmitriyse commented 3 years ago

Windows should also receive an I/O Rings API soon: https://windows-internals.com/i-o-rings-when-one-i-o-operation-is-not-enough

elachlan commented 2 years ago

https://www.phoronix.com/scan.php?page=news_item&px=8M-IOPS-Per-Core-Linux

An engineer from Facebook is pushing the performance of io_uring in Linux quite aggressively. So I imagine there would be significant performance gains to be had if it were used in .NET on Linux.

GSPP commented 2 years ago

I wonder what it would take to achieve performance gains with io_uring on non-benchmark workloads. If IOs are issued just like before except using a new call mechanism, I do not see why this would be much faster.

Achieving batching benefits would take new APIs that are not currently available with FileStream. Registering buffers might be difficult to achieve without application cooperation. I understand that io_uring supports polling which helps with super-low latency devices. That can't be done by default so it must be opt-in.

On the web, there are various reports by people who couldn't reproduce performance gains. This is further evidence that the gains might accrue only when the application is structured suitably.

So maybe it takes new, specialized APIs for applications to harness this fully. Since Windows appears to have similar mechanisms now, there could be a common abstraction for both.

Low latency IO has been a trend for the last couple of years. We have SSDs now that are insanely fast. Networks have become much lower latency as well (e.g. RDMA). So maybe there's value in addressing such devices with a new API.

tmds commented 2 years ago

io_uring gains come from being able to batch the submission of operations and to batch the retrieval of results. The benchmarks/apps that benefit from it most are written so that they inherently batch.

Existing .NET APIs do not batch. For example, they deal with each Socket separately. Making them use io_uring means adding an additional layer that causes the operations to be batched. The cost of that layer will be significant compared to the benchmarks/apps that inherently batch.
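
To make the "additional layer" concrete, here is a minimal sketch of such a batching layer, written in Python with asyncio purely for illustration. `BatchingLayer`, `issue`, and the `submit_batch` callback are hypothetical names; `submit_batch` stands in for a single ring submission and is not a real io_uring binding.

```python
import asyncio
from typing import Any, Callable, List, Tuple


class BatchingLayer:
    """Collects individually issued operations and submits them together."""

    def __init__(self, submit_batch: Callable[[List[Any]], List[Any]]) -> None:
        self._submit_batch = submit_batch  # hypothetical: one call per batch
        self._pending: List[Tuple[Any, asyncio.Future]] = []
        self._flush_scheduled = False

    def issue(self, op: Any) -> asyncio.Future:
        """Queue one operation; the caller awaits the returned future."""
        loop = asyncio.get_running_loop()
        future = loop.create_future()
        self._pending.append((op, future))
        if not self._flush_scheduled:
            # Defer the submission so that operations issued back to back
            # end up in the same batch.
            self._flush_scheduled = True
            loop.call_soon(self._flush)
        return future

    def _flush(self) -> None:
        batch, self._pending = self._pending, []
        self._flush_scheduled = False
        results = self._submit_batch([op for op, _ in batch])
        for (_, future), result in zip(batch, results):
            if not future.done():
                future.set_result(result)


async def demo() -> Tuple[List[int], List[int]]:
    batch_sizes: List[int] = []

    def fake_submit(ops: List[int]) -> List[int]:
        batch_sizes.append(len(ops))   # record how many ops each submission carried
        return [op * 2 for op in ops]  # pretend each op completed with a result

    layer = BatchingLayer(fake_submit)
    results = await asyncio.gather(*(layer.issue(i) for i in range(4)))
    return list(results), batch_sizes
```

Even in this toy version, every operation pays for a queue entry, a future, and a deferred callback before anything is submitted. That bookkeeping is the kind of overhead the comment above refers to, and an application that batches inherently would not need it.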

elachlan commented 2 years ago

Another improvement: ~500K IOPS/core improvement or around a 5~6% efficiency upgrade https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.17-Will-Continue-IO

I imagine a whole new set of APIs might be needed, or maybe FileStream uses it under the hood in high load scenarios. I think the idea is that if you implement the base somewhere, then it will slowly be added to the rest of .NET and optimized.

pr8x commented 2 years ago

Is there any news on this topic? I think FileStream (at least without substantial refactorings) doesn't seem to be the right API for this as it does not support batching.

ayende commented 2 years ago

Isn't that what RandomAccess is supposed to provide?

adamsitnik commented 2 years ago

> Is there any news on this topic?

We are not planning to add io_uring support in .NET 7. The main reason is that, in the most common scenarios, we would currently observe a perf regression. In io_uring, the producer and the consumer (the thread that adds work items to the ring and the thread that removes them) currently need to be the same thread. That does not work well with our current Thread Pool model.

> Isn't that what RandomAccess is supposed to provide?

@ayende is right, RandomAccess supports passing multiple buffers:

https://devblogs.microsoft.com/dotnet/file-io-improvements-in-dotnet-6/#scatter-gather-io
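
For readers unfamiliar with scatter/gather I/O, the idea is that one call fills several buffers from consecutive regions of a file. The sketch below shows the same technique in Python using `os.preadv`, which wraps the `preadv(2)` system call and is available on Linux. It only illustrates the concept and is not the .NET API; `scatter_read` is a hypothetical helper name.

```python
import os
from typing import List, Tuple


def scatter_read(path: str, sizes: List[int], offset: int = 0) -> Tuple[int, List[bytearray]]:
    """Fill one buffer per requested size with a single vectored read."""
    buffers = [bytearray(size) for size in sizes]
    fd = os.open(path, os.O_RDONLY)
    try:
        # One system call fills the buffers in order, starting at `offset`.
        total = os.preadv(fd, buffers, offset)
    finally:
        os.close(fd)
    return total, buffers
```

Note that a vectored read targets one file at one offset per call, so it reduces the number of system calls for a single logical read but does not by itself batch independent operations the way a submission ring can.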

AlexeiScherbakov commented 11 months ago

Maybe FileStream is a bad place for io_uring? Queue rings can be implemented at the software level with System.Threading.Tasks.Dataflow primitives, and I think io_uring's place in .NET should be in a separate, async-only primitive.