dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Increase the default value of PipeOptions.MinimumSegmentSize #43480

Open halter73 opened 3 years ago

halter73 commented 3 years ago

Description

Copying data is a lot faster with larger buffers, and copying data is a major use case for System.IO.Pipelines. For example, in ASP.NET Core, we plan to use PipeWriter to write files to response bodies (https://github.com/dotnet/aspnetcore/pull/24851).

@brporter got us looking at the performance of using Pipes to copy files, and this once again demonstrated how crucial large buffers are. This is why System.IO.Stream's DefaultCopyBufferSize is 81920 bytes.

https://github.com/dotnet/runtime/blob/c788fe1786ffc9295f1c469e506e26ff2ff4f94c/src/libraries/System.Private.CoreLib/src/System/IO/Stream.cs#L30-L33

PipeOptions.DefaultMinimumSegmentSize is only 4096 bytes, which leads to terrible performance when calling the default implementations of WriteAsync or CopyToAsync, or anything else that calls PipeWriter.GetMemory() or PipeWriter.GetSpan() without a sizeHint.

https://github.com/dotnet/runtime/blob/c788fe1786ffc9295f1c469e506e26ff2ff4f94c/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/PipeOptions.cs#L14

In my testing, copying a 2GB file went from taking 8607ms to 1771ms by increasing PipeOptions.MinimumSegmentSize from 4096 bytes to 655350 bytes. Even with the in-between Stream.DefaultCopyBufferSize value of 81920 bytes the copy time dropped to 2458ms.
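For context, here is a minimal sketch of the kind of Pipe-based copy this measures, with the segment size made configurable. It is not the benchmark app itself; the helper name, payload, and the 65536 figure are illustrative, and a console project would need the System.IO.Pipelines NuGet package. The point is that Stream.CopyToAsync(PipeWriter) fills buffers obtained from PipeWriter.GetMemory() with no sizeHint, so minimumSegmentSize directly controls the per-read buffer size.

```csharp
using System;
using System.IO;
using System.IO.Pipelines;
using System.Threading.Tasks;

// Copy source -> destination through a Pipe whose minimum segment size is
// configurable. CopyToAsync calls PipeWriter.GetMemory() with no sizeHint,
// so the buffers it fills are sized by minimumSegmentSize.
static async Task CopyViaPipeAsync(Stream source, Stream destination, int segmentSize)
{
    var pipe = new Pipe(new PipeOptions(minimumSegmentSize: segmentSize));

    async Task FillAsync()
    {
        await source.CopyToAsync(pipe.Writer);
        await pipe.Writer.CompleteAsync();
    }

    // Drain concurrently so the writer's FlushAsync backpressure can release.
    await Task.WhenAll(FillAsync(), pipe.Reader.CopyToAsync(destination));
    await pipe.Reader.CompleteAsync();
}

// Quick in-memory demonstration.
var payload = new byte[1_000_000];
new Random(42).NextBytes(payload);
var destination = new MemoryStream();
await CopyViaPipeAsync(new MemoryStream(payload), destination, 65536);
Console.WriteLine(destination.Length); // 1000000
```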


Configuration

You can find the benchmark app at https://github.com/halter73/PipeTest/blob/master/Program.cs.

F:\dev\halter73\PipeTest [master +3 ~1 -0 !]> dotnet --info
.NET SDK (reflecting any global.json):
 Version:   6.0.100-alpha.1.20472.11
 Commit:    e55929c5a5

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.20231
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   F:\dev\aspnet\AspNetCore\.dotnet\sdk\6.0.100-alpha.1.20472.11\

Host (useful for support):
  Version: 6.0.0-alpha.1.20507.4
  Commit:  4fef87c65e

.NET SDKs installed:
  6.0.100-alpha.1.20472.11 [F:\dev\aspnet\AspNetCore\.dotnet\sdk]

.NET runtimes installed:
  Microsoft.NETCore.App 6.0.0-alpha.1.20468.7 [F:\dev\aspnet\AspNetCore\.dotnet\shared\Microsoft.NETCore.App]

Regression?

No.

Data

Before (4096)

F:\dev\halter73\PipeTest [slice +3 ~1 -0 !]> dotnet run .\test.in .\test.out PipelinesSE
Copying .\test.in to .\test.out with method PipelinesSE...done!

GetTotalAllocatedBytes(true): [117,722,376] bytes
GetTotalMemory(false): [3,353,520] bytes
Executed for 8607ms.

After (655350)

F:\dev\halter73\PipeTest [master +3 ~1 -0 !]> dotnet run .\test.in .\test.out PipelinesSE
Copying .\test.in to .\test.out with method PipelinesSE...done!

GetTotalAllocatedBytes(true): [1,672,544] bytes
GetTotalMemory(false): [1,718,512] bytes
Executed for 1771ms.
BrennanConroy commented 3 years ago

Triage: Alternatively, we can add an API to pass a memory hint to WriteAsync and CopyToAsync

halter73 commented 3 years ago

We could do both.

davidfowl commented 3 years ago

I don't really understand what we'd change the default to based on this, since it's scenario specific. I do think we could change the defaults for CopyToAsync and have a way to pass in a sizeHint for CopyToAsync and WriteAsync.
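Until such an overload exists, a caller can already get the effect by driving the writer by hand and passing an explicit sizeHint to PipeWriter.GetMemory(). A hedged sketch (the helper name and the 81920 default, borrowed from Stream.DefaultCopyBufferSize, are illustrative):

```csharp
using System;
using System.IO;
using System.IO.Pipelines;
using System.Threading.Tasks;

// Hand-rolled copy loop: request at least sizeHint bytes per segment instead
// of accepting the small default that a hint-less GetMemory() produces.
static async Task CopyWithSizeHintAsync(Stream source, PipeWriter writer, int sizeHint = 81920)
{
    while (true)
    {
        Memory<byte> buffer = writer.GetMemory(sizeHint);
        int read = await source.ReadAsync(buffer);
        if (read == 0) break;

        writer.Advance(read);
        FlushResult result = await writer.FlushAsync();
        if (result.IsCompleted) break; // reader is done; stop writing
    }
    await writer.CompleteAsync();
}

// Demonstration: pump 300,000 bytes through a Pipe into a MemoryStream.
var pipe = new Pipe();
var payload = new byte[300_000];
new Random(1).NextBytes(payload);
var sink = new MemoryStream();
await Task.WhenAll(
    CopyWithSizeHintAsync(new MemoryStream(payload), pipe.Writer),
    pipe.Reader.CopyToAsync(sink));
Console.WriteLine(sink.Length); // 300000
```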

benaadams commented 3 years ago

> In my testing, copying a 2GB file went from taking 8607ms to 1771ms by increasing PipeOptions.MinimumSegmentSize from 4096 bytes to 655350 bytes. Even with the in-between Stream.DefaultCopyBufferSize value of 81920 bytes the copy time dropped to 2458ms.

Is that doing multiple smaller async reads from a stream rather than repeated smaller copies?

e.g. if you read into a large chunk (65k), then copy that into the small 4k blocks, what's the time difference?

halter73 commented 3 years ago

> Is that doing multiple smaller async reads from a stream rather than repeated smaller copies?
>
> e.g. if you read into a large chunk (65k), then copy that into the small 4k blocks, what's the time difference?

When experimenting with file copying on Windows, I tested that by repeatedly calling GetMemory()/Advance() without flushing every time in my flush-less branch. That only yielded an ~8% improvement.

Then I created a cheat branch that went even further and Advanced prior to receiving data so it could do concurrent reads into smaller buffers. This yielded a ~48% improvement, which was equivalent to using a larger buffer (where larger buffer size = concurrent reads * smaller buffer size). But since you can only really read into one buffer at a time with a PipeWriter, that's not an option unless we add new PipeWriter APIs.

You can see all the branches I experimented with at https://github.com/halter73/PipeTest/branches. I've included my local perf measurements in my commit messages.
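The "flush-less" experiment described above can be sketched roughly as follows: keep the default small segments, but only await FlushAsync every N segments instead of after each one. This is a reconstruction under stated assumptions, not the actual branch; the helper name and the flushEvery parameter are illustrative.

```csharp
using System;
using System.IO;
using System.IO.Pipelines;
using System.Threading.Tasks;

// Batch several GetMemory()/Advance() calls per FlushAsync to isolate how
// much of the cost comes from flushing versus from the small segment size.
static async Task CopyBatchingFlushesAsync(Stream source, PipeWriter writer, int flushEvery = 16)
{
    int sinceFlush = 0;
    while (true)
    {
        Memory<byte> buffer = writer.GetMemory(); // default-sized segment, no hint
        int read = await source.ReadAsync(buffer);
        if (read == 0) break;
        writer.Advance(read);

        if (++sinceFlush == flushEvery)
        {
            sinceFlush = 0;
            FlushResult result = await writer.FlushAsync();
            if (result.IsCompleted) break;
        }
    }
    await writer.FlushAsync(); // push any unflushed tail to the reader
    await writer.CompleteAsync();
}

// Demonstration with an in-memory payload and a concurrent reader.
var pipe = new Pipe();
var payload = new byte[200_000];
new Random(7).NextBytes(payload);
var sink = new MemoryStream();
await Task.WhenAll(
    CopyBatchingFlushesAsync(new MemoryStream(payload), pipe.Writer),
    pipe.Reader.CopyToAsync(sink));
Console.WriteLine(sink.Length); // 200000
```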

benaadams commented 3 years ago

Ah, DefaultMinimumSegmentSize is independent of the Memory pool block size; I was thinking it was increasing that.

Though it would cause allocations from the ArrayPool rather than the Memory pool by default for Kestrel (if the size is not specified)?

halter73 commented 3 years ago

> Though it would cause allocations from the ArrayPool rather than the Memory pool by default for Kestrel (if the size is not specified)?

If we do this, Kestrel likely wouldn't use the DefaultMinimumSegmentSize in order to avoid this.

Chacoon3 commented 3 months ago

Thank you guys for discussing this issue.