dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Unexpected and undocumented (performance) pitfall in BufferedStream.WriteByte #104559

Open ANahr opened 2 months ago

ANahr commented 2 months ago

Description

When calling WriteByte on a BufferedStream, an implicit Flush is performed on the wrapped stream if the internal buffer is full. This differs from all other Write methods on BufferedStream and is not even mentioned in the documentation. It can be problematic if the wrapped stream (or streams) are susceptible to Flushes (see the repro steps for an example). The practical situation is even worse because for many writer abstractions (e.g. BinaryWriter) it is mostly implementation-specific whether Write or WriteByte is used. So in effect it is mostly "random" whether Flushes happen when writing to an underlying BufferedStream.

Reproduction Steps

Please consider the following code:

var latencyStream = new SomeStreamWhereCallingWithSmallBuffersOrFlushesIsExtremelyExpensive();
var specialLatencyBufferStream  = new BufferedStream(latencyStream, 1024 * 1024 * 100);
var compressionStream = new DeflateStream(specialLatencyBufferStream, CompressionMode.Compress, true);
var bufferedStream = new BufferedStream(compressionStream, 1024 * 8);

If you now serialize with a BinaryWriter on bufferedStream, it will effectively bypass the specialLatencyBufferStream "randomly" (depending on whether WriteByte or one of the other Write methods is used when the 8 KB bufferedStream fills up).
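The flush behavior can be observed directly with a small counting wrapper. This is a minimal sketch written for this illustration (FlushCountingStream is hypothetical and not part of the repro above); it assumes, as the issue describes, that BufferedStream.WriteByte flushes the wrapped stream whenever its internal buffer fills:

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration: counts how often Flush reaches the wrapped stream.
class FlushCountingStream : Stream
{
    private readonly Stream _inner;
    public int FlushCount { get; private set; }

    public FlushCountingStream(Stream inner) => _inner = inner;

    public override void Flush() { FlushCount++; _inner.Flush(); }

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => _inner.CanWrite;
    public override long Length => _inner.Length;
    public override long Position { get => _inner.Position; set => _inner.Position = value; }
    public override int Read(byte[] buffer, int offset, int count) => _inner.Read(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void SetLength(long value) => _inner.SetLength(value);
    public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);
}

class Demo
{
    static void Main()
    {
        var counter = new FlushCountingStream(new MemoryStream());
        var buffered = new BufferedStream(counter, 16);

        // Writing 64 bytes one byte at a time crosses the 16-byte buffer
        // boundary several times; each crossing flushes the wrapped stream.
        for (int i = 0; i < 64; i++)
            buffered.WriteByte(0);

        // Expected per the issue: non-zero with WriteByte, zero with Write.
        Console.WriteLine(counter.FlushCount);
    }
}
```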

Expected behavior

Write and WriteByte behave the same and don't Flush

Actual behavior

Write methods don't Flush, WriteByte does

Regression?

no

Known Workarounds

Not calling WriteByte
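Another conceivable workaround (a sketch only, not taken from the issue) is a thin wrapper that routes WriteByte through the array-based Write, so the flushing WriteByte slow path of an inner BufferedStream is never reached:

```csharp
using System.IO;

// Hypothetical workaround wrapper: forwards WriteByte as a one-byte Write,
// so an inner BufferedStream's implicit Flush on WriteByte is avoided.
class NoFlushWriteByteStream : Stream
{
    private readonly Stream _inner;
    private readonly byte[] _single = new byte[1];

    public NoFlushWriteByteStream(Stream inner) => _inner = inner;

    public override void WriteByte(byte value)
    {
        _single[0] = value;
        _inner.Write(_single, 0, 1); // takes the non-flushing Write path
    }

    public override void Write(byte[] buffer, int offset, int count) => _inner.Write(buffer, offset, count);
    public override void Flush() => _inner.Flush();

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => _inner.CanSeek;
    public override bool CanWrite => _inner.CanWrite;
    public override long Length => _inner.Length;
    public override long Position { get => _inner.Position; set => _inner.Position = value; }
    public override int Read(byte[] buffer, int offset, int count) => _inner.Read(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => _inner.Seek(offset, origin);
    public override void SetLength(long value) => _inner.SetLength(value);
}
```

Note the extra array copy per byte; this trades a small constant cost for avoiding the unexpected Flushes.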

Risks of change

Applications might depend on the current behavior. However, given that it is undocumented and mostly "random" in effect, this is very unlikely.

Other information

The reason for the behavior is in BufferedStream.cs:

        private void WriteByteSlow(byte value)
        {
            EnsureNotClosed();

            if (_writePos == 0)
            {
                EnsureCanWrite();
                ClearReadBufferBeforeWrite();
                EnsureBufferAllocated();
            }

            // We should not be flushing here, but only writing to the underlying stream, but previous version flushed, so we keep this.
            if (_writePos >= _bufferSize - 1)
                FlushWrite();

            _buffer![_writePos++] = value;

            Debug.Assert(_writePos < _bufferSize);
        }
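Based on the comment in that snippet, one possible shape of a fix (a sketch only; the actual PR may differ) would be to drain the full buffer to the underlying stream without propagating Flush:

```csharp
// Sketch of a possible fix: instead of FlushWrite() (which also calls
// _stream.Flush()), write the buffered bytes to the underlying stream
// and reset the write position, leaving the wrapped stream unflushed.
if (_writePos >= _bufferSize - 1)
{
    _stream.Write(_buffer!, 0, _writePos);
    _writePos = 0;
}
```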
dotnet-policy-service[bot] commented 2 months ago

Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.

ANahr commented 2 months ago

I added a PR for the change/fix

adamsitnik commented 1 month ago

Hi @ANahr

As the comment says, we know that it's not the best design choice, but we keep doing that because of backward compatibility (cc @stephentoub who is our architect and may have a different opinion on that):

.NET Framework:

https://github.com/microsoft/referencesource/blob/51cf7850defa8a17d815b4700b67116e3fa283c2/mscorlib/system/io/bufferedstream.cs#L1263

.NET:

https://github.com/dotnet/runtime/blame/e1ab2243b27da4aa8a76372e4384bcd2b5894b3b/src/libraries/System.Private.CoreLib/src/System/IO/BufferedStream.cs#L1169

SomeStreamWhereCallingWithSmallBuffersOrFlushesIsExtremelyExpensive

Could you please provide any real-life examples of Streams for which Flush is very expensive?

ANahr commented 1 month ago

Hi @adamsitnik

Thanks for your comment.

To add some context: we were hit really hard by this in our real-world application (a huge web application). The effect for us was that writing a (several-GB) binary file that usually takes a few seconds suddenly took more than 6 hours. The reason was simply that some bytes had moved around, and WriteByte calls happened by chance to land on buffer boundaries, destroying other buffers through their flushes.

Unfortunately the code there is quite complex and the writing logic alone is thousands of lines of code, but it all boiled down to the (wrong) Flushes of a BufferedStream.

I'll try a simple example here using DeflateStream (much more harmless and obviously not real-world, but it still shows the issue):

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.IO.Compression;

var summary = BenchmarkRunner.Run<Benchmark>();

public class Benchmark
{
    Stream _bufferedStream = null!;
    byte[] _fixedData = new byte[16];

    [Params(16, 1024*1024)]
    public int BufferSize { get; set; }

    [IterationSetup]
    public void IterationSetup()
    {
        var memory = new MemoryStream();
        var compressionStream = new DeflateStream(memory, CompressionLevel.SmallestSize, true);
        var secondaryBuffer = new BufferedStream(compressionStream, 1024 * 1024);
        _bufferedStream = new BufferedStream(secondaryBuffer, BufferSize);
    }

    [Benchmark]
    public void WriteByte()
    {
        for (int i = 0; i < 1024 * 1024; i++)
        {
            _bufferedStream.WriteByte(0);
        }
    }

    [Benchmark]
    public void WriteArray()
    {
        for (int i = 0; i < 1024 * 1024 / 16; i++)
        {
            _bufferedStream.Write(_fixedData, 0, _fixedData.Length);
        }
    }
}
Method      BufferSize  Mean       Error      StdDev
WriteByte           16  68.575 ms  0.8415 ms  0.7460 ms
WriteArray          16   4.699 ms  0.0927 ms  0.1598 ms
WriteByte      1048576   5.190 ms  0.1032 ms  0.1305 ms
WriteArray     1048576   4.229 ms  0.0829 ms  0.1315 ms

So WriteByte and WriteArray are comparably fast until the flushing effect kicks in and in effect "destroys" the secondary buffer (the 68.575 ms figure). Considering that a BufferedStream exists exactly for buffering, it should not harm buffering.

ANahr commented 1 month ago

Please also note that this creates performance problems that are very hard to find, because in many cases you will have variable-sized binary data and the effect will occur seemingly at "random".

stephentoub commented 1 month ago

Can you help me understand the comment "usually takes some seconds suddenly took more than 6 hours" when compared to the BufferSize==16 numbers above? BufferSize==16 is incredibly small for a BufferedStream, and for that the overhead in your numbers is ~15x. But something going from seconds to 6 hours would be overhead anywhere from ~360x to ~21,000x. What is it that your code is doing that this is having such an outsized impact?

As the comment says, we know that it's not the best design choice, but we keep doing that because of backward compatibility (cc @stephentoub who is our architect and may have a different opinion on that)

If we have real scenarios that are significantly impacted, I think taking a change here would be ok, albeit documenting it as a breaking change.

ANahr commented 1 month ago

Our concrete case looks much like the one originally mentioned: an 8 KB BufferedStream, then a CompressedStream, then a 100 MB BufferedStream, and then our implementation. If the cost of the flush itself completely dominated, that would result in a ~12500x slowdown. We probably have a mixture of both effects.

More generally, I think you will find lots of cases where this matters (e.g. any network stream where flushing MUST trigger a physical data send).

Unfortunately I cannot provide our implementation (for both size and legal reasons), but if still needed I can see if I can find a GitHub library that also shows more "extreme" effects than just the DeflateStream sample above.

ANahr commented 1 month ago

@stephentoub Here is code that uses a NuGet package (https://www.nuget.org/packages/OpenMcdf) and shows a 130x slowdown when using an 8 KB primary buffer. And the comparison code uses a naive variant that writes 1-byte arrays!

using OpenMcdf;
using OpenMcdf.Extensions;
using System.Diagnostics;
using System.IO.Compression;

var file = new CompoundFile();
var contentStream = file.RootStorage.AddStream("Test").AsIOStream();
var contentStreamBuffer = new BufferedStream(contentStream, 1024 * 1024 * 100);
var compressionStream = new DeflateStream(contentStreamBuffer, CompressionLevel.NoCompression, true);
var bufferedStream = new BufferedStream(compressionStream, 1024 * 8);

var sw = Stopwatch.StartNew();
for (int i = 0; i < 1024 * 1024 * 100; i++)
{
    //bufferedStream.WriteByte((byte)i);      // Uncommenting this line: takes 00:03:15.2714331
    //bufferedStream.Write([(byte)i], 0, 1);  // Uncommenting this line: takes 00:00:01.5074793
}
Console.WriteLine(sw.Elapsed);
stephentoub commented 1 month ago

Thanks.

@adamsitnik, I'd be ok with the change, we just need to call it out as breaking.