dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Reading compressed encrypted Stream 60x slower on Net6 than Net Framework #83909

Open RobSwDev opened 1 year ago

RobSwDev commented 1 year ago

Description

We're in the process of upgrading from Net Framework 4.7.2 to Net6. Unfortunately, we've come across one scenario where the performance in Net6 (or Net7) is terrible compared to Net Framework - in some cases taking 50-60 times as long.

I was able to hack a simple application together to reproduce something similar (Streams.zip attached). In this app, the slowdown seems to be ~180x!

> Using Net 4.0.30319.42000
> Writing 500000 lines took 00:00:00.8609345
> Reader 300000 lines took 00:00:00.1417714
> Using Net 6.0.15
> Writing 500000 lines took 00:00:00.4266431
> Reader 300000 lines took 00:00:25.5608638
> Using Net 7.0.4
> Writing 500000 lines took 00:00:00.4134467
> Reader 300000 lines took 00:00:23.6623887

Streams.zip

Reproduction Steps

Compile and run the attached project under Net472, Net6, and Net7.

Expected behavior

Performance on Net6 should be better than, or at least comparable to, that on Net472. That's what we have seen elsewhere.

Actual behavior

> Using Net 4.0.30319.42000
> Writing 500000 lines took 00:00:00.8609345
> Reader 300000 lines took 00:00:00.1417714
> Using Net 6.0.15
> Writing 500000 lines took 00:00:00.4266431
> Reader 300000 lines took 00:00:25.5608638
> Using Net 7.0.4
> Writing 500000 lines took 00:00:00.4134467
> Reader 300000 lines took 00:00:23.6623887

Regression?

Huge performance regression in Net6/7 compared to NetFramework 4.72

Known Workarounds

None known. I would be very interested to learn of any.

Configuration

Windows 10 22H2 (19045.2728)
Comparing different versions of Net: Net472 vs Net 6.0.15 vs Net 7.0.4, x64 configuration

Other information

No response

Wraith2 commented 1 year ago

I believe this is something that already has an open issue. If I'm right, the reason is that the native code used to do the decompression changed to handle large, fast reads, and that negatively impacted small reads. The workaround is to wrap the compression stream in a buffered stream, which will buffer using larger reads.
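A minimal sketch of that workaround (the helper name and buffer size are illustrative; the issue's repro uses GZipStream, but any DeflateStream-based stream behaves the same way):

```csharp
using System.IO;
using System.IO.Compression;

static class CompressedReaderHelper
{
    // Wrap the decompression stream in a BufferedStream so that the many
    // small reads issued by BinaryReader are served from a managed buffer,
    // and the underlying zlib code only ever sees large reads.
    public static BinaryReader OpenCompressedReader(Stream compressed)
    {
        var gzip = new GZipStream(compressed, CompressionMode.Decompress);
        var buffered = new BufferedStream(gzip, 0x1000); // 4 KB; tune as needed
        return new BinaryReader(buffered);
    }
}
```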

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-io-compression See info in area-owners.md if you want to be subscribed.

stephentoub commented 1 year ago

Yes, this is a duplicate of https://github.com/dotnet/runtime/issues/39233.

As @Wraith2 says, a tradeoff was made in the underlying zlib implementation that significantly favors larger reads, and BinaryReader is doing very small reads. I took your repro and turned it into a benchmarkdotnet repro. When I run it locally, I get this:

| Method | Runtime | Mean | Error | StdDev | Median | Ratio | RatioSD |
|---|---|---|---|---|---|---|---|
| ReadUnbuffered | .NET 7.0 | 1,457.49 ms | 33.427 ms | 94.828 ms | 1,485.36 ms | 6.90 | 0.71 |
| ReadUnbuffered | .NET Framework 4.8 | 183.03 ms | 2.367 ms | 2.098 ms | 182.96 ms | 1.00 | 0.00 |
| ReadBuffered | .NET 7.0 | 34.20 ms | 0.668 ms | 0.769 ms | 33.96 ms | 0.51 | 0.01 |
| ReadBuffered | .NET Framework 4.8 | 67.11 ms | 0.338 ms | 0.299 ms | 67.02 ms | 1.00 | 0.00 |

Meaning, I don't see a 60x slowdown, but I do see an ~7x slowdown when tiny reads are performed, which is similar to the previously cited issue. But then when reasonably-sized reads are performed, in this repro by inserting a BufferedStream between the BinaryReader and the DeflateStream, the .NET 7 version ends up being twice as fast as the .NET Framework 4.8 version. Net net, there's a significant performance benefit when the type is used as intended, even though there's unfortunately a slowdown when it's used in a manner not intended. The workaround as cited is to ensure bigger reads are performed. Note that both .NET Core and .NET Framework do a lot better when larger reads are performed, so it's a good practice regardless.

@jtkukunas, developers keep running into this, and I suspect it's likely only the tip of the iceberg, with more folks hitting it and just not realizing they're being significantly penalized. Is there anything we can do about it in zlib to tweak that tradeoff? It's a really big penalty to pay, even if we'd prefer / encourage folks to perform larger reads.

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

public partial class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    private const int num = 500000;
    private const int numRead = 300000;
    private MemoryStream _ms;
    private Encrypter _encrypter = new Encrypter();

    [GlobalSetup]
    public void Setup()
    {
        _ms = new MemoryStream();
        using (var writeStream = _encrypter.CreateWriter(_ms))
        using (var writer = new BinaryWriter(writeStream))
        {
            for (int i = 0; i < num; i++)
            {
                writer.Write(true);
                writer.Write(i);
                writer.Write($"abcdedcba{i}");
                writer.Write(i);
            }
        }
    }

    [Benchmark]
    public void ReadUnbuffered()
    {
        _ms.Position = 0;
        using (var readStream = _encrypter.CreateReader(_ms))
        using (var reader = new BinaryReader(readStream))
        {
            for (int i = 0; i < numRead; i++)
            {
                Read(reader, i);
            }
        }
    }

    [Benchmark]
    public void ReadBuffered()
    {
        _ms.Position = 0;
        using (var readStream = _encrypter.CreateReader(_ms))
        using (var bufferedStream = new BufferedStream(readStream, 0x1000))
        using (var reader = new BinaryReader(bufferedStream))
        {
            for (int i = 0; i < numRead; i++)
            {
                Read(reader, i);
            }
        }
    }

    static void Read(BinaryReader reader, int expectedValue)
    {
        var b = reader.ReadBoolean();
        b &= reader.ReadInt32() == expectedValue;
        reader.ReadString();
        b &= reader.ReadInt32() == expectedValue;
        if (!b)
            throw new InvalidOperationException();
    }

    public class Encrypter
    {
        private readonly Rijndael alg;

        public Encrypter()
        {
            using (PasswordDeriveBytes passwordDeriveBytes = new PasswordDeriveBytes("To be or not to be", null))
            {
                alg = Rijndael.Create();
                alg.Key = passwordDeriveBytes.GetBytes(32);
                alg.IV = passwordDeriveBytes.GetBytes(16);
            }
        }

        public Stream CreateReader(Stream stream) => CreateStream(stream, true);
        public Stream CreateWriter(Stream stream) => CreateStream(stream, false);

        private Stream CreateStream(Stream stream, bool isRead)
        {
            ICryptoTransform transform = isRead ? this.alg.CreateDecryptor() : this.alg.CreateEncryptor();
            var cryptoStream = new CryptoStream(stream, transform, isRead ? CryptoStreamMode.Read : CryptoStreamMode.Write, leaveOpen: true);
            return new GZipStream(cryptoStream, isRead ? CompressionMode.Decompress : CompressionMode.Compress, leaveOpen: true);
        }
    }
}
RobSwDev commented 1 year ago

OK - inserting the BufferedStream fixes the issue perfectly, and Net6/7 performance is then significantly better than Net472. Even Net472 performance is better than before, so this is a change we should already have made. Happy to close the issue.

I'd question the 7x slowdown though: I ran your Benchmark and got the following:

Net472:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| ReadUnbuffered | 99.22 ms | 0.753 ms | 0.704 ms |
| ReadBuffered | 81.93 ms | 1.350 ms | 1.197 ms |

Net7:

| Method | Mean | Error | StdDev |
|---|---|---|---|
| ReadUnbuffered | 18,706.96 ms | 160.202 ms | 142.015 ms |
| ReadBuffered | 26.92 ms | 0.298 ms | 0.264 ms |

Giving a 180x slowdown for unbuffered in Net7 vs Net472. Not sure if it's some quirk of my machine:

BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.2728/22H2/2022Update)
11th Gen Intel Core i7-11850H 2.50GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.202
  [Host]     : .NET 7.0.4 (7.0.423.11508), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.4 (7.0.423.11508), X64 RyuJIT AVX2

danmoseley commented 1 year ago

Do we have a sufficiently discoverable breaking-change doc for this, so people could self-diagnose?

jtkukunas commented 1 year ago

@stephentoub I'm evaluating some ideas that might balance things out a bit. I guess we'll see.

In general though, I expect that the folks who actually care about performance aren't going to be doing these small operations anyway.

stephentoub commented 1 year ago

@stephentoub I'm evaluating some ideas that might balance things

Thank you.

I expect that the folks who actually care about performance aren't going to be doing these small operations anyway.

Once they're aware of the issue, maybe. The problem is we've now seen a non-trivial number of real uses from devs reasonably interested in perf, where what they had before was sufficient but this change leads to an order of magnitude slowdown and it's no longer sufficient. I'm concerned about them, and I'm concerned about the likely many, many more folks that don't notice but that end up with much higher costs even so.

talanc commented 1 year ago

I think I ran into the same underlying issue using ZipArchiveEntry.Open and then BinaryReader.

My code in production was taking an abnormally long time using .NET 6. I wrote a quick program to run and observed that .NET 6 took 18 seconds and .NET Framework took 0.5 seconds.

I've created a benchmark that demonstrates the issue: https://raw.githubusercontent.com/talanc/ZipBenchmarks/master/ZipBenchmarks.ZipArchiveEntryBenchmarks-20230624-204220.txt The results here show that .NET 7/8 is 14x slower than .NET Framework.

As discovered above, wrapping the ZipArchiveEntry stream in a BufferedStream fixes the performance regression, and even gives .NET Framework slightly faster results too.

I hope the underlying issue can be fixed. For now BufferedStream provides an easy workaround.
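For the ZipArchiveEntry case, the same workaround applies; a sketch (the method name is illustrative):

```csharp
using System.IO;
using System.IO.Compression;

static class ZipEntryReaderHelper
{
    // Wrap the entry stream (which decompresses via DeflateStream under the
    // hood) in a BufferedStream before handing it to BinaryReader, so the
    // native decompression layer only ever sees large reads.
    public static BinaryReader OpenEntryReader(ZipArchiveEntry entry)
    {
        Stream entryStream = entry.Open();
        var buffered = new BufferedStream(entryStream, 0x1000);
        return new BinaryReader(buffered);
    }
}
```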

stephentoub commented 1 year ago

I'm evaluating some ideas that might balance things out a bit. I guess we'll see.

@jtkukunas, how did this go?

jtkukunas commented 12 months ago

@stephentoub Not particularly well. At the end of the day, it's just the wrong tradeoff.

Is there any way you can internally buffer things in the intermediate layers?

Wraith2 commented 12 months ago

There is by using a BufferedStream but the users have to be aware that they need to make that change to avoid a performance regression.

Is it possible to change the strategy for keeping the buffer full? At the moment, even if a single byte is read, the existing buffer is shifted back and a new byte read in to keep it full. Could it be changed to refill the buffer only at a small number of coarse thresholds, perhaps a quarter? That should keep performance for large reads and avoid constant buffer copies for sequences of smaller reads.
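To illustrate the idea (this is not the actual zlib window management, just a hypothetical sketch of a coarse-threshold refill policy):

```csharp
using System;
using System.IO;

// Illustration only: compact and refill the internal buffer only when it
// drops below a coarse threshold (a quarter here), instead of shifting the
// remaining bytes and topping it up after every read.
class ThresholdBuffer
{
    private readonly Stream _source;
    private readonly byte[] _buffer;
    private int _start, _count;

    public ThresholdBuffer(Stream source, int size = 0x8000)
    {
        _source = source;
        _buffer = new byte[size];
    }

    public int Read(Span<byte> destination)
    {
        // Sequences of small reads skip this branch most of the time, so the
        // constant buffer copies of a keep-it-always-full policy are avoided.
        if (_count < _buffer.Length / 4)
        {
            Buffer.BlockCopy(_buffer, _start, _buffer, 0, _count);
            _start = 0;
            _count += _source.Read(_buffer, _count, _buffer.Length - _count);
        }

        int n = Math.Min(destination.Length, _count);
        _buffer.AsSpan(_start, n).CopyTo(destination);
        _start += n;
        _count -= n;
        return n;
    }
}
```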

jtkukunas commented 11 months ago

Let me rephrase what I'm trying to say. You don't want to use an unbuffered stream here. You want a buffered stream, which will give you better performance.

If someone runs into this problem, it's highlighting an inefficiency in their code. Once they fix it, they'll get even better performance than before.

I'm sure you guys have channels for developer education.

stephentoub commented 11 months ago

If someone runs into this problem, it's highlighting an inefficiency in their code. Once they fix it, they'll get even better performance than before.

The problem is there are millions of developers out there with billions of lines of code. They upgrade to .NET 6, and there's now an inefficiency that gets introduced into their application. Yes, they should be doing it differently, but the only folks that will notice are the ones where it's so egregious that they seek out an answer. My concern is that's the tip of the iceberg, and there's a much larger number of devs / apps that have or will just incur a meaningful performance regression silently, and not realize there was an existing opportunity to do better that is now costing them a whole lot more than it did before.