Open RobSwDev opened 1 year ago
I believe there is already an open issue for this. If I'm right, the reason is that the native code used to do the decompression changed to favor large reads, which negatively impacted small reads. The workaround is to wrap the compression stream in a BufferedStream, which will buffer using larger reads.
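As a minimal sketch of that workaround (the helper name, the GZipStream setup, and the 4 KB buffer size are illustrative choices, not anything prescribed by the issue):

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class Workaround
{
    // Wrap the decompression stream in a BufferedStream so the many tiny
    // reads issued by BinaryReader are served from a managed buffer, and
    // the native decompressor only ever sees larger (here, 4 KB) reads.
    public static BinaryReader OpenReader(Stream compressed)
    {
        var gzip = new GZipStream(compressed, CompressionMode.Decompress);
        var buffered = new BufferedStream(gzip, 0x1000); // 4 KB buffer
        return new BinaryReader(buffered);
    }
}
```

Disposing the returned BinaryReader disposes the wrapped streams as well, so callers only need one `using`.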
Tagging subscribers to this area: @dotnet/area-system-io-compression. See info in area-owners.md if you want to be subscribed.
Author: RobSwDev
Assignees: -
Labels: `area-System.IO.Compression`, `untriaged`, `needs-area-label`
Milestone: -
Yes, this is a duplicate of https://github.com/dotnet/runtime/issues/39233.
As @Wraith2 says, a tradeoff was made in the underlying zlib implementation that significantly favors larger reads, and BinaryReader is doing very small reads. I took your repro and turned it into a BenchmarkDotNet repro. When I run it locally, I get this:
Method | Runtime | Mean | Error | StdDev | Median | Ratio | RatioSD |
---|---|---|---|---|---|---|---|
ReadUnbuffered | .NET 7.0 | 1,457.49 ms | 33.427 ms | 94.828 ms | 1,485.36 ms | 6.90 | 0.71 |
ReadUnbuffered | .NET Framework 4.8 | 183.03 ms | 2.367 ms | 2.098 ms | 182.96 ms | 1.00 | 0.00 |
ReadBuffered | .NET 7.0 | 34.20 ms | 0.668 ms | 0.769 ms | 33.96 ms | 0.51 | 0.01 |
ReadBuffered | .NET Framework 4.8 | 67.11 ms | 0.338 ms | 0.299 ms | 67.02 ms | 1.00 | 0.00 |
Meaning, I don't see a 60x slowdown, but I do see an ~7x slowdown when tiny reads are performed, which is similar to the previously cited issue. But when reasonably-sized reads are performed, in this repro by inserting a BufferedStream between the BinaryReader and the DeflateStream, the .NET 7 version ends up being twice as fast as the .NET Framework 4.8 version. Net net, there's a significant performance benefit when the type is used as intended, even though there's unfortunately a slowdown when it's used in a manner not intended. The workaround, as cited, is to ensure bigger reads are performed. Note that both .NET Core and .NET Framework do a lot better when larger reads are performed, so it's good practice regardless.
@jtkukunas, developers keep running into this, and I suspect it's likely only the tip of the iceberg, with more folks hitting it and just not realizing they're being significantly penalized. Is there anything we can do about it in zlib to tweak that tradeoff? It's a really big penalty to pay, even if we'd prefer / encourage folks to perform larger reads.
```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

public partial class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

    private const int num = 500000;
    private const int numRead = 300000;

    private MemoryStream _ms;
    private Encrypter _encrypter = new Encrypter();

    [GlobalSetup]
    public void Setup()
    {
        _ms = new MemoryStream();
        using (var writeStream = _encrypter.CreateWriter(_ms))
        using (var writer = new BinaryWriter(writeStream))
        {
            for (int i = 0; i < num; i++)
            {
                writer.Write(true);
                writer.Write(i);
                writer.Write($"abcdedcba{i}");
                writer.Write(i);
            }
        }
    }

    [Benchmark]
    public void ReadUnbuffered()
    {
        _ms.Position = 0;
        using (var readStream = _encrypter.CreateReader(_ms))
        using (var reader = new BinaryReader(readStream))
        {
            for (int i = 0; i < numRead; i++)
            {
                Read(reader, i);
            }
        }
    }

    [Benchmark]
    public void ReadBuffered()
    {
        _ms.Position = 0;
        using (var readStream = _encrypter.CreateReader(_ms))
        using (var bufferedStream = new BufferedStream(readStream, 0x1000))
        using (var reader = new BinaryReader(bufferedStream))
        {
            for (int i = 0; i < numRead; i++)
            {
                Read(reader, i);
            }
        }
    }

    static void Read(BinaryReader reader, int expectedValue)
    {
        var b = reader.ReadBoolean();
        b &= reader.ReadInt32() == expectedValue;
        reader.ReadString();
        b &= reader.ReadInt32() == expectedValue;
        if (!b)
            throw new InvalidOperationException();
    }

    public class Encrypter
    {
        private readonly Rijndael alg;

        public Encrypter()
        {
            using (PasswordDeriveBytes passwordDeriveBytes = new PasswordDeriveBytes("To be or not to be", null))
            {
                alg = Rijndael.Create();
                alg.Key = passwordDeriveBytes.GetBytes(32);
                alg.IV = passwordDeriveBytes.GetBytes(16);
            }
        }

        public Stream CreateReader(Stream stream) => CreateStream(stream, true);
        public Stream CreateWriter(Stream stream) => CreateStream(stream, false);

        private Stream CreateStream(Stream stream, bool isRead)
        {
            ICryptoTransform transform = isRead ? this.alg.CreateDecryptor() : this.alg.CreateEncryptor();
            var cryptoStream = new CryptoStream(stream, transform, isRead ? CryptoStreamMode.Read : CryptoStreamMode.Write, leaveOpen: true);
            return new GZipStream(cryptoStream, isRead ? CompressionMode.Decompress : CompressionMode.Compress, leaveOpen: true);
        }
    }
}
```
OK, inserting the BufferedStream fixes the issue perfectly, and performance is then significantly better than Net6. Even Net472 performance is better than before, so this is a change we should already have made. Happy to close the issue.
I'd question the 7x slowdown though: I ran your Benchmark and got the following:
Net472:

Method | Mean | Error | StdDev |
---|---|---|---|
ReadUnbuffered | 99.22 ms | 0.753 ms | 0.704 ms |
ReadBuffered | 81.93 ms | 1.350 ms | 1.197 ms |

Net7:

Method | Mean | Error | StdDev |
---|---|---|---|
ReadUnbuffered | 18,706.96 ms | 160.202 ms | 142.015 ms |
ReadBuffered | 26.92 ms | 0.298 ms | 0.264 ms |
Giving a 180x slowdown for unbuffered in Net7 vs Net472. Not sure if it's some quirk of my machine:
BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.2728/22H2/2022Update)
11th Gen Intel Core i7-11850H 2.50GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=7.0.202
[Host] : .NET 7.0.4 (7.0.423.11508), X64 RyuJIT AVX2
DefaultJob : .NET 7.0.4 (7.0.423.11508), X64 RyuJIT AVX2
Do we have a sufficiently discoverable breaking-change doc for this, so that people could self-diagnose?
@stephentoub I'm evaluating some ideas that might balance things out a bit. I guess we'll see.
In general though, I expect that the folks who actually care about performance aren't going to be doing these small operations anyway.
> @stephentoub I'm evaluating some ideas that might balance things

Thank you.
> I expect that the folks who actually care about performance aren't going to be doing these small operations anyway.
Once they're aware of the issue, maybe. The problem is we've now seen a non-trivial number of real uses from devs reasonably interested in perf, where what they had before was sufficient but this change leads to an order of magnitude slowdown and it's no longer sufficient. I'm concerned about them, and I'm concerned about the likely many, many more folks that don't notice but that end up with much higher costs even so.
I think I ran into the same underlying issue using ZipArchiveEntry.Open and then BinaryReader.
My code in production was taking an abnormally long time using .NET 6. I wrote a quick program to run and observed that .NET 6 took 18 seconds and .NET Framework took 0.5 seconds.
I've created a benchmark that demonstrates the issue: https://raw.githubusercontent.com/talanc/ZipBenchmarks/master/ZipBenchmarks.ZipArchiveEntryBenchmarks-20230624-204220.txt The results here show that .NET 7/8 is 14x slower than .NET Framework.
As discovered above, wrapping the ZipArchiveEntry stream in a BufferedStream fixes the performance regression, and even gives .NET Framework slightly faster results too.
I hope the underlying issue can be fixed. For now BufferedStream provides an easy workaround.
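For anyone hitting this via ZipArchive, a minimal sketch of the same workaround (the helper name `OpenBuffered` and the 4 KB default are illustrative, not part of any API):

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ZipWorkaround
{
    // Open a zip entry for reading through a BufferedStream, so the small
    // per-field reads issued by BinaryReader hit the managed buffer rather
    // than the deflate decompressor directly.
    public static BinaryReader OpenBuffered(ZipArchiveEntry entry, int bufferSize = 0x1000)
        => new BinaryReader(new BufferedStream(entry.Open(), bufferSize));
}
```

The change is one line at each call site: wrap the result of `entry.Open()` before constructing the BinaryReader.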
> I'm evaluating some ideas that might balance things out a bit. I guess we'll see.
@jtkukunas, how did this go?
@stephentoub Not particularly well. At the end of the day, it's just the wrong tradeoff.
Is there any way you can internally buffer things in the intermediate layers?
There is, by using a BufferedStream.
But the users have to be aware that they need to make that change to avoid a performance regression.
Is it possible to change the strategy for keeping the buffer full? At the moment, if even a single byte is read, the existing buffer contents are shifted back and a new byte is read in to keep it full. Could it be changed to refill the buffer only at a small number of coarse thresholds, perhaps a quarter? That should keep performance for large reads and avoid constant buffer copies during sequences of smaller reads.
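For illustration, here's a rough sketch of the refill policy being suggested, written as a plain C# class rather than the actual DeflateStream/zlib internals (the class name, buffer size, and quarter threshold are all assumptions for the sketch):

```csharp
using System;
using System.IO;

// Sketch of a threshold-refill read buffer: reads are served from _buffer,
// and the shift-and-refill from the underlying stream happens only once the
// unread portion drops below a threshold, instead of after every read.
public sealed class ThresholdBufferedStream
{
    private readonly Stream _inner;
    private readonly byte[] _buffer;
    private readonly int _refillThreshold;
    private int _pos, _len;

    public ThresholdBufferedStream(Stream inner, int bufferSize = 0x1000)
    {
        _inner = inner;
        _buffer = new byte[bufferSize];
        _refillThreshold = bufferSize / 4; // refill only when < 1/4 remains
    }

    public int Read(byte[] dest, int offset, int count)
    {
        if (_len - _pos < _refillThreshold)
        {
            // Shift the unread tail to the front, then top up. This copy now
            // happens at most once per threshold's worth of small reads.
            Buffer.BlockCopy(_buffer, _pos, _buffer, 0, _len - _pos);
            _len -= _pos;
            _pos = 0;
            _len += _inner.Read(_buffer, _len, _buffer.Length - _len);
        }
        int n = Math.Min(count, _len - _pos);
        Buffer.BlockCopy(_buffer, _pos, dest, offset, n);
        _pos += n;
        return n;
    }
}
```

A real implementation would also need to loop for reads larger than the buffer and handle short reads from the inner stream; this only shows the refill policy itself.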
Let me rephrase what I'm trying to say. You don't want to use an unbuffered stream here. You want a buffered stream, which will give you better performance.
If someone runs into this problem, it's highlighting an inefficiency in their code. Once they fix it, they'll get even better performance than before.
I'm sure you guys have channels for developer education.
> If someone runs into this problem, it's highlighting an inefficiency in their code. Once they fix it, they'll get even better performance than before.
The problem is there are millions of developers out there with billions of lines of code. They upgrade to .NET 6, and there's now an inefficiency that gets introduced into their application. Yes, they should be doing it differently, but the only folks that will notice are the ones where it's so egregious that they seek out an answer. My concern is that's the tip of the iceberg, and there's a much larger number of devs / apps that have or will just incur a meaningful performance regression silently, and not realize there was an existing opportunity to do better that is now costing them a whole lot more than it did before.
Description
We're in the process of upgrading from .NET Framework 4.7.2 to Net6. Unfortunately, we've come across one scenario where the performance in Net6 (or Net7) is terrible compared to Net Framework, in some cases taking 50-60 times as long.
I was able to hack together a simple application to reproduce something similar (Streams.zip attached). In this app, the slowdown seems to be ~180x!
Streams.zip
Reproduction Steps
Compile and run the attached project under Net472, Net6 and Net7.
Expected behavior
Performance on Net6 should be better than, or at least comparable to, that on Net472. That's what we have seen elsewhere.
Actual behavior
Regression?
Huge performance regression in Net6/7 compared to .NET Framework 4.7.2.
Known Workarounds
None known. I would be very interested to learn of any.
Configuration
Windows 10 22H2 (19045.2728). Comparing different versions of .NET: Net472 vs Net 6.0.15 vs Net 7.0.4, x64 configuration.
Other information
No response