dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.96k stars 4.65k forks source link

HttpClient deflate decompression error #57604

Closed FeeSaver closed 3 years ago

FeeSaver commented 3 years ago

Hi, I just tried to move project from Net 5 to latest preview of Net 6 and I have noticed issue with automatic decompression. Gzip and Brotli seems to work fine, but Deflate is throwing IO error with invalid stream data. When I change back to Net 5, everything works fine again. Here is part of code that throws the error on last line with stream.ReadAsync. I think this happen only when the website is using https and I receive response compressed with Deflate, also the HttpCompletionOption.ResponseHeadersRead might be causing some issue.

var _httpClientHandler = new SocketsHttpHandler
{
    AutomaticDecompression = DecompressionMethods.All
};
var _client = new HttpClient(_httpClientHandler, false);    
var response = await _client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead,)
if (response.IsSuccessStatusCode)
{
    Stream contentStream = await response.Content.ReadAsStreamAsync(_token);
    if (contentStream != Stream.Null)
    {
        int readBytes = 0;
        var rentedBuffer = pool.Rent(64 * 1024);
        var bufferMemory = new Memory<byte>(rentedBuffer, 0, rentedBuffer.Length);
        int readChunk = await stream.ReadAsync(bufferMemory.Slice(readBytes));
    }
}
System.IO.InvalidDataException: The archive entry was compressed using an unsupported compression method.
   at ErrorCode System.IO.Compression.Inflater.Inflate(FlushCode flushCode)
   at ErrorCode System.IO.Compression.Inflater.ReadInflateOutput(Byte* bufPtr, int length, FlushCode flushCode, out int bytesRead)
   at void System.IO.Compression.Inflater.ReadOutput(Byte* bufPtr, int length, out int bytesRead)
   at int System.IO.Compression.Inflater.InflateVerified(Byte* bufPtr, int length)
   at async ValueTask<int> System.IO.Compression.DeflateStream.ReadAsyncMemory(Memory<byte> buffer, CancellationToken cancellationToken)+Core(?)
KalleOlaviNiemitalo commented 3 years ago

Might be caused by https://github.com/dotnet/runtime/pull/42717. If I understand correctly, .NET used to treat Content-Encoding: deflate as only the RFC 1951 "deflate" compression, but RFC 2616 says it requires also the RFC 1950 zlib header. This was fixed in .NET, but now if the HTTP server violates RFC 2616 by omitting the zlib header, it won't work.

FeeSaver commented 3 years ago

Is there any planned workaround? There are website apis that work for me using any browser or basically anything other than Net6 app. Or just half of websites using deflate wont work from now on? I have seen this issue on many application that use NodeJS. I looked at what they use and they have 2 versions of deflate available for almost 10 years.

Class: zlib.Deflate# Added in: v0.5.8 Compress data using deflate.

Class: zlib.DeflateRaw# Added in: v0.5.8 Compress data using deflate, and do not append a zlib header.

https://nodejs.org/api/zlib.html#zlib_class_zlib_deflate

KalleOlaviNiemitalo commented 3 years ago

.NET has DeflateStream and ZlibStream. I imagine you could work around the issue by disabling automatic decompression, which now uses ZlibStream for "deflate", and instead adding the Accept-Encoding header field in your app and using DeflateStream if the server chooses "deflate". Or just do not request "deflate" from the server.

KalleOlaviNiemitalo commented 3 years ago

If there is no code fix done to automatically detect noncompliant servers and decode the plain DEFLATE without zlib, I think this should be documented as a breaking change.

FeeSaver commented 3 years ago

I am hoping for some fix really, .Net deflate worked well "incorrectly" for years, but atleast it worked. Now it is correct to latest standard, but not useable. I guess deflate is used mostly on older servers, the newer servers that maybe support the newer standard use gzip or brotli anyway. My problem is that some sites just return deflate content as default whether I set it as accepted encoding or not, or maybe .Net add the deflate as default acepted encoding, I didnt check. I guess they just assume its suppored by everything. I hope this gets changed in future, I will keep using Net5 for now.

ghost commented 3 years ago

Tagging subscribers to this area: @dotnet/ncl See info in area-owners.md if you want to be subscribed.

Issue Details
Hi, I just tried to move project from Net 5 to latest preview of Net 6 and I have noticed issue with automatic decompression. Gzip and Brotli seems to work fine, but Deflate is throwing IO error with invalid stream data. When I change back to Net 5, everything works fine again. Here is part of code that throws the error on last line with stream.ReadAsync. I think this happen only when the website is using https and I receive response compressed with Deflate, also the HttpCompletionOption.ResponseHeadersRead might be causing some issue. ` var _httpClientHandler = new SocketsHttpHandler { AutomaticDecompression = DecompressionMethods.All }; var _client = new HttpClient(_httpClientHandler, false); var response = await _client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead,) if (response.IsSuccessStatusCode) { Stream contentStream = await response.Content.ReadAsStreamAsync(_token); if (contentStream != Stream.Null) { int readBytes = 0; var rentedBuffer = pool.Rent(64 * 1024); var bufferMemory = new Memory(rentedBuffer, 0, rentedBuffer.Length); int readChunk = await stream.ReadAsync(bufferMemory.Slice(readBytes)); } }` `System.IO.InvalidDataException: The archive entry was compressed using an unsupported compression method. at ErrorCode System.IO.Compression.Inflater.Inflate(FlushCode flushCode) at ErrorCode System.IO.Compression.Inflater.ReadInflateOutput(Byte* bufPtr, int length, FlushCode flushCode, out int bytesRead) at void System.IO.Compression.Inflater.ReadOutput(Byte* bufPtr, int length, out int bytesRead) at int System.IO.Compression.Inflater.InflateVerified(Byte* bufPtr, int length) at async ValueTask System.IO.Compression.DeflateStream.ReadAsyncMemory(Memory buffer, CancellationToken cancellationToken)+Core(?)`
Author: FeeSaver
Assignees: -
Labels: `area-System.Net.Http`, `untriaged`
Milestone: -
stephentoub commented 3 years ago

@karelz, this is the issue we were discussing offline.

KalleOlaviNiemitalo commented 3 years ago

Are you planning an AppContext switch?

karelz commented 3 years ago

Triage: We don't understand what kind of servers used Deflate the "wrong" way. To our best knowledge ASP.NET does not support it by default. @FeeSaver @KalleOlaviNiemitalo what are the servers you run against?

We should not revert this behavior as we are now RFC compliant. We were NOT compliant before. Ideally we want automatic detection -- @stephentoub is looking into that. Global AppContext switch is problematic when you are talking with 2 different servers - one bad and one good.

KalleOlaviNiemitalo commented 3 years ago

@karelz I don't run this against any servers. This issue is just interesting to me because I have hit the same problem years ago in an entirely different HTTP client.

karelz commented 3 years ago

@KalleOlaviNiemitalo so you basically ran years ago into the original bug due to our incompatibility with non-.NET servers, which is why we fixed it, breaking people with .NET-to-.NET environments ...

KalleOlaviNiemitalo commented 3 years ago

@karelz the HTTP client was unrelated to .NET, and I don't know whether the HTTP server used .NET either.

stephentoub commented 3 years ago

Ideally we want automatic detection

Turns out this isn't possible. The zlib library enables automatic detection between gzip and zlib, but not between zlib and deflate. And based on the zlib and deflate file formats, that's likely because it looks impossible to perfectly distinguish between them, which would mean we couldn't accurately do it ourselves even if we wanted to. We could employ a heuristic, and opt-in to zlib for the most common header sequence, but that's obviously flawed.

karelz commented 3 years ago

Thanks @stephentoub! That leaves us with opt-in switch option -- ideally per Handler/Client. However, it won't help with 3rd party libraries (e.g. AzureSDK). So, we will likely need AppContext switch anyway :( ... perhaps we should start with simple AppContext switch and then see later (e.g. in servicing) if we need per-Client/Handler toggle?

Any other thoughts @geoffkizer?

stephentoub commented 3 years ago

Any other thoughts

Last option would be to revert for .NET 6 and figure something else out for .NET 7 (don't know what that would be).

KalleOlaviNiemitalo commented 3 years ago

Maybe there could also be another AppContext switch to disable adding Accept-Encoding: deflate to requests, while letting DecompressionMethods.Deflate still control whether a "deflate" response is decompressed if received. That would reduce the number of "deflate" responses, and hopefully all the remaining ones would then use the same compliant or noncompliant format, so that they could be covered by the first AppContext switch.

FeeSaver commented 3 years ago

Triage: We don't understand what kind of servers used Deflate the "wrong" way. To our best knowledge ASP.NET does not support it by default. @FeeSaver @KalleOlaviNiemitalo what are the servers you run against?

We should not revert this behavior as we are now RFC compliant. We were NOT compliant before. Ideally we want automatic detection -- @stephentoub is looking into that. Global AppContext switch is problematic when you are talking with 2 different servers - one bad and one good.

I think most sites with this problem are older NodeJS or Apache servers. https://web.xmrpool.eu:8119/stats For example this link doesnt work, but it works in Chrome. Here is maybe hint how they do it, I hope the link is working. https://source.chromium.org/chromium/chromium/src/+/main:third_party/devtools-frontend/src/node_modules/node-fetch/lib/index.es.js;drc=5f6c0652cc712b5959f42b9c1745c8e6635f80c3;l=1593

karelz commented 3 years ago

It is interesting idea by @KalleOlaviNiemitalo to provide AppContext switch to disable Deflate as accepted encoding. That may help with app which need to work against multiple endpoints, mixing .NET-buggy Deflate with RFC-compliant Deflate endpoints.

KalleOlaviNiemitalo commented 3 years ago

Could call the switches System.Net.Http.SocketsHttpHandler.DeflateWithoutZlibHeader and System.Net.Http.SocketsHttpHandler.DoNotRequestDeflate. The first one could default to true in runtime rollforward from .NET 5 or lower.

KalleOlaviNiemitalo commented 3 years ago

Here is maybe hint how they do it, I hope the link is working.

Also on GitHub https://github.com/node-fetch/node-fetch/blob/51861e98a8f87e0905e71bb101b506f9512a9d7f/src/index.js#L257-L270

Which refers to http://stackoverflow.com/questions/37519828

stephentoub commented 3 years ago

This is the heuristic I was investigating doing before I convinced myself it wasn't sufficient. But thinking about it further, I realize the error in my logic. Iwas comparing the protocols in general, and in general, that's not a sufficient heuristic, e.g. ZLibStream itself couldn't do it, because the first nibble indicates the compression algorithm and it could be anything. But in the context of http, the response is limited to a zlib wrapper around deflate, which means in that context the first nibble will be 8.

I'll see if I can make this work later today.

karelz commented 3 years ago

Reopening to track port to 6.0

karelz commented 3 years ago

Fixed for 6.0 RC1 in PR #57940 (will flow automatically into release/6.0 branch). Fixed for 7.0 in PR #57862.