dotnet / runtime

[API Proposal]: System.IO.Compression: Exposing Remaining number of Input buffer bytes to support mixed data blocks. #73770

Open jnix-abk opened 2 years ago

jnix-abk commented 2 years ago

Background and motivation

I am working with data made up of packed objects, each consisting of header bytes followed by a ZLib blob. The DeflateStream class fills an internal buffer by reading ahead from the BaseStream, so the underlying Stream ends up positioned past the point where the ZLib blob ends. Currently, I have to use reflection to reach down into the object model and get the number of buffered bytes that were not consumed, so that I can rewind the BaseStream to where the ZLib data ended. Like so:

//Get the Available Bytes in the internal buffer
//Reflection Path = deflateStream -> _inflater / inflater -> _zlibStream -> AvailIn
var flags = BindingFlags.NonPublic | BindingFlags.Instance;
var inflater = deflateStream
    .GetType()
    .GetFields(flags)
    .ToList()
    .Find(x => x.Name == "inflater" || x.Name == "_inflater");
var inflaterInstance = inflater.GetValue(deflateStream);
var zlibStream = inflaterInstance.GetType().GetField("_zlibStream", flags);
var zlibStreamInstance = zlibStream.GetValue(inflaterInstance);
var availIn = zlibStreamInstance.GetType().GetProperty("AvailIn");
var availInValue = availIn.GetValue(zlibStreamInstance);

//Rewind the BaseStream
deflateStream.BaseStream.Seek(-1 * (uint)availInValue, SeekOrigin.Current);
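
To make the over-read concrete, here is a minimal sketch that builds a small mixed-use stream and checks where the base stream ends up after decompression. The layout (4 header bytes, one ZLib blob, 64 trailing bytes) and the `OverReadDemo` name are assumptions made up for illustration; only `ZLibStream`, `MemoryStream`, and `Stream.Null` are real APIs. On the runtimes discussed in this issue, the reported position should be past the end of the blob, though the exact amount of over-read depends on internal buffer sizes.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class OverReadDemo
{
    // Hypothetical layout: [4 header bytes][ZLib blob][trailing bytes of the next object].
    public static (long BlobEnd, long PositionAfterInflate) Run()
    {
        byte[] header = { 1, 2, 3, 4 };
        byte[] trailer = new byte[64]; // stands in for whatever follows the blob

        // Compress a small payload ourselves so the blob length is known exactly.
        byte[] blob;
        using (var compressed = new MemoryStream())
        {
            using (var z = new ZLibStream(compressed, CompressionMode.Compress, leaveOpen: true))
            {
                z.Write(new byte[256], 0, 256);
            }
            blob = compressed.ToArray();
        }

        // Assemble the mixed-use stream and position it at the start of the blob.
        var mixed = new MemoryStream();
        mixed.Write(header, 0, header.Length);
        mixed.Write(blob, 0, blob.Length);
        mixed.Write(trailer, 0, trailer.Length);
        mixed.Position = header.Length;
        long blobEnd = header.Length + blob.Length;

        // Decompress and discard the output; only the base stream position matters here.
        using (var inflater = new ZLibStream(mixed, CompressionMode.Decompress, leaveOpen: true))
        {
            inflater.CopyTo(Stream.Null);
        }

        return (blobEnd, mixed.Position);
    }
}
```

Comparing `PositionAfterInflate` with `BlobEnd` shows how far the base stream would need to be rewound, which is the number the reflection code above digs out of the inflater.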

API Proposal

Can you add

        public int AvailableInput => (int)_zlibStream.AvailIn;

below this: https://github.com/dotnet/runtime/blob/a3fb0d383adf98ee2fb2ab816c28735dc1caaba0/src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateZLib/Inflater.cs#L43

And then add

        internal int RemainingBufferBytes
        {
            get
            {
                if (_inflater != null)
                    return _inflater.AvailableInput;
                else
                    return -1;
            }
        }

here: https://github.com/dotnet/runtime/blob/a3fb0d383adf98ee2fb2ab816c28735dc1caaba0/src/libraries/System.IO.Compression/src/System/IO/Compression/DeflateZLib/DeflateStream.cs#L120

And then add

        public int RemainingBufferBytes { get { throw null; } }

below: https://github.com/dotnet/runtime/blob/a3fb0d383adf98ee2fb2ab816c28735dc1caaba0/src/libraries/System.IO.Compression/ref/System.IO.Compression.cs#L131

Finally add:

        /// <summary>Returns the number of unused Input Buffer bytes.  This can be used to rewind the BaseStream when reading from mixed-use data.</summary>
        public int RemainingBufferBytes
        {
            get
            {
                if (_deflateStream != null)
                    return _deflateStream.RemainingBufferBytes;
                else
                    return -1;
            }
        }

here: https://github.com/dotnet/runtime/blob/d2c991effcdf543cc60632e5588984aa22dd6772/src/libraries/System.IO.Compression/src/System/IO/Compression/ZLibStream.cs#L66

I have added these to a local copy and built the whole thing. It works as expected.

API Usage

//Read the compressed data
ZLibStream inflater = new ZLibStream(mixedUseStream, CompressionMode.Decompress, true);
MemoryStream expandedContents = new MemoryStream();
inflater.CopyTo(expandedContents);

//Rewind the stream by the unused buffer bytes.
mixedUseStream.Seek(-1 * inflater.RemainingBufferBytes, SeekOrigin.Current);

//Close the stream
inflater.Close();

Alternative Designs

No response

Risks

There are no risks as this is simply exposing data to read that is already present.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-io-compression See info in area-owners.md if you want to be subscribed.

Issue Details
Author: jnix-abk
Assignees: -
Labels: `api-suggestion`, `area-System.IO.Compression`
Milestone: -

adamsitnik commented 2 years ago

@carlossanlop @Jozkee @stephentoub are there any reasons why we should not expose this data?

jnix-abk commented 2 years ago

Any update on this?

stephentoub commented 2 years ago

> @stephentoub are there any reasons why we should not expose this data?

I don't have fundamental objections to this, other than it "feels" strange. If the goal is to avoid DeflateStream/GZipStream/ZLibStream consuming more of the source stream than actually contains the relevant data, could we have the streams seek backwards themselves once they're closed if the source is seekable? Or would this be a better fit for https://github.com/dotnet/runtime/issues/39327 / https://github.com/dotnet/runtime/issues/62113, such that the developer is handling all of the interactions with the source stream themselves and knows exactly how many of the bytes were consumed by the decompression process?

jnix-abk commented 2 years ago

ZLib data does not have an easy size indicator that I could use to grab just the compressed data. If I knew the size, I'd just extract it and shove it in a MemoryStream. That is why I need the above property exposed.
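
For contrast, here is a rough sketch of the known-size case mentioned above, where the compressed length is available (for example from a header field). `KnownLengthBlob` and its `Inflate` method are hypothetical names invented for this illustration, not an existing or proposed API. Because only the blob's bytes are copied out before decompression, the source stream is left positioned exactly at the end of the blob and no rewind is needed.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class KnownLengthBlob
{
    // Hypothetical helper: assumes the caller already knows the compressed length.
    public static byte[] Inflate(Stream source, int compressedLength)
    {
        // Pull exactly compressedLength bytes from the source, and no more.
        byte[] compressed = new byte[compressedLength];
        int filled = 0;
        while (filled < compressedLength)
        {
            int n = source.Read(compressed, filled, compressedLength - filled);
            if (n == 0)
                throw new EndOfStreamException("Source ended before the blob did.");
            filled += n;
        }

        // Decompress from the private copy; any read-ahead happens on the
        // MemoryStream, not on the caller's stream.
        using var input = new MemoryStream(compressed, writable: false);
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        zlib.CopyTo(output);
        return output.ToArray();
    }
}
```

This only helps when the format records the blob size; for data like that described in this issue, where no size indicator exists, it does not apply.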

BenMcLean commented 3 months ago

My current workaround for this is to catch and then ignore the EndOfStreamException which is thrown from attempting to read past the end of the stream.