PJB3005 opened this issue 2 years ago.
Tagging subscribers to this area: @dotnet/area-system-io-compression. See info in area-owners.md if you want to be subscribed.
Author: | PJB3005 |
---|---|
Assignees: | - |
Labels: | `api-suggestion`, `area-System.IO.Compression`, `untriaged` |
Milestone: | - |
I have a real-world use case for this also. I recently implemented my own incomplete parser for ZIP archives to use LibDeflate as the decompressor, which got me some nice speedups. It would be nice to be able to use the structure parsing with my own compression libs.
My first use case is that I want to use zip files (because it's a standard format) but with LZMA (significant space savings for my use case), while also being able to dump these blobs straight into an SQLite DB, still compressed. Another use case is that I want to use zip files as object storage behind an API, and being able to send the compressed blobs over the wire directly would be great.
This would kill multiple birds with one stone.
> Allows developers to use third-party compression libraries to get support for algorithms like zstd or LZMA themselves.
Having an enum that requires a third-party library to supply that compression algorithm is likely to cause confusion.
At least some compression libraries add a header to the compressed stream. That being the case, if the constructor instead took something like
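```csharp
using System;
using System.IO;

public interface IZipCompressionStream
{
    string CompressionMethod { get; }                      // which algorithm this implementation handles
    ReadOnlySpan<byte> Header { get; }                     // header bytes the library puts at the start of its stream
    Stream Compress(Stream raw);                           // compresses `raw`
    bool TryDecompress(Stream compressed, out Stream raw); // false if the data is not in this format
    Stream Decompress(Stream compressed);                  // decompresses `compressed`
}
```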
... this would allow for arbitrary compression methods, including ones not currently envisioned.
> Having an enum that requires a third-party library to supply that compression algorithm is likely to cause confusion.
It is a lower-level API that simply exposes more information about the underlying zip file format. Python also exposes the `ZipInfo.compress_type` field in its `zipfile` module (but with no ability to access the raw stream, AFAICT).
Limiting the enum members to the compression methods supported by .NET today would be an option, which I suppose is closer to what Python does in this regard.
> At least some compression libraries add a header to the compressed stream. That being the case, if the constructor instead took something like
Relying on such headers is silly for zip files, since they already have a standardized 2-byte entry field for compression method.
This entire `IZipCompressionStream` seems like a very complex solution, and it does not address the other point (access to raw blobs, although you could probably abuse it to achieve that by jumping through many silly hoops).
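For reference, the 2-byte field is the compression method stored in each entry's headers (PKWARE APPNOTE.TXT). In a local file header it follows the 4-byte signature `0x04034b50`, the 2-byte "version needed" field and the 2-byte general-purpose flags, so it sits at offset 8. The C# sketch below writes a one-entry archive with `ZipArchive` and reads that field back. `LocalHeaderDemo` and `ReadMethodAt` are hypothetical names, and the sketch assumes the entry's local header starts at offset 0, which holds for an archive freshly written by `ZipArchive` but not in general; a real reader would take the offset from the central directory.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class LocalHeaderDemo
{
    // Local file header signature "PK\x03\x04", read as a little-endian uint.
    private const uint LocalFileHeaderSignature = 0x04034b50;

    // Hypothetical helper: returns the 2-byte compression method of the
    // local file header that starts at `offset`.
    public static ushort ReadMethodAt(byte[] zip, int offset)
    {
        ReadOnlySpan<byte> header = zip.AsSpan(offset);
        if (BinaryPrimitives.ReadUInt32LittleEndian(header) != LocalFileHeaderSignature)
            throw new InvalidDataException("No local file header at this offset.");

        // Skip signature (4) + version needed to extract (2) + general-purpose flags (2).
        return BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(8));
    }

    // Writes a one-entry archive to memory using the default (Deflate) compression.
    public static byte[] BuildSampleZip()
    {
        using var ms = new MemoryStream();
        using (var archive = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
        using (Stream entry = archive.CreateEntry("hello.txt").Open())
        {
            byte[] text = Encoding.UTF8.GetBytes(new string('z', 1000));
            entry.Write(text, 0, text.Length);
        }
        return ms.ToArray();
    }

    public static void Main()
    {
        // The first local header of a freshly written archive starts at offset 0.
        Console.WriteLine(ReadMethodAt(BuildSampleZip(), 0)); // 8 (Deflate)
    }
}
```

Method 8 is Deflate; IDs such as 14 (LZMA) or 93 (Zstandard) are what a caller would inspect to pick a third-party decoder.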
@Clockwork-Muse I think the API should follow the standard (though which of the specified compression methods should be named members of the enum is up for debate), instead of inventing its own way of specifying the compression method that may or may not be useful in the future. Or do you have an example where what you're proposing would be useful today?
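One way to follow the standard while leaving the naming question open is an enum whose underlying values are the method IDs from APPNOTE.TXT. This is a hypothetical sketch: `ZipCompressionMethod` is not an existing .NET type, and the named members are an arbitrary selection. Because C# enums can hold values that have no named member, an ID such as 1 (Shrunk) still round-trips unchanged.

```csharp
using System;

// Hypothetical enum: underlying values are the 2-byte method IDs from APPNOTE.TXT.
public enum ZipCompressionMethod : ushort
{
    Stored = 0,
    Deflate = 8,
    Deflate64 = 9,
    BZip2 = 12,
    Lzma = 14,
    Zstandard = 93,
    Xz = 95,
    Ppmd = 98,
}

public static class EnumDemo
{
    public static void Main()
    {
        // A method ID as read from an entry header; 1 (Shrunk) has no named member.
        ushort rawId = 1;
        var method = (ZipCompressionMethod)rawId;

        Console.WriteLine(method);                                                // 1
        Console.WriteLine(Enum.IsDefined(typeof(ZipCompressionMethod), method)); // False
        Console.WriteLine((ushort)ZipCompressionMethod.Lzma);                    // 14
    }
}
```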
> Relying on such headers is silly for zip files, since they already have a standardized 2-byte entry field for compression method.
Ah, I was not aware that zip itself listed the possible methods, my bad.
@carlossanlop what is your take on this? Would adding such an API help to implement algorithms that are currently not supported OOTB?
Thanks for this suggestion, @PJB3005. I'm moving this to Future, but I've also referenced it in #62658 so that we look at it alongside the LZMA and other potential investments during our .NET 8 planning.
Background and motivation
Right now, `ZipArchive` only supports opening entries compressed with `Stored`, `Deflate` and `Deflate64`. While there are open issues about adding support for more of the specified methods, such as LZMA, I would like to propose an orthogonal solution to this problem: allow access to the raw compressed streams in the zip file, and to the compression method field of each entry. This opens up a few possibilities:
I am far from an expert on the zip file format, but from my rudimentary understanding of it, this should be possible?
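Until such an API exists, reaching the raw compressed stream means parsing the headers yourself, as in the LibDeflate-based parser mentioned earlier in the thread. A minimal C# sketch under these assumptions: the entry's local header starts at offset 0 (true for a freshly written single-entry archive), the entry is not encrypted, and the compressed length comes from the public `ZipArchiveEntry.CompressedLength` property, because a local header may defer its sizes to a trailing data descriptor (general-purpose flag bit 3). `RawEntryDemo`, `ReadRawEntry` and `Inflate` are hypothetical names, and `DeflateStream` stands in for a third-party decoder.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class RawEntryDemo
{
    // Hypothetical helper: returns the method ID and raw compressed payload of the
    // entry whose local file header starts at `headerOffset`. Encryption is not handled.
    public static (ushort Method, byte[] Payload) ReadRawEntry(byte[] zip, int headerOffset, long compressedLength)
    {
        ReadOnlySpan<byte> header = zip.AsSpan(headerOffset);
        if (BinaryPrimitives.ReadUInt32LittleEndian(header) != 0x04034b50)
            throw new InvalidDataException("No local file header at this offset.");

        ushort method = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(8));
        ushort nameLength = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(26));
        ushort extraLength = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(28));

        // The payload follows the 30-byte fixed header, the file name and the extra field.
        int payloadStart = headerOffset + 30 + nameLength + extraLength;
        byte[] payload = zip.AsSpan(payloadStart, checked((int)compressedLength)).ToArray();
        return (method, payload);
    }

    // Method 8 payloads are raw Deflate data, so any Deflate decoder can consume them.
    public static byte[] Inflate(byte[] payload)
    {
        using var inflater = new DeflateStream(new MemoryStream(payload), CompressionMode.Decompress);
        using var output = new MemoryStream();
        inflater.CopyTo(output);
        return output.ToArray();
    }

    // Writes a one-entry archive to memory using the default (Deflate) compression.
    public static byte[] BuildSampleZip(byte[] content)
    {
        using var ms = new MemoryStream();
        using (var archive = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
        using (Stream entry = archive.CreateEntry("data.txt").Open())
            entry.Write(content, 0, content.Length);
        return ms.ToArray();
    }

    public static void Main()
    {
        string text = new string('z', 1000);
        byte[] zip = BuildSampleZip(Encoding.UTF8.GetBytes(text));

        // CompressedLength is read from the central directory by the public API.
        long compressedLength;
        using (var reader = new ZipArchive(new MemoryStream(zip), ZipArchiveMode.Read))
            compressedLength = reader.Entries[0].CompressedLength;

        var (method, payload) = ReadRawEntry(zip, 0, compressedLength);
        bool intact = Encoding.UTF8.GetString(Inflate(payload)) == text;
        Console.WriteLine($"method={method}, payload={payload.Length} bytes, intact={intact}");
    }
}
```

The `Payload` bytes could be stored in SQLite or sent over the wire as-is, which is the use case described earlier in the thread; a complete reader would walk the central directory to find each entry's local header offset instead of assuming 0.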
API Proposal
API Usage
Using third-party decompression streams with `ZipArchive`:
Copying compressed blobs between zip files:
Alternative Designs
No response
Risks
No response