Open edwardneal opened 3 months ago
Tagging subscribers to this area: @dotnet/area-system-io-compression See info in area-owners.md if you want to be subscribed.
Why does this matter if the value was completely ignored by the underlying implementation? Seems like a lot of effort just to flow an enum to pass a test. What's is the material side effect of not having this? What's broken?
It's definitely a corner case, but it's not solely to pass a test. This is the API proposal needed to fix two bugs, both of which are regressions between .NET Framework and .NET:
CompressionLevel.Fast
.ZipArchiveEntry.CompressionLevel
property so that I can fix this.only .NET Framework actually returns the real value
But it's not actually a real value. It's just a persisted value. The implementation never honored it.
cc @carlossanlop since he was helping with https://github.com/dotnet/runtime/pull/98278
At the end of the day I'm not opposed to making things look more compatible - just making sure that we spend our energy in the places that matter most. @edwardneal if you feel this is important and a good use of time we can help you get a proposal reviewed.
Thanks @ericstj. It's not preventing me from porting of anything from .NET Framework to .NET - although the constant value returned by PackagePart.CompressionOption is a bug, it's one which has been around since the original code was ported in 2015. I think it'll be slightly more noticeable now that the CompressionOption we specify when adding a PackagePart is mapped through to the correct bits in the ZIP entry.
If time's particularly short with .NET 9's release schedule then it can probably wait for .NET 10, or any "washup" API reviews. Given the choice of time from someone in the Packaging area, I'd personally prioritise #97898 over this - it fixes a bug which completely prevents .NET applications from reading valid XPS packages, and which seems to be blocking a real-world application.
This issue straddles System.IO.Compression and System.IO.Packaging, and is a behavioural regression between .NET and the .NET Framework.
We can use System.IO.Packaging to create a new
Package
and addPackagePart
s to it, specifying aCompressionOption
on these PackageParts. When we write the Package to a ZIP file viaZipArchive
, the new PackagePart is turned into aZipArchiveEntry
and the PackagePart's CompressionOption is mapped to aCompressionLevel
. ZipArchiveEntry then passes that through to create the correct deflation stream and to set bits 1 & 2 in the file header's "general purpose bit flags". This behaviour works well.The regression here comes when we load a Package from an existing stream and start to inspect each PackagePart's CompressionOption. In .NET Framework, Package calls
GetCompressionOptionFromZipFileInfo
, which maps bits 1 & 2 to a CompressionOption. Similarly, in .NET Package callsGetCompressionOptionFromZipFileInfo
. This always returnsCompressionOption.Normal
, because ZipArchiveEntry doesn't expose its CompressionLevel publicly.When the issue's closed, I'd like to be able to persist a Package containing a PackagePart with a given CompressionOption, then be able to read that CompressionOption back upon reading it.
There are three parts to this issue:
CompressionLevel
property, and for the CompressionLevel enumeration to include a new member namedFast
;API request
This request is primarily for a read-only CompressionLevel property on ZipArchiveEntry. I've chosen not to make it read-write -
PackagePart.CompressionOption
is read-only, so it's not needed for this specific use case. It might also imply that a ZIP file would be flagged for decompression and recompression (or simply decompressed) by setting it: I'm not sure how ZipArchiveEntry would handle a situation where the entry's header specifies one compression level, but the entry data is compressed with another; there'd definitely be a problem if its compression level was changed to NoCompression!I've also rolled in a new enumeration member on CompressionLevel to address an ambiguous CompressionOption/CompressionLevel mapping.
CompressionLevel -> CompressionOption mapping
This is different between .NET and .NET Framework, largely because .NET Framework has an extra member in its DeflateOptionEnum which doesn't neatly map to the current .NET equivalent of CompressionLevel at the moment.
Currently, the mapping from CompressionOption to CompressionLevel when writing a PackagePart is:
If the new CompressionLevel enumeration member is approved, I'd change CompressionOption.Fast to map to CompressionLevel.Fast.
.NET Standard has a different mapping for CompressionOption.Maximum to avoid introducing a breaking change for .NET Framework applications referencing System.IO.Packaging.
The mapping from CompressionLevel to CompressionOption when reading a Package needs to roughly correlate with the table above so that values roundtrip properly.
To continue to avoid a breaking change for .NET Framework applications, .NET Standard would continue to return Optimal in all cases.
New CompressionLevel member, otherwise-ambiguous CompressionOption/CompressionLevel mapping
At present, creating a PackagePart with a CompressionOption of Fast or SuperFast will result in a ZipArchiveEntry with a CompressionLevel of Fastest, setting general purpose bits 1 & 2. This ambiguity means that somebody could create a PackagePart with a CompressionOption of Fast, and it would subsequently be read back with a CompressionOption of Fastest.
This ambiguity arises because the CompressionLevel enumeration doesn't have a member named Fast (or something similarly named, halfway between Optimal and Fastest.) This isn't ideal, so my API request asks for this new member.
Adding a new CompressionLevel also requires selecting the correct ZLib compression level, and while I'd guess that 4 would be a good compromise between Optimal (6) and Fastest (1), I don't have anything quantitative to back that. I'm also not sure what impact the pending switch of ZLib implementation would have on this.
cc @dotnet/area-system-io-compression @carlossanlop