Open carlossanlop opened 3 years ago
| Author: | carlossanlop |
|---|---|
| Assignees: | - |
| Labels: | `api-suggestion`, `area-System.IO.Compression` |
| Milestone: | - |
It would be a great enhancement for .NET, but also for the public visibility of this impressive compression algorithm. If you accept it, I can contribute to make it happen. I already foresee multiple steps:
Open questions:
Thank you, @manandre for your offer!
Let's start by discussing the stream API.
I think it makes sense for the stream class to look very similar to Deflate, since both would only wrap a compression algorithm (unlike the Zip, GZip, ZLib APIs, which additionally represent a compression/archiving format).
I am thinking we can avoid creating too many constructors by creating a separate `ZStandardOptions` class to specify the configuration values.
The `ZStandardOptions` class will allow specifying the compression level using an integer (and will throw on an out-of-bounds value). This will help avoid the typical `CompressionLevel` limitation of only 4 values. But if the user wants to use it anyway, we can provide a constructor that takes a `CompressionLevel` and converts it to a predefined value from the compression level range allowed by ZStandard, which goes from 1 to 22, with 3 being the default. The user should also be able to specify negative levels, according to the manual:
The library supports regular compression levels from 1 up to ZSTD_maxCLevel(), which is currently 22. Levels >= 20, labeled `--ultra`, should be used with caution, as they require more memory. The library also offers negative compression levels, which extend the range of speed vs. ratio preferences. The lower the level, the faster the speed (at the cost of compression).
- `WriteByte` is a method that we decided to override in `ZLibStream`, but not in `DeflateStream` or `GZipStream`. Do we need to override it here?
- [...] `ZStandardStream|Options` or do we prefer the shorter word `ZstdStream|Options`? I'm inclined toward the first one.
- [...] `--fast` somehow, or should the class take care of that automatically?
- [...] `CompressionLevel` may not play well with having a public settable property for `int CompressionLevel`. What if the user specifies a value for both?
var options = new ZStandardOptions(level: CompressionLevel.SmallestSize) { Level = -5 };
namespace System.IO.Compression
{
    public class ZStandardOptions
    {
        /// <summary>Allow mapping the CompressionLevel enum to predefined levels for ZStandard:
        /// - CompressionLevel.NoCompression = 1, // Official normal minimum
        /// - CompressionLevel.Fastest = 1, // Official normal minimum
        /// - CompressionLevel.Optimal = 3, // Official default: ZSTD_CLEVEL_DEFAULT
        /// - CompressionLevel.SmallestSize = 22 // Official maximum: ZSTD_MAX_CLEVEL
        /// </summary>
        public ZStandardOptions(CompressionLevel level);

        // Min = ZSTD_minCLevel() which can be negative, Max=ZSTD_maxCLevel()=22, Default=ZSTD_CLEVEL_DEFAULT=3, throw if out-of-bounds
        int CompressionLevel { get; set; }
        CompressionMode Mode { get; set; }
        bool LeaveOpen { get; set; }
        static int MaxCompressionLevel { get; } // P/Invoke for current maximum: 22
    }

    public class ZStandardStream : Stream
    {
        public ZStandardStream(Stream stream, ZStandardOptions? options); // If options null, then use default values
        public Stream BaseStream { get; }
        public override bool CanRead { get; }
        public override bool CanSeek { get; }
        public override bool CanWrite { get; }
        public override long Length { get; }
        public override long Position { get; set; }
        public override IAsyncResult BeginRead(byte[] buffer, int offset, int count, AsyncCallback? asyncCallback, object? asyncState);
        public override IAsyncResult BeginWrite(byte[] buffer, int offset, int count, AsyncCallback? asyncCallback, object? asyncState);
        public override void CopyTo(Stream destination, int bufferSize);
        public override Task CopyToAsync(Stream destination, int bufferSize, CancellationToken cancellationToken);
        protected override void Dispose(bool disposing);
        public override ValueTask DisposeAsync();
        public override int EndRead(IAsyncResult asyncResult);
        public override void EndWrite(IAsyncResult asyncResult);
        public override void Flush();
        public override Task FlushAsync(CancellationToken cancellationToken);
        public override int Read(byte[] buffer, int offset, int count);
        public override int Read(Span<byte> buffer);
        public override Task<int> ReadAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken);
        public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken cancellationToken = default(CancellationToken));
        public override int ReadByte();
        public override long Seek(long offset, SeekOrigin origin);
        public override void SetLength(long value);
        public override void Write(byte[] buffer, int offset, int count);
        public override void Write(ReadOnlySpan<byte> buffer);
        public override void WriteByte(byte value); // ZLibStream overrides it, but not Deflate/GZipStream
        public override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken);
        public override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default(CancellationToken));
    }
}
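To make the level discussion concrete, here is a minimal sketch of the enum-to-integer mapping listed in the doc comment above, plus the out-of-bounds check. The names `ZstdLevelSketch`, `FromCompressionLevel`, `Validate`, and the `minLevel` parameter are hypothetical and only for illustration; a real implementation would presumably query `ZSTD_minCLevel()` and `ZSTD_maxCLevel()` natively rather than take the bounds as shown here.

```csharp
using System;
using System.IO.Compression;

// Hypothetical helper, not part of any shipped API: it only illustrates the
// enum-to-int mapping from the proposal's doc comment.
public static class ZstdLevelSketch
{
    public const int DefaultLevel = 3;  // ZSTD_CLEVEL_DEFAULT, per the proposal
    public const int MaxLevel = 22;     // ZSTD_maxCLevel(), per the proposal

    // Maps the four-value enum onto the predefined integer levels.
    public static int FromCompressionLevel(CompressionLevel level) => level switch
    {
        CompressionLevel.NoCompression => 1,
        CompressionLevel.Fastest => 1,
        CompressionLevel.Optimal => DefaultLevel,
        CompressionLevel.SmallestSize => MaxLevel,
        _ => throw new ArgumentOutOfRangeException(nameof(level)),
    };

    // Assumption: minLevel stands in for the value of ZSTD_minCLevel(),
    // which may be negative.
    public static int Validate(int level, int minLevel)
    {
        if (level < minLevel || level > MaxLevel)
        {
            throw new ArgumentOutOfRangeException(
                nameof(level), level, $"Level must be between {minLevel} and {MaxLevel}.");
        }
        return level;
    }
}
```

Keeping the mapping in one place would also give a natural spot to decide what happens when both the enum and an explicit integer are supplied.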
- `WriteByte`: `ZlibStream` and `GZipStream` overrides are delegated to `DeflateStream`, which does not override it. But `BrotliStream` does override it to route it directly to the overridden Span-based implementation. We should do the same...
- [...] `ZStandardStream|Options`
- `ZSTD_defaultCLevel()` (returning `ZSTD_CLEVEL_DEFAULT`) is available since version 1.5.0.
- [...] `System.IO.Compression` namespace but grouped in a dedicated `System.IO.Compression.Brotli` assembly. It seems the best compromise to make it easily accessible without forcing to load it in memory if not explicitly referenced.
- [...] `ZStandardStream|Options`, `WorkerCount` or `MaxDegreeOfParallelism` (as in `ParallelOptions`), with 0 as default value (or maybe `Environment.ProcessorCount`?).
- [...] `CompressionMode`? `CompressionMode.Decompress`?
- [...] `LeaveOpen`? `false`?
- [...] `Compression.Fastest` to `ZSTD_minCLevel()` as "The lower the level, the faster the speed".
- [...] `MinCompressionLevel` and `DefaultCompressionLevel` as static accessors aside the `MaxCompressionLevel` one.
- [...] `BufferSize` configuration property (like in `FileStreamOptions`)? `ZSTD_CStreamOutSize()` and `ZSTD_DStreamOutSize()` could be used as the default value.

FYI @VSadov this may be particularly interesting to single-file compression as it is supposed to be very fast for decompression.
This might mean we would need deeper runtime integration to be usable during bundler loading.
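On the `WriteByte` point raised earlier in the thread, the pattern of routing the single-byte write to the span-based overload can be sketched as follows. This is an illustration only: `SpanForwardingStream` is a hypothetical write-only wrapper, and the compression step is left out so that the forwarding itself stays visible.

```csharp
using System;
using System.IO;

// Hypothetical write-only wrapper; it passes bytes through unchanged.
public sealed class SpanForwardingStream : Stream
{
    private readonly Stream _inner;

    public SpanForwardingStream(Stream inner) =>
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    // Single place where data is actually written.
    public override void Write(ReadOnlySpan<byte> buffer) => _inner.Write(buffer);

    // The array overload forwards to the span overload.
    public override void Write(byte[] buffer, int offset, int count) =>
        Write(new ReadOnlySpan<byte>(buffer, offset, count));

    // WriteByte forwards a one-byte span, avoiding the base class's
    // temporary one-element array.
    public override void WriteByte(byte value)
    {
        Span<byte> one = stackalloc byte[1];
        one[0] = value;
        Write(one);
    }

    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```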
How does the multi-threading work internally? Does it integrate somehow with the usual .NET infrastructure (`TaskScheduler` and such)? Or does the library start native threads?
I wonder about that because sometimes you need threading to play nice with what else lives in the same process. In a web app, multi-threading could cause load spikes that crowd out request work from the CPU. Reducing the DOP is only a partial fix because multiple parallel compression jobs would again saturate all cores and cause the problem to reappear. Isolating such work onto a custom thread pool can be a solution, but it would not work if the library starts its own threads.
Another concern would be startup overhead for multi-threading inside the library. Is there thread pooling?
It seems to me that `CompressionMode` should be a mandatory constructor argument. There is no sensible default and without that argument the meaning of the code is unclear.
`bool LeaveOpen` is about the stream, not about compression. In my opinion, it does not belong in the options class. It should be a constructor argument specific to the stream. This option would not apply, for example, to a static helper method `static byte[] Compress(byte[] data, ZStandardOptions? options)`. The options object would then carry around ignored options.
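As a rough illustration of that split, the sketch below uses the existing `DeflateStream` as a stand-in, since it already takes the mode and `leaveOpen` as constructor arguments. `DeflateHelperSketch` and its methods are hypothetical names; the point is only that a static byte-array helper has no use for a `LeaveOpen` setting of its own.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical static helper built on the existing DeflateStream.
public static class DeflateHelperSketch
{
    public static byte[] Compress(byte[] data, CompressionLevel level = CompressionLevel.Optimal)
    {
        using var output = new MemoryStream();
        // leaveOpen is a stream concern: it keeps 'output' usable after the
        // compressor is disposed (disposal flushes the final block).
        using (var deflate = new DeflateStream(output, level, leaveOpen: true))
        {
            deflate.Write(data, 0, data.Length);
        }
        return output.ToArray();
    }

    public static byte[] Decompress(byte[] data)
    {
        using var input = new MemoryStream(data);
        // The mode is explicit at the call site, so the intent is unambiguous.
        using var inflate = new DeflateStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        inflate.CopyTo(output);
        return output.ToArray();
    }
}
```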
About thread pooling, the `zstd.h` header file contains:
/* ! Thread pool :
* These prototypes make it possible to share a thread pool among multiple compression contexts.
* This can limit resources for applications with multiple threads where each one uses
* a threaded compression mode (via ZSTD_c_nbWorkers parameter).
* ZSTD_createThreadPool creates a new thread pool with a given number of threads.
* Note that the lifetime of such pool must exist while being used.
* ZSTD_CCtx_refThreadPool assigns a thread pool to a context (use NULL argument value
* to use an internal thread pool).
* ZSTD_freeThreadPool frees a thread pool, accepts NULL pointer.
*/
typedef struct POOL_ctx_s ZSTD_threadPool;
ZSTDLIB_API ZSTD_threadPool* ZSTD_createThreadPool(size_t numThreads);
ZSTDLIB_API void ZSTD_freeThreadPool (ZSTD_threadPool* pool); /* accept NULL pointer */
ZSTDLIB_API size_t ZSTD_CCtx_refThreadPool(ZSTD_CCtx* cctx, ZSTD_threadPool* pool);
Zstandard would be very useful for single-file compression. We currently use ZLib/Deflate because it is available in the runtime, but would prefer something faster, as the impact of decompression is very noticeable at startup.
We did examine lz4 and Zstd as alternative choices, of which lz4 is faster at decompression, but Zstd would allow us to keep the same compression ratio as with Deflate.
If there is Zstd support in the runtime, single-file compression will definitely switch to it.
Here are some interesting benchmarks: https://github.com/google/brotli/issues/553. ZStandard offers a really nice trade-off for speed and compression ratio.
It looks like Chrome may also be getting support for decoding zstd encoded content, making this also relevant to web / cloud scenarios.
https://chromestatus.com/feature/6186023867908096
Putting in my vote or support, and hoping to see this prioritized in the .NET 9.0 planning.
UPDATE: Chrome has confirmed that they are shipping zstd support in v123.
I have opened https://github.com/dotnet/aspnetcore/issues/50643 to support the zstd Content-Encoding in ASP.NET Core. It is currently considered blocked on support for ZStandard compression in the .NET Runtime. @carlossanlop Can we make it happen in .NET 9? I am still ready to help on this topic.
+1
Is there any plan to support it in .NET 9.0?
It's super cool to see they released this in Chrome. I think the biggest motivating factor for getting this work done is that ASP.NET can support `zstd` as an out-of-the-box encoding option.
It looks like Facebook.com is already serving webpages with `zstd` compression; adding it to the dotnet webstack would be amazing!
Most implementations bind to the native Facebook libs, but there are a few existing C# projects that are ports, like: https://github.com/oleg-st/ZstdSharp
Chrome 123 release supports zstd:
* https://developer.chrome.com/blog/new-in-chrome-123#more
* https://github.com/facebook/zstd/releases/tag/v1.5.6

Could you consider it for .NET 9?
Since the 126 release Mozilla Firefox also supports zstd compression: https://www.mozilla.org/en-US/firefox/126.0/releasenotes/
This for net 9 would be awesome, it would also be great for other algorithms like lzma2.
I noticed that this issue has been open for a few years now, and I was wondering if there are any plans to add Zstandard (Zstd) support to .NET. If not, I’d be happy to contribute to help implement this feature.
Given the performance benefits and the wide adoption of Zstd, I think it would be a great addition to the framework. If there are any steps or guidelines you can share, I’d love to assist in moving this forward.
Looking forward to your feedback and guidance!
Thanks!
yes pls zstd lzma2 and 7z to net 9
@siyavash1984 thank you! We still need to propose the APIs first. Here's the process: https://github.com/dotnet/runtime/blob/43813ac73242fa78c463d456bf755e3a6622b5d7/docs/project/api-review-process.md
At the moment we have this initial proposal https://github.com/dotnet/runtime/issues/59591#issuecomment-933059993 and one reply discussing it. Additional feedback and discussion is welcome on these APIs (or additional proposed ones) to keep this moving.
In terms of API proposal:
- [...] `AsStream` methods that allow stream-based usage over a pipe-based implementation, so there's no (or barely any?) loss of generality.
Zstandard (or Zstd) is a fast compression algorithm that was published by Facebook in 2015, and had its first stable release (v1.0) in August 2016.
Their official repo offers a C implementation. https://github.com/facebook/zstd
Data compression mechanism specification: https://datatracker.ietf.org/doc/html/rfc8478
Features:
It's used by:
We could offer a stream-based class, like we do for Deflate with `DeflateStream` or `GZipStream`, but we should also consider offering a stream-less static class, since it's a common request.