MiloszKrajewski / K4os.Compression.LZ4

LZ4/LZ4HC compression for .NET Standard 1.6/2.0 (formerly known as lz4net)
MIT License

Question: ReadAsync on LZ4EncoderStream? #57

Closed jdvor closed 3 years ago

jdvor commented 3 years ago

Is there a way to set up stream compression where something reads from the compressing stream? Something similar to this:

var blobClient = containerClient.GetBlobClient(blobName);
using var fileStream = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.Read);
using var lz4Stream = LZ4Stream.Encode(fileStream, lz4Opts, leaveOpen: true);

// blobClient is Azure.Storage.Blobs.BlobClient
// it will try to read lz4Stream and it will fail with InvalidOperationException: "Operation ReadAsync is not allowed for LZ4EncoderStream"
//    at K4os.Compression.LZ4.Streams.Internal.LZ4StreamBase.ReadAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken)
var result = await blobClient.UploadAsync(lz4Stream, overwrite: true).ConfigureAwait(false);

lz4Stream.Close();
fileStream.Close();

The solution in this case might be an intermediary MemoryStream to which lz4Stream would be copied entirely, but that would kind of defeat the whole purpose of a streaming approach to compression, no?

This works, but feels wrong:

var blobClient = containerClient.GetBlobClient(blobName);
var fileInfo = new FileInfo(filePath);
using var fileStream = fileInfo.Open(FileMode.Open, FileAccess.Read, FileShare.Read);
// Divide before casting so the cast does not overflow for files larger than 2 GB.
using var intermediaryStream = new MemoryStream((int)(fileInfo.Length / 4));
using var lz4Stream = LZ4Stream.Encode(intermediaryStream, lz4Opts, leaveOpen: true);
fileStream.CopyTo(lz4Stream);
lz4Stream.Flush();
intermediaryStream.Seek(0, SeekOrigin.Begin);

var result = await blobClient.UploadAsync(intermediaryStream, overwrite: true).ConfigureAwait(false);

lz4Stream.Close();
fileStream.Close();

MiloszKrajewski commented 3 years ago

I assume you want compression to be done on read: you give it an uncompressed stream, start reading from it, and get compressed bytes?

<good-news> I understand what you need, and why you need it! </good-news>

<bad-news> It is not supported. LZ4Encoder/LZ4Decoder would enable it, but the wiring in LZ4Stream would need to be totally reversed, and that feels complicated. I know crypto stream does something like that (you can encrypt or decrypt on both reads and writes), but it is much easier for crypto, as input and output are the same size. </bad-news>

There is some good news though, not fantastic, but good:

<good-news> @AArnott created a library that allows you to do fancy things with streams: Nerdbank.Streams. One of its classes is SimplexStream, which gives you a stream that you can write to from one end and read from the other. I guess that to avoid buffering everything (which would reduce it to the MemoryStream solution) you would need to compress/write on one thread and read/upload on the other, and then Task.WhenAll(...) for both tasks. Not exactly what you wanted, but not bad either. </good-news>
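
The two-task arrangement described above could be sketched roughly as follows. This is only an illustration, not something confirmed in this thread: the class name Lz4SimplexSketch, the method CompressAndConsumeAsync and the consume delegate are hypothetical, and it assumes the K4os.Compression.LZ4.Streams and Nerdbank.Streams NuGet packages are referenced.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using K4os.Compression.LZ4.Streams; // NuGet: K4os.Compression.LZ4.Streams
using Nerdbank.Streams;             // NuGet: Nerdbank.Streams

public static class Lz4SimplexSketch // hypothetical name, for illustration only
{
    // 'consume' is a hypothetical stand-in for whatever reads the compressed bytes,
    // for example: s => blobClient.UploadAsync(s, overwrite: true)
    public static async Task CompressAndConsumeAsync(
        string filePath, Func<Stream, Task> consume)
    {
        // Bytes written to a SimplexStream can be read back from the same object.
        var simplex = new SimplexStream();

        // Producer: compress the file into the simplex stream on a worker thread.
        var producer = Task.Run(async () =>
        {
            try
            {
                using var source = File.OpenRead(filePath);
                // leaveOpen: true so disposing the encoder does not dispose 'simplex'.
                using (var lz4 = LZ4Stream.Encode(simplex, new LZ4EncoderSettings(), leaveOpen: true))
                {
                    await source.CopyToAsync(lz4).ConfigureAwait(false);
                } // disposing the encoder finishes the LZ4 frame
            }
            finally
            {
                // Signal end-of-stream so the reader gets 0 bytes instead of waiting forever.
                simplex.CompleteWriting();
            }
        });

        // Consumer: read (e.g. upload) the compressed bytes concurrently.
        var consumer = consume(simplex);

        await Task.WhenAll(producer, consumer).ConfigureAwait(false);
    }
}
```

In the scenario from the question, consume would wrap the blobClient.UploadAsync call. Whether the upload accepts a non-seekable stream is something to verify, and if the consumer stops reading early the producer may stall on back-pressure, so a real implementation would probably want cancellation as well.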

Is this helpful?

jdvor commented 3 years ago

Thanks, I will definitely check out Nerdbank.Streams.

I was also thinking that somehow wiring up PipeReader & PipeWriter from System.IO.Pipelines, with your LZ4 codec in between, might do the trick.

I'll share the solution here if I come up with something useful.
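
For reference, the System.IO.Pipelines idea could look roughly like the sketch below. It is not a solution posted in this thread: the class CompressOnReadSketch, the method Open and the wrapEncoder hook are hypothetical names, and it assumes the System.IO.Pipelines package (version 4.6 or later, for AsStream) is available. The encoder is passed in as a delegate, so with K4os.Compression.LZ4.Streams it would be something like s => LZ4Stream.Encode(s).

```csharp
using System;
using System.IO;
using System.IO.Pipelines; // NuGet: System.IO.Pipelines
using System.Threading.Tasks;

public static class CompressOnReadSketch // hypothetical name, for illustration only
{
    // Returns a readable stream that yields the encoded form of 'source',
    // plus the background task that pumps data through the encoder.
    // 'wrapEncoder' is a hypothetical hook that wraps a writable stream in an encoder.
    public static (Stream Compressed, Task Pump) Open(
        Stream source, Func<Stream, Stream> wrapEncoder)
    {
        var pipe = new Pipe();

        var pump = Task.Run(async () =>
        {
            try
            {
                // leaveOpen: true, because the writer is completed explicitly below.
                using (var encoder = wrapEncoder(pipe.Writer.AsStream(leaveOpen: true)))
                {
                    await source.CopyToAsync(encoder).ConfigureAwait(false);
                } // disposing the encoder flushes its final bytes into the pipe

                pipe.Writer.Complete(); // reader sees end-of-stream
            }
            catch (Exception ex)
            {
                pipe.Writer.Complete(ex); // reader sees the failure instead of a truncated stream
                throw;
            }
        });

        return (pipe.Reader.AsStream(), pump);
    }
}
```

The returned Compressed stream is what would be handed to the reader (the upload call in the original question), and Pump should be awaited afterwards so that any compression error surfaces. The pipe applies back-pressure by default, so only a bounded amount of compressed data is buffered at any time.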

MiloszKrajewski commented 3 years ago

Success? Can I close it?