MiloszKrajewski / K4os.Compression.LZ4

LZ4/LZ4HC compression for .NET Standard 1.6/2.0 (formerly known as lz4net)
MIT License

Is it possible to create a "backward compatible" stream? #58

Closed bratfizyk closed 3 years ago

bratfizyk commented 3 years ago

Background

In my current system I use the deprecated lz4.net library. I'm planning to migrate to K4os.Compression.LZ4, but I already have hundreds of thousands of files compressed using the old LZ4Stream. The files are scattered across many locations and I don't want to migrate them all at once to the new LZ4 Stream format.

In an ideal world I'd like newly created files in my system to use the new Stream format, i.e. K4os.Compression.LZ4.Streams.LZ4Stream.Encode.

Question

Is it possible to decode data in the following way:

MiloszKrajewski commented 3 years ago

No, such a thing definitely does not exist (a stream which can read both formats). I was working on the assumption that you (the user) can tell them apart yourself (like a ProtocolVersion field in a database, or a different filename extension).

The first 4 bytes of a NEW stream are the magic number 0x184D2204, so maybe this helps? (see: https://github.com/MiloszKrajewski/K4os.Compression.LZ4/blob/f9e70f19d46ce5cec2ef858475129c648f704680/src/K4os.Compression.LZ4.Streams/LZ4DecoderStream.async.cs#L51)

If the magic number is not there, then an InvalidDataException is thrown (see: https://github.com/MiloszKrajewski/K4os.Compression.LZ4/blob/f9e70f19d46ce5cec2ef858475129c648f704680/src/K4os.Compression.LZ4.Streams/LZ4DecoderStream.cs#L93)

I know this is not ideal, as you would still need to open the stream, read 4 bytes, and open the stream again, but that's all I can offer you at the moment.
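The "read 4 bytes and check" idea can be sketched roughly as below. This is only an illustration, not part of the library: `Lz4FormatProbe` and `LooksLikeNewFormat` are hypothetical names, the stream is assumed to be seekable (so it can be rewound instead of reopened), and the check assumes that old-format files never happen to start with the same four bytes, which nothing in this thread confirms. The LZ4 frame format stores the magic number little-endian, so on disk it appears as the bytes 04 22 4D 18.

```csharp
using System;
using System.IO;

public static class Lz4FormatProbe
{
    // Magic number of the new stream format, as quoted in the comment above.
    private const uint FrameMagic = 0x184D2204;

    // Hypothetical helper: peeks at the first 4 bytes and rewinds the stream.
    public static bool LooksLikeNewFormat(Stream stream)
    {
        if (!stream.CanSeek)
            throw new NotSupportedException("A seekable stream is assumed here.");

        long start = stream.Position;
        var header = new byte[4];
        int read = 0;

        // Stream.Read may return fewer bytes than requested, so loop until done.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream
            read += n;
        }

        // Rewind so the caller can hand the stream to the right decoder.
        stream.Position = start;

        if (read < header.Length) return false;

        // Assemble the 4 bytes as a little-endian unsigned integer.
        uint value = (uint)(header[0]
            | (header[1] << 8)
            | (header[2] << 16)
            | (header[3] << 24));

        return value == FrameMagic;
    }
}
```

When this returns true the stream could be passed to the new decoder, and otherwise to the Legacy one. For non-seekable streams, reopening the file as described above is still needed.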

bratfizyk commented 3 years ago

Ok, thanks for the feedback. Closing the issue as you said:

I was working on the assumption that you (the user) can tell them apart yourself

bratfizyk commented 3 years ago

Actually, @MiloszKrajewski there's one more thing I'd like to know. In the old LZ4.NET library we used to have Wrap and Unwrap functions that accept byte arrays and compress/decompress them returning another byte array.

I see these methods in this repository as well, in the Legacy module. Is there any other method that does the same thing outside of Legacy? I found the LZ4Pickler class, which has functions with signatures byte[] -> byte[]. However, when using the Pickle method, the output byte array doesn't begin with the MagicNumber, which suggests it doesn't do the same thing as LZ4Stream.

Most likely I'm missing a single piece here in order to understand everything :).

MiloszKrajewski commented 3 years ago

So Pickle/Unpickle has the same purpose as Wrap/Unwrap (thus the same signature) but is not compatible. I was not planning backwards compatibility, so Pickle/Unpickle does not write any magic number. It is forward compatible (I reserved 3 bits for version) but not backwards.

You can use the Legacy assembly to read old ones (Unwrap) and write them in the new format (Pickle), but knowing which ones are old/new is on you.

For example, I had a cache with lots of blobs packed with old LZ4. On migration I just added a new column (let's call it CompressionAlgorithm) and set it to 0 top to bottom. Now every time a new entry is written I use Pickle, and CompressionAlgorithm is stored as 1. On read, I read CompressionAlgorithm first and decide to use Unwrap or Unpickle depending on whether it is 0 or 1.
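The read side of that scheme might look something like the sketch below. It is an illustration only: `BlobReader` and `Decode` are hypothetical names, and the two decoders are passed in as delegates so the snippet does not depend on any particular library signature. In real use they would presumably be the Legacy module's Unwrap and LZ4Pickler.Unpickle.

```csharp
using System;

public static class BlobReader
{
    // Hypothetical helper: picks a decoder based on the stored
    // CompressionAlgorithm value (0 = old LZ4 blob, 1 = pickled blob).
    public static byte[] Decode(
        int compressionAlgorithm,
        byte[] blob,
        Func<byte[], byte[]> unwrapLegacy,
        Func<byte[], byte[]> unpickle)
    {
        switch (compressionAlgorithm)
        {
            case 0:
                return unwrapLegacy(blob); // written before the migration
            case 1:
                return unpickle(blob);     // written after the migration
            default:
                throw new NotSupportedException(
                    "Unknown CompressionAlgorithm: " + compressionAlgorithm);
        }
    }
}
```

Keeping the discriminator outside the blob means old entries never need to be rewritten; they can be converted lazily whenever they are next saved.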

You could also use Pickle with the IBufferWriter overload to do your own prefixing with a magic number.

I do understand this is suboptimal and it would be much better if it was backwards compatible, but it isn't... Legacy support was not even planned at first and was added much later (see: #20)

MiloszKrajewski commented 3 years ago

Most likely I'm missing a single piece here in order to understand everything :).

Unfortunately you are asking very legitimate questions and I'm sorry that answers are most of the time: "you have to work around it yourself".

bratfizyk commented 3 years ago

Thanks for responding once again. This all sounds reasonable and saves me a lot of time guessing. Much appreciated!

I found an easy way to implement Wrap and Unwrap using Streams, so if I need them, I know what to do, no worries.

MiloszKrajewski commented 3 years ago

Streams come with quite a large overhead. That is fine if we are talking about megabytes of data, but for short messages pickle is much better.

Try the code below. It will, of course, depend on the size of your messages, but if they are below 64k it will be much, much (much) quicker than a stream:

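using System;
using System.Buffers;
using K4os.Compression.LZ4;

public static class MyPickler
{
    // Custom 4-byte prefix marking blobs written by MyPickle.
    public static readonly byte[] MyPickleMagic =
        BitConverter.GetBytes(0x13371234);

    public static void MyPickle(
        ReadOnlySpan<byte> source, IBufferWriter<byte> target)
    {
        // Write the prefix first, then the pickled payload.
        target.Write(MyPickleMagic);
        LZ4Pickler.Pickle(source, target);
    }

    public static void MyUnpickle(
        ReadOnlySpan<byte> source, IBufferWriter<byte> target)
    {
        // Check only the prefix; SequenceEqual would compare the whole blob.
        if (!source.StartsWith(MyPickleMagic))
            throw new ArgumentException(
                "Pickle magic does not match");

        // Skip the prefix and unpickle the rest.
        LZ4Pickler.Unpickle(
            source.Slice(MyPickleMagic.Length), target);
    }
}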