MiloszKrajewski / K4os.Compression.LZ4

LZ4/LZ4HC compression for .NET Standard 1.6/2.0 (formerly known as lz4net)
MIT License

InvalidDataException: LZ4 frame magic number expected #76

Closed dewanymca closed 1 year ago

dewanymca commented 1 year ago

We have been using PackageReference Include="lz4net.netstandard" Version="1.0.15.93" for compression in an application based on .NET Core 3.1. Everything works there.

However, we are upgrading our tech stack to .NET 6. After upgrading and switching to PackageReference Include="K4os.Compression.LZ4.Streams" Version="1.2.16", we are running into an issue during decompression. The error is: InvalidDataException: LZ4 frame magic number expected

Sample code which throws this:

    public Stream Decompress(byte[] bytes)
    {
        var output = new MemoryStream();
        var compressStream = new MemoryStream(bytes);

        using (var lz4Stream = LZ4Stream.Decode(compressStream))
        {
            lz4Stream.CopyTo(output);
        }

        output.Position = 0;
        return output;
    }

We even tried the Legacy package, which also fails with a different error. Please help, we are blocked.

MiloszKrajewski commented 1 year ago

Legacy (lz4net) streams can only be handled by .Legacy library. So "We even tried Legacy package" is the only way to make it work, therefore "which also fails with a different error" is the only question I have - what was this different error? can you provide code? can you provide file?

dewanymca commented 1 year ago

Thanks @MiloszKrajewski for help on this.

This is the stack trace when using the Legacy package on .NET 6. I also see the same error message when using lz4net.netstandard:

    System.AggregateException
      HResult=0x80131500
      Message=One or more errors occurred. (Unexpected end of stream)
      Source=System.Threading.Tasks.Parallel
      StackTrace:
       at System.Threading.Tasks.TaskReplicator.Run[TState](ReplicatableUserAction`1 action, ParallelOptions options, Boolean stopOnFirstFailure)
       at System.Threading.Tasks.Parallel.PartitionerForEachWorker[TSource,TLocal](Partitioner`1 source, ParallelOptions parallelOptions, Action`1 simpleBody, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
    --- End of stack trace from previous location ---
       at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection exceptions, CancellationToken cancelToken, Exception otherException)
       at System.Threading.Tasks.Parallel.PartitionerForEachWorker[TSource,TLocal](Partitioner`1 source, ParallelOptions parallelOptions, Action`1 simpleBody, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
       at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
       at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1 source, Action`3 body)
       at Program.<<Main>$>d__0.MoveNext() in C:\Code\ConsoleApp1\Program.cs:line 56
       at Program.<Main>(String[] args)

This exception was originally thrown at this call stack:

    K4os.Compression.LZ4.Legacy.LZ4Stream.AcquireNextChunk()
    K4os.Compression.LZ4.Legacy.LZ4Stream.Read(byte[], int, int)
    System.IO.Stream.CopyTo(System.IO.Stream, int)
    System.IO.Stream.CopyTo(System.IO.Stream)
    System.Threading.Tasks.Parallel.PartitionerForEachWorker.AnonymousMethod1(ref System.Collections.IEnumerator, int, out bool)
    System.Threading.Tasks.Parallel.PartitionerForEachWorker.AnonymousMethod1(ref System.Collections.IEnumerator, int, out bool)
    System.Threading.Tasks.TaskReplicator.Replica.ExecuteAction(out bool)
    ... [Call Stack Truncated]

Inner Exception 1: EndOfStreamException: Unexpected end of stream

Code:

    public Stream Decompress(byte[] bytes)
    {
        var output = new MemoryStream();
        var compressStream = new MemoryStream(bytes);

        using (var lz4Stream = LZ4Legacy.Decode(compressStream))
        {
            lz4Stream.CopyTo(output);
        }

        output.Position = 0;
        return output;
    }

I even tried calling it from .NET Core 3.1, where it failed with the error below:

    System.AggregateException
      HResult=0x80131500
      Message=One or more errors occurred. (One or more errors occurred. (The type initializer for 'K4os.Compression.LZ4.Engine.LL' threw an exception.))
      Source=System.Private.CoreLib
      StackTrace:
       at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
       at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
       at System.Threading.Tasks.Task`1.get_Result()
       at ConsoleApp2.Program.Main(String[] args) in C:\Code\ConsoleApp2\Program.cs:line 60

This exception was originally thrown at this call stack:

    K4os.Compression.LZ4.Internal.Mem.CloneArray(uint[])
    K4os.Compression.LZ4.Engine.LL.LL()

Inner Exception 1: AggregateException: One or more errors occurred. (The type initializer for 'K4os.Compression.LZ4.Engine.LL' threw an exception.)

Inner Exception 2: TypeInitializationException: The type initializer for 'K4os.Compression.LZ4.Engine.LL' threw an exception.

Inner Exception 3: FileNotFoundException: Could not load file or assembly 'System.Runtime.CompilerServices.Unsafe, Version=5.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'. The system cannot find the file specified.

MiloszKrajewski commented 1 year ago

This AggregateException from System.Threading.Tasks.Parallel has nothing to do with LZ4.

What are those bytes[] in var compressStream = new MemoryStream(bytes);? Where are they coming from? It seems like it is not a valid legacy stream (thus Unexpected end of stream). Can I get those bytes?

What was the code which created those bytes? Was it a stream or a block?

Can you show code on the other side (lz4net): get plain bytes and compress them?

MiloszKrajewski commented 1 year ago

When calling it from .NET Core 3.1 there is a problem with 'System.Runtime.CompilerServices.Unsafe', which was raised and worked around in #72, but it is a different issue. Your primary issue still is: your bytes are most likely not a legacy stream.

dewanymca commented 1 year ago

I totally understand that the AggregateException has nothing to do with LZ4. The bytes come from upstream, where decryption happens. The full flow is: we save a symmetric-key-encrypted, compressed payload in a DB column. After retrieval we decrypt it and then try to decompress it.

Below is the code which does the decryption:

    public override byte[] Decrypt(byte[] encryptedMessage)
    {
        var iv = new byte[_algorithm.BlockSize / 8];
        var encryptedBytes = new byte[encryptedMessage.Length - iv.Length];
        Array.Copy(encryptedMessage, 0, iv, 0, iv.Length);
        Array.Copy(encryptedMessage, iv.Length, encryptedBytes, 0, encryptedBytes.Length);
        try
        {
            return Decrypt(_aesAlgorithm, iv, _key, encryptedBytes);
        }
        catch (Exception e)
        {
            return Decrypt(_algorithm, iv, _key, encryptedBytes);
        }
    }

    public static byte[] Decrypt<T>(T algorithm, byte[] iv, byte[] key, byte[] encryptedBytes) where T : SymmetricAlgorithm
    {
        using (var decryptor = algorithm.CreateDecryptor(key, iv))
        using (var memoryStream = new MemoryStream(encryptedBytes))
        using (var cryptoStream = new CryptoStream(memoryStream, decryptor, CryptoStreamMode.Read))
        {
            // plain bytes can't be larger than encrypted length
            byte[] buffer = new byte[encryptedBytes.Length];
            var numBytes = cryptoStream.Read(buffer, 0, buffer.Length);
            var bytes = new byte[numBytes];
            Array.Copy(buffer, 0, bytes, 0, bytes.Length);
            return bytes;
        }
    }

We then decompress. The old implementation using lz4net is below:

public class LZ4CompressionProvider : ICompressionProvider
{
    public string Prefix => "_lz4:";
    private readonly LZ4StreamFlags _streamFlags;

    public LZ4CompressionProvider()
    {
        _streamFlags = LZ4StreamFlags.HighCompression;
    }

    public LZ4CompressionProvider(LZ4StreamFlags streamFlags)
    {
        _streamFlags = streamFlags;
    }

    public static Stream CompressStream(Stream stream)
    {
        return new LZ4Stream(stream, LZ4StreamMode.Compress, LZ4StreamFlags.HighCompression);
    }

    public static Stream DecompressStream(Stream stream)
    {
        return new LZ4Stream(stream, LZ4StreamMode.Decompress);
    }

    public Stream Compress(byte[] bytes)
    {
        return ProcessBytes(bytes, LZ4StreamMode.Compress, _streamFlags);
    }

    public Stream Compress(Stream stream)
    {
        return new LZ4Stream(stream, LZ4StreamMode.Compress, _streamFlags);
    }

    public Stream Decompress(byte[] bytes)
    {
        return ProcessBytes(bytes, LZ4StreamMode.Decompress);
    }

    public Stream Decompress(Stream stream)
    {
        return DecompressStream(stream);
    }

    private MemoryStream ProcessBytes(byte[] bytes, LZ4StreamMode streamMode, LZ4StreamFlags streamFlags = LZ4StreamFlags.Default)
    {
        var output = new MemoryStream();
        var compressStream = new MemoryStream(bytes);

        using (var lz4Stream = new LZ4Stream(compressStream, streamMode, streamFlags))
        {
            lz4Stream.CopyTo(output);
        }

        output.Position = 0;
        return output;
    }
}
MiloszKrajewski commented 1 year ago

So the shortest answer is: I don't know. Maybe you forgot to flush the stream and it is length 0, or maybe you messed with encryption padding. All those extra elements don't help.

To help you a little, though, I prepared an example at:

https://github.com/MiloszKrajewski/K4os.Compression.LZ4/tree/master/assets/issue76

You will find two projects there:

If you can make this example fail with your data files, that would be something to start from; if not, then the problem is somewhere else.

dewanymca commented 1 year ago

Thanks for the pointer. After this, I was able to pinpoint the issue and fix it as well. I was running into a breaking change in .NET 6, described here:

https://learn.microsoft.com/en-us/dotnet/core/compatibility/core-libraries/6.0/partial-byte-reads-in-streams

I was able to fix this, and the legacy stream is now working.
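
For readers hitting the same problem: the linked breaking change means that, on .NET 6, a single CryptoStream.Read call may return fewer bytes than requested even when more data is still available. The Decrypt helper posted earlier calls Read exactly once, so it can silently truncate the plaintext, which would be consistent with the "Unexpected end of stream" error. The exact fix applied was not posted in this thread; the following is only a minimal sketch of one common approach, draining the stream with CopyTo. The class name DecryptHelper is hypothetical.

```csharp
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper class; the name is not from the thread.
public static class DecryptHelper
{
    public static byte[] Decrypt<T>(T algorithm, byte[] iv, byte[] key, byte[] encryptedBytes)
        where T : SymmetricAlgorithm
    {
        using (var decryptor = algorithm.CreateDecryptor(key, iv))
        using (var memoryStream = new MemoryStream(encryptedBytes))
        using (var cryptoStream = new CryptoStream(memoryStream, decryptor, CryptoStreamMode.Read))
        using (var plain = new MemoryStream())
        {
            // CopyTo keeps calling Read until it returns 0 (end of stream),
            // so a partial read no longer truncates the decrypted payload.
            cryptoStream.CopyTo(plain);
            return plain.ToArray();
        }
    }
}
```

An explicit loop that calls Read until it returns 0 would work equally well; CopyTo is simply the shortest way to express it.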

A couple of follow-up questions:

  1. What is the plan for the Legacy package? Will you keep maintaining it for future .NET releases, or do you consider it out of support from now on?
  2. We are looking to upgrade to the new stream format. The idea is to compress new payloads with the new format while continuing to read existing data via the Legacy path. Do you have any recommendation on how we should provide this compatibility? One naive way: when we write the new format, we add metadata stating that it is the new format and use it when reading to pick the decompressor. Existing data will not have this metadata, so we would fall back to legacy decompression.
MiloszKrajewski commented 1 year ago
  1. .Legacy assembly: It is what it is, as people have old streams, so they need to read them somehow. That's it. I am planning to keep it working as it is for new .NET versions, but with no new functionality.
  2. So I understand the problem of not knowing which stream it is, but I cannot do anything about it: the old stream is what it is, and I am not able to change the format now. The new stream is compatible with the official LZ4 specification, so I cannot change it either. A few options:
    • If you store it in a DB: add a column indicating which version it is
    • If you store it on disk: use a different extension
    • If you can peek into the first few bytes, you can check whether it has the magic number (0x184D2204)
dewanymca commented 1 year ago

Can you please elaborate on the third option, peeking into the first few bytes? Do you mean that the first bytes will always contain this magic number in the new LZ4 format?

MiloszKrajewski commented 1 year ago

The first four bytes of an "official" LZ4 stream are the magic number 0x184D2204.
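
As a rough illustration of that check: the LZ4 frame format stores the magic number in little-endian order, so the bytes at the start of the data are 04 22 4D 18. The sketch below only inspects those four bytes; the names Lz4FormatProbe and LooksLikeLz4Frame are hypothetical and not part of the library. It also assumes that legacy (lz4net) payloads never happen to start with the same four bytes, which this thread does not confirm.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper; not part of K4os.Compression.LZ4.
public static class Lz4FormatProbe
{
    // Magic number of the official LZ4 frame format.
    private const uint Lz4FrameMagic = 0x184D2204;

    public static bool LooksLikeLz4Frame(byte[] bytes)
    {
        // Too short to contain a magic number at all.
        if (bytes == null || bytes.Length < 4)
            return false;

        // The magic number is stored little-endian: 04 22 4D 18.
        uint header = BinaryPrimitives.ReadUInt32LittleEndian(bytes.AsSpan(0, 4));
        return header == Lz4FrameMagic;
    }
}
```

A caller could then route payloads for which this returns true to LZ4Stream.Decode and everything else to LZ4Legacy.Decode, the two entry points used earlier in this thread. An explicit version column or file extension, as suggested above, remains the less ambiguous option.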