Cysharp / MemoryPack

Zero encoding extreme performance binary serializer for C# and Unity.
MIT License

Brotli decompression issue when not using CompressionLevel.Fastest #243

Closed FilixDark closed 7 months ago

FilixDark commented 7 months ago

I am having an issue with Brotli compression and decompression when not using Fastest. The code works fine with CompressionLevel.Fastest, but I would really like to use the smallest size for archiving data.

This is how I implemented it.

compression

using (var fs = new FileStream(filename+".compressed", FileMode.CreateNew)) {
    using (var cp = new BrotliCompressor(System.IO.Compression.CompressionLevel.Optimal)) {
        MemoryPackSerializer.Serialize(cp, log);
        await cp.CopyToAsync(fs);
    }
}

decompression

using (var fs = new FileStream(filename, FileMode.Open)) { 
    var buffer = new byte[fs.Length];
    await fs.ReadAsync(buffer, 0, buffer.Length);
    using (var dc = new BrotliDecompressor()) {
        var decomp = dc.Decompress(new ReadOnlySpan<byte>(buffer));   // This line is throwing 
// MemoryPack.MemoryPackSerializationException: 'Failed in Brotli compression/decompression process, status: NeedMoreData.'
        var log = MemoryPackSerializer.Deserialize<Tradelog>( decomp);
        return log;
    }
}
hadashiA commented 7 months ago

Your sample seems to work on my end.

I would like you to confirm the following:

await fs.ReadAsync(buffer, 0, buffer.Length);

Here, does ReadAsync actually read buffer.Length bytes? The third argument is the maximum number of bytes to read, so ReadAsync may read fewer bytes than requested in a single call.

I think you need to check the return value of ReadAsync and keep reading until all bytes have been read. For example:

var readBytes = 0;
while (readBytes < buffer.Length)
{
    // Continue from the current offset; ReadAsync may return fewer bytes than requested.
    readBytes += await fs.ReadAsync(buffer, readBytes, buffer.Length - readBytes);
}

If the problem persists, please let us know the environment in which you ran each serialization and deserialization. Thanks.

FilixDark commented 7 months ago

Hi, thanks for the reply. I tried this, and sadly it was always reading the full size, but I implemented it anyway. I made a small test project with a sample file that causes the issue. Hope that makes it easier to replicate.

brotliissue.zip

And I am running this on Windows 10 Pro 64-bit with Microsoft Visual Studio Community 2022 (64-bit), Current Version 17.9.2.

hadashiA commented 7 months ago

Thanks a lot for the reproduction code. Apparently the problem is in BrotliCompressor.CopyToAsync.

When the compressed data is very large, CopyToAsync does not seem to write it all the way through to the destination MemoryStream in one call.

In your code, it would be as follows.

static async Task SaveLogfile(TradeLog log, MemoryStream ms, CompressionLevel compressionLevel) {
    using (var cp = new BrotliCompressor(compressionLevel)) {
        MemoryPackSerializer.Serialize(cp, log);
        await cp.CopyToAsync(ms);   // the compressed data may not be fully written here
    }
    await ms.FlushAsync();
}

I will attempt to fix this in #249.

NOTE: Since the buffer size can be specified in the second argument of CopyToAsync, I think this problem can be avoided by enlarging it.
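As a sketch of that workaround, applied to the original save code (the 1 MiB value is an assumption for illustration, not a measured requirement):

```csharp
// Workaround sketch: pass a larger buffer size as the second argument of
// BrotliCompressor.CopyToAsync, per the note above, so the compressed
// payload is written through in one pass. 1 << 20 (1 MiB) is an assumed value.
using (var fs = new FileStream(filename + ".compressed", FileMode.CreateNew)) {
    using (var cp = new BrotliCompressor(System.IO.Compression.CompressionLevel.Optimal)) {
        MemoryPackSerializer.Serialize(cp, log);
        await cp.CopyToAsync(fs, 1 << 20);
    }
}
```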

By the way, I would like to recommend the following: if you want to serialize entirely in memory, don't use Stream. Stream support is primarily for I/O destinations such as files.

The above code could use IBufferWriter instead.

var buffer = new ArrayBufferWriter<byte>(inputBytes.Length);
compressor.CopyTo(buffer);      
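Putting that recommendation together, a minimal in-memory round trip might look like the following. This is a sketch assuming the BrotliCompressor/BrotliDecompressor API already shown in this thread; TradeLog stands in for the user's MemoryPackable type.

```csharp
using System.Buffers;
using System.IO.Compression;
using MemoryPack;
using MemoryPack.Compression;

// Serialize and compress into an in-memory IBufferWriter instead of a Stream.
using (var compressor = new BrotliCompressor(CompressionLevel.Optimal)) {
    MemoryPackSerializer.Serialize(compressor, log);

    var buffer = new ArrayBufferWriter<byte>();
    compressor.CopyTo(buffer);

    // Decompress and deserialize straight from the written bytes.
    using (var decompressor = new BrotliDecompressor()) {
        var decompressed = decompressor.Decompress(buffer.WrittenSpan);
        var restored = MemoryPackSerializer.Deserialize<TradeLog>(decompressed);
    }
}
```

ArrayBufferWriter<byte>.WrittenSpan exposes exactly the bytes that CopyTo produced, which avoids the partial-read and flush concerns that come with Stream round trips.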
hadashiA commented 7 months ago

This fix is released in 1.20.1. Please let us know if there is anything else. Thanks.

FilixDark commented 7 months ago

Hi, thanks for the fix. Usually I write it to a file; I just made it a MemoryStream so you don't have to wear out your SSD when testing it. I hope this comment doesn't reopen the issue. I am not really familiar with GitHub.