adamhathcock / sharpcompress

SharpCompress is a fully managed C# library to deal with many compression types and formats.
MIT License

Question: How to get XZ uncompressed size #594

Open x1unix opened 3 years ago

x1unix commented 3 years ago

Hello. As far as I know, the XZ format has an index section that contains archive metadata (most notably, the uncompressed size).

I've skimmed through the XZ implementation in this package, and it looks like sharpcompress can read the XZ index, but it's impossible to get XZBlock information without reading and decompressing the whole archive.

How can I get the XZ index information using this library without extracting the archive contents?

It would be nice to have the uncompressed stream size populated in the Length property.

adamhathcock commented 3 years ago

If it's in the metadata, then it's something that's just been overlooked for whatever reason. Should be a relatively quick thing to do.

x1unix commented 3 years ago

@adamhathcock as far as I understand, the uncompressed size can be calculated by reading XZIndex, but there is currently no way to read only the archive structure without decompressing the XZ contents (XZStream returns the decompressed data).

XZIndex becomes available only after the whole archive has been read:

XzStream.cs

        public override int Read(byte[] buffer, int offset, int count)
        {
            int bytesRead = 0;
            if (_endOfStream)
            {
                return bytesRead;
            }

            if (!HeaderIsRead)
            {
                ReadHeader();
            }

            bytesRead = ReadBlocks(buffer, offset, count);
            if (bytesRead < count)
            {
                _endOfStream = true;
                ReadIndex();
                ReadFooter();
            }
            return bytesRead;
        }

x1unix commented 3 years ago

There is a similar issue in a related LZMA project: https://github.com/addaleax/lzma-native/issues/15

Might be useful for implementation.

adamhathcock commented 3 years ago

Zip has the same issue with streamed files where you don't know the size before compression.

We should be able to implement this size on XZ when using the Archive strategy, but not the Reader strategy.

x1unix commented 3 years ago

@adamhathcock here is a simple snippet to calculate the uncompressed size of XZ contents. Hope it helps.

It works only with seekable streams. For non-seekable streams, the whole file has to be read first.

using System.Diagnostics;
using System.IO;
using System.Linq;
using SharpCompress.Compressors.Xz;

public class XzFileInfo
{
    // The XZ stream footer is 12 bytes long according to the spec.
    private const int XzFooterSize = 12;

    // Assumes a single XZ stream with no trailing stream padding.
    public static ulong GetUncompressedSize(string filePath)
    {
        using var file = File.OpenRead(filePath);

        // Read the footer from the end of the file.
        file.Seek(-XzFooterSize, SeekOrigin.End);
        var footer = XZFooter.FromStream(file);
        Debug.WriteLine($"BackwardSize: {footer.BackwardSize}");

        // BackwardSize is the size of the index, which sits right before the footer.
        file.Seek(-(XzFooterSize + footer.BackwardSize), SeekOrigin.End);
        var index = XZIndex.FromStream(file, false);
        Debug.WriteLine($"Index: number of records - {index.NumberOfRecords}");

        // Sum the uncompressed size of every block. The explicit seed keeps
        // Aggregate from throwing when the index has no records.
        var size = index.Records.Select(r => r.UncompressedSize).Aggregate(0UL, (acc, x) => acc + x);
        Debug.WriteLine($"Total size of uncompressed archive: {size} bytes");
        return size;
    }
}
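
For reference, the same footer-then-index walk can be sketched without the SharpCompress types by reading the raw bytes directly. This is only a rough sketch, not how the library does it: the `XzIndexReader` class and its method names are made up for illustration, it assumes a single XZ stream with no trailing stream padding, and it skips the CRC32 checks that the format defines.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical helper, not part of SharpCompress.
public static class XzIndexReader
{
    private const int FooterSize = 12;

    public static ulong GetUncompressedSize(Stream stream)
    {
        if (!stream.CanSeek)
            throw new NotSupportedException("A seekable stream is required.");
        if (stream.Length < FooterSize)
            throw new InvalidDataException("Too short to contain an XZ footer.");

        // Footer layout: CRC32 (4), stored backward size (4), stream flags (2), "YZ" (2).
        var footer = new byte[FooterSize];
        stream.Seek(-FooterSize, SeekOrigin.End);
        ReadFully(stream, footer);
        if (footer[10] != (byte)'Y' || footer[11] != (byte)'Z')
            throw new InvalidDataException("XZ footer magic bytes not found.");

        // The real index size is (stored value + 1) * 4 bytes.
        uint stored = BinaryPrimitives.ReadUInt32LittleEndian(footer.AsSpan(4, 4));
        long indexSize = ((long)stored + 1) * 4;
        if (indexSize > stream.Length - FooterSize)
            throw new InvalidDataException("Backward size points outside the stream.");

        // The index sits right before the footer and starts with a 0x00 indicator.
        stream.Seek(-(FooterSize + indexSize), SeekOrigin.End);
        if (stream.ReadByte() != 0x00)
            throw new InvalidDataException("Index indicator byte not found.");

        ulong recordCount = ReadVarInt(stream);
        ulong total = 0;
        for (ulong i = 0; i < recordCount; i++)
        {
            ReadVarInt(stream);          // unpadded size, not needed here
            total += ReadVarInt(stream); // uncompressed size of this block
        }
        return total;
    }

    // XZ variable-length integer: 7 bits per byte, least significant group first,
    // high bit set on every byte except the last, at most 9 bytes.
    private static ulong ReadVarInt(Stream stream)
    {
        ulong value = 0;
        for (int shift = 0; shift < 63; shift += 7)
        {
            int b = stream.ReadByte();
            if (b < 0) throw new EndOfStreamException();
            value |= (ulong)(b & 0x7F) << shift;
            if ((b & 0x80) == 0) return value;
        }
        throw new InvalidDataException("Variable-length integer is too long.");
    }

    private static void ReadFully(Stream stream, byte[] buffer)
    {
        int offset = 0;
        while (offset < buffer.Length)
        {
            int read = stream.Read(buffer, offset, buffer.Length - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
    }
}
```

Calling `XzIndexReader.GetUncompressedSize(File.OpenRead(path))` would then return the total without decompressing any block. A non-seekable stream could be copied into a `MemoryStream` first, at the cost of reading the whole file.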