adamhathcock / sharpcompress

SharpCompress is a fully managed C# library to deal with many compression types and formats.
MIT License

Expose more deflate options #282

Open Coloris opened 7 years ago

Coloris commented 7 years ago

Is it possible to achieve the following with SharpCompress? https://stackoverflow.com/questions/45225914/how-to-get-the-compressed-size-ziparchive-of-a-filestream-block

adamhathcock commented 7 years ago

Just use DeflateStream on each 64k block. No need to actually make a zip file.
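As a rough illustration of the per-block approach, here is a minimal sketch in Python. It uses the standard `zlib` module as a stand-in for a deflate stream, not SharpCompress itself; the helper names `deflate_block` and `compressed_block_sizes` and the compression level are assumptions made for the example.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # 64 KB blocks, as discussed in the thread


def deflate_block(block: bytes, level: int = 6) -> bytes:
    """Compress one block as an independent raw deflate stream."""
    # wbits=-15 selects raw deflate output (no zlib header or checksum).
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(block) + compressor.flush()


def compressed_block_sizes(data: bytes) -> list:
    """Return the compressed length of every 64 KB block of `data`."""
    return [
        len(deflate_block(data[offset:offset + BLOCK_SIZE]))
        for offset in range(0, len(data), BLOCK_SIZE)
    ]
```

Each block is compressed on its own here, so the sizes describe independent streams rather than positions inside one zip entry.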

Coloris commented 7 years ago

I'm writing a Windows Store package editor. The appx format used here is basically a zip (possibly zip64) file that uses deflate, plus some XML metadata.

In order to create a package, I need, for example, to split a file into several 64 KB chunks, compress each of them using the deflate method, then store the compressed length of each chunk inside an XML file. So I need to make a zip (appx) file in the end.

Are you saying that whether I compress a file with a DeflateStream or directly into a ZIP, the compressed data will be strictly identical? If not, is it possible to compress data in chunks and get the compressed length of each?

adamhathcock commented 7 years ago

You could make a zip and get the length but it’s more efficient just to make a deflate stream write to a file or memory and get the length that way. It doesn’t matter if it’s 64k chunks or whole files.

I can’t help with the appx spec. I thought it was a regular zip file.

Coloris commented 7 years ago

Thanks for your help, Adam. It is a regular zip file :p But it requires an XML file that contains, for every single entry, a hash of each 64 KB uncompressed block and the size of the compressed block inside the archive.

A better explanation from Microsoft:

When an app file is added to the app package, it is first divided into 64-KB blocks, and each block is hashed using the algorithm specified. If the size of the file is not an even multiple of 64 KB, the size of the final block is inferred as the remainder of the file size divided by 64 KB.

The Size attribute value is the size of the data block as stored in the app package. This is usually smaller than 64 KB because each block is commonly compressed before being stored in the app package. Because data compression (Deflate algorithm) produces a variable-length result, the Size attribute must be specified for all blocks of a file stored in compressed form within the package.
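The hashing half of that description can be sketched as follows in Python. The choice of SHA-256 as the default and the helper names `block_hashes` and `final_block_size` are assumptions for illustration only; the quoted text just says "the algorithm specified".

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # 64 KB blocks, per the quoted description


def block_hashes(data: bytes, algorithm: str = "sha256") -> list:
    """Hash each uncompressed 64 KB block; the last block may be shorter."""
    hashes = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        hashes.append(hashlib.new(algorithm, block).digest())
    return hashes


def final_block_size(file_size: int) -> int:
    """Size of the last block: the remainder, or a full block on exact multiples."""
    remainder = file_size % BLOCK_SIZE
    if remainder:
        return remainder
    # An exact multiple of 64 KB ends with a full block (an empty file has none).
    return min(file_size, BLOCK_SIZE)
```

The hashes are taken over the uncompressed blocks, while the Size attribute refers to the compressed bytes, so the two values have to be collected separately.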

At the very end, I need to make a zip (appx) file. Can I first compress each block with a deflate stream, collect the compressed lengths on the fly, and finally compress the source file into a zip entry?

Will processing it this way give me the same compressed data on both ends? I ask because I imagine that compressing a file in chunks is not the same as doing it in a single shot. In my case, it doesn't really matter if the compression is not very efficient.

adamhathcock commented 7 years ago

No, compressing a single stream won’t produce the same output as compressing 64k chunks. You can do anything you’re talking about with DeflateStream and/or Zip classes. It’s just a matter of figuring out the algorithm for making appx files, which I can’t help with.
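To see why the two outputs differ, here is a small Python sketch using the standard `zlib` module (again a stand-in, not SharpCompress). The helper names `raw_deflate` and `split_blocks` are assumptions for the example. Each independently compressed chunk ends with its own final deflate block, so a decompressor reading the concatenated chunks stops after the first one.

```python
import zlib

BLOCK_SIZE = 64 * 1024


def raw_deflate(data: bytes) -> bytes:
    """Compress `data` as one complete raw deflate stream."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def split_blocks(data: bytes) -> list:
    """Split `data` into 64 KB blocks (the last one may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


data = b"sharpcompress " * 20000           # a little over four blocks
single = raw_deflate(data)                  # one stream for the whole file
joined = b"".join(raw_deflate(b) for b in split_blocks(data))

decoder = zlib.decompressobj(-15)
first = decoder.decompress(joined)          # decoding ends at the first final block
leftover = decoder.unused_data              # the remaining chunks are left unread
```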

Coloris commented 7 years ago

Sure, I wasn't expecting you to help me with the appx part :D How can I access the deflate stream with SharpCompress while adding a zip entry? Is the compression done in real time or only when the stream is closed?

adamhathcock commented 7 years ago

It's done in real time when writing the stream. Closing the stream just guarantees that the final bits are written, which is what you need to do for each 64k block.

You're going to have to chunk the file and compress first to get the data you want then later create the final zip file.

Coloris commented 7 years ago

OK, this makes sense! One last question, if you don't mind: is there any equivalent of zlib's Z_FULL_FLUSH when using a DeflateStream with the Zip class? This is what I need (and cannot do with the native .NET System.IO.Compression, since it is too high level).

adamhathcock commented 7 years ago

I don’t currently expose it, but I could easily do so on ZipWriterOptions. The enum is FlushType, and it could be used alongside DeflateCompression.

A pull request is welcomed :)
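For reference, the effect of zlib's Z_FULL_FLUSH mentioned above can be sketched with Python's `zlib` module, which exposes the same flush modes. This is not the SharpCompress API; the function name `deflate_with_block_sizes` is hypothetical, and how appx accounts for the few trailing bytes written when the stream is finished is not covered in this thread.

```python
import zlib

BLOCK_SIZE = 64 * 1024


def deflate_with_block_sizes(data: bytes, level: int = 6):
    """Compress `data` as ONE raw deflate stream, flushing after each 64 KB block.

    Returns (stream, sizes, tail_size): the full stream, the number of
    compressed bytes emitted for each block, and the size of the closing bytes.
    """
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    stream = bytearray()
    sizes = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        # Z_FULL_FLUSH emits all pending output on a byte boundary and
        # resets the compression state, so the next block starts fresh.
        chunk = compressor.compress(block) + compressor.flush(zlib.Z_FULL_FLUSH)
        stream += chunk
        sizes.append(len(chunk))
    tail = compressor.flush(zlib.Z_FINISH)  # terminates the deflate stream
    stream += tail
    return bytes(stream), sizes, len(tail)
```

Unlike compressing each block separately, the result is a single valid deflate stream, so the per-block sizes describe byte ranges inside the data that would be stored for one zip entry.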