dyz1990 / sevenz-rust

A 7z decompressor/compressor lib written in pure rust
Apache License 2.0

Empty files are ignored, improvements #44

Closed: mhtmhn closed this 7 months ago

mhtmhn commented 8 months ago

Thanks for this library. Really a life saver! I have three things I want to bring to your attention (v0.5.3):


1) sevenz_rust::default_entry_extract_fn seems to skip empty files because of the check if entry.size() > 0 { ...

While that may seem logical, the extraction isn't exactly lossless, as shown below.

[image]

This is just a sample; I see a diff of about 250 files in my actual archive. I feel such assumptions should be avoided, since empty files aren't uncommon (e.g. __init__.py in Python).
This is just FYI; I solved this by iterating over the entries directly.
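
For anyone hitting the same thing, here is a minimal sketch of the idea behind that workaround, written against the standard library only. The `Entry` struct, `extract_entry` and `extract_all` are hypothetical stand-ins for illustration, not the sevenz_rust types or API. The point is simply that the output file is created unconditionally and only the write is skipped when there is no data.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical stand-in for an archive entry (NOT the sevenz_rust type).
struct Entry<'a> {
    name: &'a str,
    is_directory: bool,
    data: &'a [u8],
}

/// Writes one entry under `dest`, creating the file even when it has no data.
fn extract_entry(entry: &Entry, dest: &Path) -> io::Result<()> {
    let path = dest.join(entry.name);
    if entry.is_directory {
        return fs::create_dir_all(&path);
    }
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?;
    }
    // File::create runs regardless of the entry size, so a zero-byte
    // entry still ends up as a zero-byte file on disk.
    let mut file = File::create(&path)?;
    if !entry.data.is_empty() {
        file.write_all(entry.data)?;
    }
    Ok(())
}

/// Extracts every entry and returns how many regular files were written.
fn extract_all(entries: &[Entry], dest: &Path) -> io::Result<usize> {
    let mut files = 0;
    for entry in entries {
        extract_entry(entry, dest)?;
        if !entry.is_directory {
            files += 1;
        }
    }
    Ok(files)
}
```

With a size check wrapped around the whole file-handling branch, the `__init__.py` case above would produce no file at all; here it produces an empty one.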


2) The name FolderDecoder is a bit misleading.

It processes the solid blocks in the archive and has nothing to do with the number of folders.


3) Some speed comparisons for decompression!

2.1 GB, 35682 files, 2013 folders, compressed with a 64 MB solid block size and a 4 MB dictionary. This yields a 499 MB 7z archive with 39 solid blocks.

7zFM - 8m 03s
sevenz_rust - 5m 40s
sevenz_rust w/ rayon - 1m 17s

Impressive!

All tests on an i7-1165G7 (4C/8T).
SSD: KBG40ZNS512G, Seq. R/W (MB/s): 2200/1400, Random R/W (IOPS): 330K/190K
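
A rough sketch of why the parallel run can be so much faster, assuming each solid block is decoded independently of the others. This uses `std::thread::scope` from the standard library rather than rayon, and `Block`, `decode_block` and `decode_blocks_parallel` are hypothetical names with a placeholder decoder. It is not how sevenz_rust is implemented, and not necessarily how the rayon run above was set up.

```rust
use std::thread;

/// Hypothetical stand-in for one solid block: a run of files packed as a
/// single compressed stream (NOT a sevenz_rust type).
struct Block {
    packed: Vec<u8>,
}

/// Placeholder decoder: it only copies the bytes. A real decoder would
/// decompress the stream here.
fn decode_block(block: &Block) -> Vec<u8> {
    block.packed.clone()
}

/// Decodes every block on its own thread and returns the outputs in the
/// original block order.
fn decode_blocks_parallel(blocks: &[Block]) -> Vec<Vec<u8>> {
    thread::scope(|s| {
        // Spawn one scoped thread per block; each borrows its block.
        let handles: Vec<_> = blocks
            .iter()
            .map(|block| s.spawn(move || decode_block(block)))
            .collect();
        // Joining in spawn order keeps the results aligned with the input.
        handles
            .into_iter()
            .map(|h| h.join().expect("decoder thread panicked"))
            .collect()
    })
}
```

One thread per block keeps the sketch short; a work-stealing pool such as rayon's would cap the thread count at the number of cores instead, which matters with 39 blocks on a 4C/8T machine.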

dyz1990 commented 7 months ago

Thanks for the report.

  1. Very strange. The test cases decompress_single_empty_file_unencoded_header and decompress_two_empty_files_unencoded_header both passed successfully.
  2. Maybe BlockDecoder is better.
  3. It's really impressive.