AgentD / squashfs-tools-ng

A new set of tools and libraries for working with SquashFS images

Feature Request: Block-based Compression Early Abort for Incompressible Data in gensquashfs #114

Open wychen opened 1 year ago

wychen commented 1 year ago

gensquashfs currently retains the original data if the compressed output is larger than the source. However, performing heavy-duty compression on incompressible data and then discarding it may be wasteful. I propose adding a command line option to gensquashfs that enables a quick entropy measurement before performing compression. If a block is deemed incompressible, we can simply keep the original data without wasting computational resources on compression.

We could use a fast compression method, such as zstd level 1, as the entropy probe. With the default xz level 6, a zstd level 1 probe adds less than 2% computational overhead. Since every block pays for the probe but only incompressible blocks skip the xz pass, the approach is a net gain once more than about 2% of the blocks are incompressible, which is not an unreasonable scenario.
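To make the idea concrete, here is a minimal Python sketch of the probe-then-compress flow. It is not the prototype itself: zlib level 1 stands in for zstd level 1 and Python's `lzma` module at preset 6 stands in for xz level 6, only because both ship with the standard library. The names `looks_incompressible`, `pack_block`, and the 0.95 cutoff are hypothetical and would need tuning.

```python
import lzma
import os
import zlib

PROBE_RATIO = 0.95  # hypothetical cutoff: the probe must save at least 5%


def looks_incompressible(block: bytes, ratio: float = PROBE_RATIO) -> bool:
    """Cheap probe: compress at the fastest zlib level and compare sizes."""
    if not block:
        return True
    probe = zlib.compress(block, 1)  # stand-in for zstd level 1
    return len(probe) >= ratio * len(block)


def pack_block(block: bytes) -> tuple[bytes, bool]:
    """Return (payload, is_compressed) for a single block."""
    if looks_incompressible(block):
        return block, False  # early abort: skip the expensive compressor
    packed = lzma.compress(block, preset=6)  # stand-in for xz level 6
    # Existing fallback behaviour: keep the original if compression did not help.
    if len(packed) >= len(block):
        return block, False
    return packed, True


# Random bytes behave like already-compressed data and should be stored raw;
# repetitive text should pass the probe and go through the heavy compressor.
random_block = os.urandom(128 * 1024)
text_block = b"squashfs " * 10000
print(pack_block(random_block)[1], pack_block(text_block)[1])
```

The cutoff trades a little compression ratio for speed: a block that the probe shrinks by less than 5% is stored raw even if xz might have squeezed out slightly more.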

Alternative methods, such as file-based skipping with filename matching or file type detection, may be less accurate. In particular, files that mix compressible and incompressible content, such as PDFs with both text (compressible) and JPEG images (not compressible), or uncompressed tar files and VM images containing various file types, could benefit from a more granular block-based approach.
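The granularity argument can be illustrated with a small Python sketch that probes a mixed buffer block by block. The assumptions are the same as for any quick experiment: zlib level 1 is only a stand-in for the zstd probe, a 128 KiB block size is assumed, random bytes play the role of embedded JPEG data, and `classify_blocks` is a hypothetical helper name.

```python
import os
import zlib

BLOCK_SIZE = 128 * 1024  # assumed block size for this sketch


def classify_blocks(data: bytes, block_size: int = BLOCK_SIZE,
                    ratio: float = 0.95) -> list[bool]:
    """Return one flag per block: True if the fast probe shrinks it enough."""
    flags = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        probe = zlib.compress(block, 1)  # stand-in for zstd level 1
        flags.append(len(probe) < ratio * len(block))
    return flags


# A stand-in for a mixed file: two blocks of text followed by two blocks of
# random bytes, which behave like already-compressed image data.
text_part = (b"The quick brown fox. " * 20000)[:2 * BLOCK_SIZE]
random_part = os.urandom(2 * BLOCK_SIZE)
mixed_file = text_part + random_part

print(classify_blocks(mixed_file))
```

A filename or file type rule would have to make one decision for the whole file, whereas the per-block flags let the text blocks go through the heavy compressor while the random blocks are stored as-is.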

This idea is inspired by the ZFS LZ4 early abort mechanism, although the requirements and trade-offs in our context may be different. For reference, I have filed a similar issue on the squashfs-tools repository at https://github.com/plougher/squashfs-tools/issues/240.

I'm happy to refine my local prototype and send a PR, but I'd like to ensure that this feature aligns with the project's direction first. Thank you for your time and consideration. I'm looking forward to hearing your thoughts on this proposal and the potential advantages it could bring to squashfs-tools-ng and the community.