AgentD / squashfs-tools-ng

A new set of tools and libraries for working with SquashFS images

gensquashfs: please consider adding equivalent of mksquashfs -sort option #94

Closed: intrigeri closed this issue 2 years ago.

intrigeri commented 2 years ago

Hi,

At Tails we started migrating to squashfs-tools-ng. We've completed this migration for the SquashFS "diffs" (deltas stacked with overlayfs) used for our incremental upgrades. Thanks for this project!

Now, regarding migrating to gensquashfs to build our ISO/USB images, I anticipate that the main blocker is going to be the lack of an equivalent of the -sort option of mksquashfs:

       -sort SORT_FILE
           sort files according to priorities in SORT_FILE. One file or dir with priority per line.
           Priority -32768 to 32767, default priority 0.

It matters for us because the last time we tried dropping this option, we measured a 55% longer boot time, which is not acceptable in our context.
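For reference, the mksquashfs sort file described in the man page excerpt above can be modelled in a few lines of Python. This is a sketch, not mksquashfs's own parser. The excerpt only gives the priority range (-32768 to 32767, i.e. a signed 16-bit value) and the default priority of 0. The 'PATH PRIORITY' line layout and the helper names load_mksquashfs_sort and priority_of are assumptions made for illustration. The excerpt also does not say whether higher or lower priorities are placed first, so the sketch only looks priorities up.

```python
PRIO_MIN, PRIO_MAX = -32768, 32767  # range quoted from the mksquashfs man page
DEFAULT_PRIORITY = 0                # priority of files not listed in the sort file


def load_mksquashfs_sort(lines):
    """Parse sort-file lines into {path: priority}.

    ASSUMPTION: each line is 'PATH PRIORITY' (path first, priority last).
    rsplit keeps any spaces inside the path itself.
    """
    table = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        path, prio_text = line.strip().rsplit(None, 1)
        priority = int(prio_text)
        if not PRIO_MIN <= priority <= PRIO_MAX:
            raise ValueError(f"{path}: priority {priority} is outside {PRIO_MIN}..{PRIO_MAX}")
        table[path] = priority
    return table


def priority_of(table, path):
    """Listed paths get their own priority, everything else the default 0."""
    return table.get(path, DEFAULT_PRIORITY)


table = load_mksquashfs_sort(["usr/bin/bash 100", "etc/motd -5"])
print(priority_of(table, "usr/bin/bash"), priority_of(table, "var/log/syslog"))  # 100 0
```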

Please consider adding something similar to gensquashfs :)

The exact format of the sort file does not matter much to us.

Sorry if this feature already exists and I missed it!

Thanks again :)

AgentD commented 2 years ago

Hi @intrigeri,

thanks for the feedback. There really was no feature like this until now. I took the opportunity to also expose some block processor packing flags, pieced together a prototype implementation, and did some rudimentary testing.

This patch set applies to the 1.1.3 release tarball: sort_patch.tar.gz

It basically adds a --sort-file (or -S) option to gensquashfs that takes a file with one entry per line. Each entry starts with a signed 64-bit priority (lower values come first), followed by a space and the filename. The filename can be quoted ("...") if necessary, with \ serving as the escape character.
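A minimal Python sketch of the entry syntax as described above: a priority, whitespace, then a filename that may be quoted with \ escapes. The optional flag list is left out here. It is based only on this description, not on the patch; parse_sort_line and load_sort_file are hypothetical names, and the real gensquashfs parser may well treat comments, tabs or text after a closing quote differently.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def parse_sort_line(line):
    """Parse '<priority> <filename>' into (priority, filename).

    The filename may be wrapped in double quotes; a backslash makes the
    next character literal (e.g. a quote or another backslash).
    """
    prio_text, rest = line.strip().split(None, 1)
    priority = int(prio_text)
    if not INT64_MIN <= priority <= INT64_MAX:
        raise ValueError(f"priority {priority} does not fit in 64 bits")
    if not rest.startswith('"'):
        return priority, rest  # unquoted: the rest of the line is the name
    name, escaped = [], False
    for ch in rest[1:]:
        if escaped:
            name.append(ch)  # take the escaped character literally
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            # Closing quote; anything after it is ignored in this sketch.
            return priority, "".join(name)
        else:
            name.append(ch)
    raise ValueError("unterminated quoted filename")


def load_sort_file(text):
    """Parse all non-blank lines; lower priorities come first (stable sort)."""
    entries = [parse_sort_line(l) for l in text.splitlines() if l.strip()]
    return sorted(entries, key=lambda entry: entry[0])


example = '''\
10 usr/share/doc/README
-5 "boot/vmlinuz"
0 "etc/file with spaces"
'''
for priority, name in load_sort_file(example):
    print(priority, name)
```

Because Python's sort is stable, entries with equal priority keep their order from the file.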

Optionally, a comma-separated list of flags enclosed in square brackets (e.g. [flag1,flag2]) can be specified between the priority and the filename. Flags can be used to, e.g., disable fragmentation or compression, or to force block alignment, which trades data density for read speed.
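Under the same caveats, the optional bracketed flag list could be split off as shown below. Only glob and glob_no_path are flag names that appear in this thread; the sketch does not validate names at all, and split_flags is a hypothetical helper. One consequence of this reading of the syntax is that an unquoted filename beginning with "[" would be mistaken for a flag list, so such a name would need quoting.

```python
def split_flags(line):
    """Split '<priority> [flag1,flag2] <filename>' into its three parts.

    The bracketed flag list is optional. The filename part is returned
    unparsed, so it may still be quoted.
    """
    prio_text, rest = line.strip().split(None, 1)
    flags = []
    if rest.startswith("["):
        end = rest.find("]")
        if end < 0:
            raise ValueError("flag list is missing its closing ']'")
        # Split on commas and drop empty entries and surrounding spaces.
        flags = [f.strip() for f in rest[1:end].split(",") if f.strip()]
        rest = rest[end + 1:].lstrip()
    return int(prio_text), flags, rest


print(split_flags("0 [glob] usr/lib/*.so"))  # (0, ['glob'], 'usr/lib/*.so')
print(split_flags("-1 etc/hostname"))        # (-1, [], 'etc/hostname')
```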

I also added support for fnmatch-based globbing: if the flag glob or glob_no_path is specified, the filename is interpreted as a glob pattern. The former does path-aware matching (wildcards do not match slashes), while the latter allows wildcards to match slashes.
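One way to read the two glob modes is as fnmatch(3) with and without the FNM_PATHNAME flag; this is an interpretation of the description above, not the patch's code. Python's fnmatch module has no FNM_PATHNAME, so this sketch approximates path-aware matching by matching each /-separated component on its own. That differs from C fnmatch in corner cases such as a / inside a bracket expression. The helper names are hypothetical, and entries without a glob flag are assumed to name exactly one path.

```python
import fnmatch


def match_path_aware(pattern, path):
    """Approximate fnmatch(3) with FNM_PATHNAME: wildcards never cross '/'."""
    pat_parts = pattern.split("/")
    path_parts = path.split("/")
    return len(pat_parts) == len(path_parts) and all(
        fnmatch.fnmatchcase(name, pat) for name, pat in zip(path_parts, pat_parts)
    )


def matches(pattern, path, flags):
    """Decide whether a sort-file entry applies to a given path."""
    if "glob" in flags:
        return match_path_aware(pattern, path)
    if "glob_no_path" in flags:
        return fnmatch.fnmatchcase(path, pattern)  # '*' may also match '/'
    return pattern == path  # plain entries name one exact path


print(matches("*.sh", "usr/lib/foo/run.sh", ["glob"]))          # False
print(matches("*.sh", "usr/lib/foo/run.sh", ["glob_no_path"]))  # True
```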

So far, I have managed to knock about 5% off the size of my Debian test image, and the resulting image still has the same contents (according to sqfsdiff). I'm looking forward to seeing whether this works for you as expected. Meanwhile, I'll add a section to the gensquashfs man page.

Greetings,

David

intrigeri commented 2 years ago

Hi @AgentD,

> I took the opportunity to also expose some block processor packing flags, pieced together a prototype implementation, and did some rudimentary testing.

Wow, this was fast. Thank you so much!

> So far, I managed to knock ~5% of size off my Debian test image and the result is still equal (according to sqfsdiff).

I'm surprised this has any impact on the size of generated images. It makes me curious :)

> I'm looking forward to seeing if this works for you as expected.

Sure. I need to check whether and how I can test this on our actual data without a lot of tedious work (preparing a patched Debian package, backporting it to Buster, uploading it to Tails' APT repository). If by any chance #95 interests you as well, I would ideally test both new features in one go; if that goes well, I'll end up with a Tails branch, ready for review, that fully switches to gensquashfs.

AgentD commented 2 years ago

> So far, I managed to knock ~5% of size off my Debian test image and the result is still equal (according to sqfsdiff).

> I'm surprised this has any impact on the size of generated images. It makes me curious :)

SquashFS packs files that are smaller than the block size together into fragment blocks, and each fragment block is compressed as a whole. If you sort these files by type (for testing, I did some crude sorting based on file extension), you should end up with lots of similarly structured files in the same fragment blocks. For script files, for example, you get lots of mostly ASCII text that shares common keywords across files. This can improve compression a little.
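The effect described above can be reproduced with a toy Python model. This is not SquashFS's actual fragment handling: it greedily packs whole files into fixed-size blocks in the given order and compresses each block independently with zlib (gzip/zlib being one of the compressors SquashFS supports). The 4096-byte block size, the synthetic file contents and the helper names pack_fragments and packed_size are all made up for the example.

```python
import random
import zlib

BLOCK_SIZE = 4096  # illustrative only; much smaller than SquashFS's 128 KiB default


def pack_fragments(files, block_size=BLOCK_SIZE):
    """Greedily pack small files, in the given order, into fragment blocks."""
    blocks, current = [], b""
    for data in files:
        assert len(data) <= block_size, "only files smaller than a block go into fragments"
        if len(current) + len(data) > block_size:
            blocks.append(current)  # block full: start a new one
            current = b""
        current += data
    if current:
        blocks.append(current)
    return blocks


def packed_size(files):
    """Total size after compressing each fragment block independently."""
    return sum(len(zlib.compress(block, 9)) for block in pack_fragments(files))


rng = random.Random(1)
vocab = ["echo", "set", "export", "if", "then", "fi", "for", "do", "done",
         "case", "esac", "install", "mkdir", "chmod", "grep", "sed", "exit"]
shared = " ".join(rng.choice(vocab) for _ in range(400)).encode()

# Eight 1000-byte "shell scripts" that share most of their text ...
scripts = [(f"#!/bin/sh\n# script {n}\n".encode() + shared)[:1000] for n in range(8)]
# ... and eight 1000-byte incompressible "binaries".
binaries = [bytes(rng.getrandbits(8) for _ in range(1000)) for _ in range(8)]

interleaved = [f for pair in zip(scripts, binaries) for f in pair]
grouped = scripts + binaries  # roughly what sorting by file type would produce

print("interleaved:", packed_size(interleaved))
print("grouped:    ", packed_size(grouped))
```

Both orders fill four blocks, but in the grouped order each block of scripts lets zlib encode three of the four scripts mostly as back-references, and the random data no longer dilutes the Huffman statistics of the text, so the grouped total should come out smaller. Exact byte counts depend on the zlib version.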