AgentD / squashfs-tools-ng

A new set of tools and libraries for working with SquashFS images

Performance and RAM usage for large filesystems #5

Closed. AgentD closed this issue 4 years ago.

AgentD commented 4 years ago

Currently, operations are performed sequentially: the entire filesystem tree is generated in memory first, and only afterwards is anything done with it.

This approach works great for small filesystems, but something like unpacking a LiveCD image with rdsquashfs is practically impossible, filling up all RAM in the process. The same problem will show up when trying to generate such an image with gensquashfs.

The steps need to be interleaved to reduce memory consumption, essentially eliminating the in-memory tree to the extent possible.
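To make the interleaving concrete, here is a minimal, self-contained C sketch of the idea. All names and types are hypothetical stand-ins, not the squashfs-tools-ng API: instead of materializing the whole tree and then processing it, the walker hands each entry to a callback as soon as it is decoded, so only the current entry has to live in memory.

```c
#include <stdio.h>

/* Hypothetical stand-in for one directory entry decoded from an image. */
struct entry {
	const char *path;
	size_t size;
};

/* Pretend image contents; a real reader would decode these on the fly. */
static const struct entry image[] = {
	{ "/bin/sh", 121912 },
	{ "/etc/hostname", 7 },
	{ "/usr/lib/libc.so", 1824496 },
};

typedef void (*entry_cb)(const struct entry *ent, void *user);

/* Streaming walk: each entry goes straight to the callback, so memory
 * usage is bounded regardless of how large the filesystem is. */
static void walk_image(entry_cb cb, void *user)
{
	for (size_t i = 0; i < sizeof(image) / sizeof(image[0]); ++i)
		cb(&image[i], user);
}

/* The "do something with it" step, here just printing; an unpacker
 * would create the file and write out its data blocks instead. */
static void extract_entry(const struct entry *ent, void *user)
{
	size_t *total = user;
	*total += ent->size;
	printf("extracting %s (%zu bytes)\n", ent->path, ent->size);
}

int main(void)
{
	size_t total = 0;
	walk_image(extract_entry, &total);
	printf("%zu bytes total\n", total);
	return 0;
}
```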

AgentD commented 4 years ago

UPDATE

The problems regarding memory usage turned out to be a fluke. My test image from a Fedora LiveCD managed to trigger a bug that has since been fixed. I continued testing with a Debian CD, since it actually contains a file system (the Fedora one contains only an ext4 image).

Using 4 jobs, the parallel unpacker in rdsquashfs manages to extract an entire 2GiB Debian LiveCD image in around 3 minutes on my laptop, and in half that time on a more powerful Xeon test server I have access to. Both machines read the input image from and write the output file tree to a plain old hard drive, and neither is exactly top-of-the-line hardware anymore. IMO that time is okay-ish. Memory usage turns out to be negligible.
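For context on what "4 jobs" means in practice, the sketch below shows the general shape of such a worker pool with POSIX threads: a mutex-protected cursor hands out the next pending file to whichever thread is free. This is only an illustration of the pattern, not rdsquashfs's actual implementation, and treating whole files as work items is a simplification.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_JOBS  4
#define NUM_FILES 32

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int next_file = 0;

/* Each worker repeatedly claims the next unprocessed file index and
 * "unpacks" it; a real worker would decompress data blocks here. */
static void *worker(void *arg)
{
	int job = *(int *)arg;

	for (;;) {
		pthread_mutex_lock(&lock);
		int idx = next_file < NUM_FILES ? next_file++ : -1;
		pthread_mutex_unlock(&lock);

		if (idx < 0)
			break;
		printf("job %d: unpacking file %d\n", job, idx);
	}
	return NULL;
}

int main(void)
{
	pthread_t threads[NUM_JOBS];
	int ids[NUM_JOBS];

	for (int i = 0; i < NUM_JOBS; ++i) {
		ids[i] = i;
		pthread_create(&threads[i], NULL, worker, &ids[i]);
	}
	for (int i = 0; i < NUM_JOBS; ++i)
		pthread_join(threads[i], NULL);
	return 0;
}
```

Compile with `cc -pthread`; the output interleaving will vary from run to run, which is exactly the point of the pool.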

rdsquashfs could still benefit from only extracting the part of the tree we are interested in. Running rdsquashfs -l / on the Debian image chokes for a little over a second on my laptop before spitting out the directory listing.
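A traversal that prunes on a path prefix would avoid most of that cost. The following is a small sketch of the kind of filter it could use; path_relevant is a hypothetical helper, not part of rdsquashfs. It accepts the requested path itself, anything below it, and its ancestors (which still have to be visited to reach it), and rejects everything else, assuming absolute paths without trailing slashes (the root directory "/" would need special-casing).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Decide whether a visited `path` should be descended into while
 * looking for `prefix`: true for the prefix itself, anything below
 * it, or any ancestor of it. */
static bool path_relevant(const char *path, const char *prefix)
{
	size_t plen = strlen(path), flen = strlen(prefix);
	size_t n = plen < flen ? plen : flen;

	if (strncmp(path, prefix, n) != 0)
		return false;
	if (plen == flen)
		return true;               /* exact match */
	if (plen > flen)
		return path[flen] == '/';  /* path lies below the prefix */
	return prefix[plen] == '/';        /* path is an ancestor */
}

int main(void)
{
	const char *want = "/usr/share";
	const char *seen[] = { "/bin", "/usr", "/usr/share/doc", "/usr2" };

	for (size_t i = 0; i < sizeof(seen) / sizeof(seen[0]); ++i)
		printf("%-16s %s\n", seen[i],
		       path_relevant(seen[i], want) ? "descend" : "skip");
	return 0;
}
```

The component check on the '/' boundary matters: "/usr2" shares the string prefix "/usr" with "/usr/share" but must still be skipped.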