Closed mattst88 closed 4 years ago
The thread pool implementation that is behind this is IMO pretty stupid. I made sure it works with the intention to get back to it later. Interestingly, with my tests it has shown "reasonable" results so far.
As a benchmark I used a squashfs image from the Debian live DVD and ran "sqfs2tar debian.sqfs | tar2sqfs -j 4 -f out.sqfs". The dead simple thread pool managed to get me a 3x speed up, so there is definitely room for improvement. A real 4x speed up is probably unrealistic since there are synchronisation points like fragment blocks.
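As a rough sanity check on those numbers, Amdahl's law can be used to estimate how much of the run would have to be parallel to see 3x on 4 jobs. This is only a back-of-the-envelope sketch: it assumes the serial part is the sole reason for the missing speed up, and the function names are made up for illustration.

```python
def amdahl_speedup(parallel_fraction, jobs):
    """Ideal speed up when only `parallel_fraction` of the work scales with `jobs`."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / jobs)


def parallel_fraction(speedup, jobs):
    """Invert Amdahl's law: which parallel fraction yields `speedup` on `jobs` workers?"""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / jobs)


if __name__ == "__main__":
    p = parallel_fraction(3.0, 4)
    print(f"parallel fraction for 3x on 4 jobs: {p:.3f}")
    print(f"predicted speed up on 8 jobs: {amdahl_speedup(p, 8):.2f}")
```

Under that assumption, a 3x speed up on 4 jobs corresponds to roughly 8/9 of the work running in parallel, which fits the remark that a real 4x is unlikely while synchronisation points remain.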
Another factor is memory consumption: mksquashfs fills up the entire RAM. My implementation has a rather low maximum of in-flight blocks and stops filling the queue if that threshold is reached. A few commits back I implemented an option that can be used to increase this backlog.
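To illustrate the backlog idea, here is a minimal sketch in Python (not the actual libsquashfs code, which is C). It assumes a semaphore-based limit; the names compress_blocks and max_backlog are hypothetical and do not correspond to real options.

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_blocks(blocks, num_workers=4, max_backlog=8):
    """Compress blocks in worker threads with at most `max_backlog` in flight."""
    slots = threading.BoundedSemaphore(max_backlog)
    futures = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for block in blocks:
            # The producer stalls here once the backlog is full, so it
            # cannot read ahead and fill memory with pending blocks.
            slots.acquire()
            future = pool.submit(zlib.compress, block)
            # Free the slot as soon as a worker has finished this block.
            future.add_done_callback(lambda _f: slots.release())
            futures.append(future)
        # Results are collected in submission order.
        return [f.result() for f in futures]
```

A larger max_backlog lets the producer run further ahead of the workers at the cost of more memory. For simplicity the sketch keeps all compressed results in a list; a real writer would presumably flush them to disk as they complete.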
I wouldn't work with the current git tree at the moment though, since I'm still doing refactoring to make libsquashfs.so a thing and haven't run all of the static analysis and regression tests yet.
Another thing: Does your git tree consist entirely of files smaller than the block size?
If so, that might be an explanation.
Fragment processing (checksumming, de-duplication and indexing; the last two need synchronisation) is done entirely in the main thread. Once a fragment block is full, it is submitted to the work queue which does compressing in one of the worker threads.
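The flow described above could be sketched roughly like this. It is a simplified Python illustration rather than the real implementation: the class FragmentPacker and the submit callback are hypothetical, every fragment is assumed to be smaller than the block size, and a copy of each fragment is kept around only to make the equality check easy.

```python
import zlib


class FragmentPacker:
    """Packs small file tails into fragment blocks on the main thread."""

    def __init__(self, block_size, submit):
        self.block_size = block_size
        self.submit = submit        # called with each full fragment block
        self.current = bytearray()  # fragment block being filled
        self.block_index = 0
        self.seen = {}              # (crc32, size) -> [(block, offset, data)]

    def add(self, data):
        """Return (block_index, offset), reusing an identical earlier fragment."""
        key = (zlib.crc32(data), len(data))
        for index, offset, stored in self.seen.get(key, []):
            if stored == data:      # the checksum only narrows the candidates
                return index, offset
        if len(self.current) + len(data) > self.block_size:
            self.flush()
        location = (self.block_index, len(self.current))
        self.current += data
        self.seen.setdefault(key, []).append((location[0], location[1], data))
        return location

    def flush(self):
        """Hand the current block to the work queue and start a new one."""
        if self.current:
            self.submit(bytes(self.current))
            self.current = bytearray()
            self.block_index += 1
```

Here submit stands in for the work queue: only full blocks reach the compressing workers, while checksumming, de-duplication and indexing all happen in the caller. With many tiny files, that caller becomes the bottleneck.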
I did some profiling once and determined the crc32 checksumming to rank rather high among the time wasters. This was also the reason I threw out the crc32 implementation and used the one from zlib, making zlib a hard dependency.
mksquashfs also uses a much simpler 16 bit BSD checksum to determine whether two blocks are equal.
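For comparison, this is what the classic 16 bit BSD checksum (the algorithm behind the BSD sum utility) looks like next to zlib's crc32. It is a sketch of the textbook algorithm in Python, not code copied from mksquashfs, and the function name bsd_checksum16 is made up here.

```python
import zlib


def bsd_checksum16(data):
    """Classic BSD checksum: rotate right by one bit, then add the next byte."""
    checksum = 0
    for byte in data:
        # Rotate the 16 bit value right by one position.
        checksum = (checksum >> 1) | ((checksum & 1) << 15)
        # Add the byte and keep only the low 16 bits.
        checksum = (checksum + byte) & 0xFFFF
    return checksum


if __name__ == "__main__":
    sample = b"some block contents"
    print(f"bsd16: {bsd_checksum16(sample):#06x}")
    print(f"crc32: {zlib.crc32(sample):#010x}")
```

The BSD variant needs only a rotate and an add per byte, but with 16 bits there are just 65536 possible values, so it produces far more false candidates than a 32 bit CRC, and each of those has to be ruled out by comparing the actual data.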
> Another factor is memory consumption: mksquashfs fills up the entire RAM. My implementation has a rather low maximum of in-flight blocks and stops filling the queue if that threshold is reached. A few commits back I implemented an option that can be used to increase this backlog.
Thanks, that sounds useful. I would gladly trade memory usage for lower CPU utilization.
> Another thing: Does your git tree consist entirely of files smaller than the block size?
Pretty close to it. Some shell magic tells me that 95% of the files are <= 4096 bytes.
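The exact shell command is not shown in the thread. An equivalent check could look like the following Python sketch; the function name small_file_fraction and the default limit of 4096 bytes are assumptions for illustration.

```python
import os


def small_file_fraction(root, limit=4096):
    """Fraction of regular files under `root` whose size is <= `limit` bytes."""
    total = 0
    small = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # count real files only, not symlinks
            total += 1
            if os.path.getsize(path) <= limit:
                small += 1
    return small / total if total else 0.0


if __name__ == "__main__":
    print(f"{small_file_fraction('.') * 100:.1f}% of files are <= 4096 bytes")
```

Passing the squashfs block size as limit instead shows how many files would end up entirely in fragment blocks.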
AgentD closed this
Oh, I wasn't aware that we expected this to be solved yet. I retested with v0.7 and for my use case tar2sqfs now takes 12 seconds vs 2 seconds for mksquashfs. So that's a massive improvement over the 90 seconds I recall tar2sqfs taking.
I suspect there's still some performance to be gained but this is a very good improvement. For my own knowledge, what commits do you think caused the significant performance improvement?
Thanks for the feedback! It's definitely interesting to hear about the actual impact of the current implementation.
There is room for improvement and I'm definitely not that happy with the current implementation of the thread pool block processor. I considered it good enough for now with the intention to improve upon it later (with the changes completely hidden behind the API) and do some actual profiling.
A contributing factor to the performance difference might also be that libsquashfs uses crc32 for block deduplication, while mksquashfs uses a 16 bit BSD checksum.
As expected, moving the checksumming into the worker thread greatly improves performance for applications that have lots of files smaller than block size. This was done in commit 9bc8200, but a lot of refactoring was required to get there. Unfortunately, this is spread over a bunch of commits with other stuff done in between (caused by procrastinating and not cleaning up afterwards).
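The general shape of that change, sketched in Python rather than taken from commit 9bc8200: the worker does both the checksum and the compression, and the main thread is left with cheap index lookups. The names process_block and pack are hypothetical, and for brevity the sketch treats a matching (crc32, size) pair as a duplicate without comparing the data.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def process_block(block):
    """Runs in a worker thread: checksum and compress a single block."""
    return zlib.crc32(block), len(block), zlib.compress(block)


def pack(blocks, num_workers=4):
    """Return (unique compressed blocks, per-input index into that list)."""
    index = {}   # (crc32, size) -> position in `stored`
    stored = []  # compressed data of each unique block
    refs = []    # for every input block, where its data lives
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map yields results in input order, so the main thread only
        # performs dictionary lookups and appends.
        for crc, size, compressed in pool.map(process_block, blocks):
            key = (crc, size)
            if key not in index:
                index[key] = len(stored)
                stored.append(compressed)
            refs.append(index[key])
    return stored, refs
```

The trade-off in this sketch is that duplicates get compressed before they are recognised as duplicates, which wastes some worker time but keeps the serial part short.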
When using tar2sqfs or gensquashfs, regardless of the number of jobs I request, CPU usage never goes over ~105%. scanelf -n indeed shows that they are linked against libpthread.

I use the following script to produce squashfs images containing Gentoo's ebuild repository:

Replacing the gensquashfs line with the mksquashfs line reduces the time to run from minutes to less than 10 seconds. Preferably I would just use tar2sqfs and avoid checking out a git worktree (or even better: add support to git archive for producing squashfs images directly).

Is it expected that tar2sqfs or gensquashfs do not use many cores as well as mksquashfs?