AgentD / squashfs-tools-ng

A new set of tools and libraries for working with SquashFS images
Other
194 stars 30 forks source link

Import and use Mesa's hash table #43

Closed mattst88 closed 4 years ago

mattst88 commented 4 years ago

With perf record/perf report I saw that 30% of the time was spent in sqfs_frag_table_find_tail_end with tar2sqfs for a tarball containing the Gentoo ebuild repository (many thousands of small files).

The reason was the bucketing hash table in frag_table.c: too many elements in too few buckets meant lots of walking over the linked lists.

This patch replaces that hash table with the hash table implementation from Mesa. Its implementation is more complex (is is an open-addressing, linear-reprobing) hash table, but it is much better suited for the task.

On my 4c/8t Skylake, the time to run tar2sqfs drops from 7.5s to less than 3s. CPU usage increases from ~207% to ~356%, presumably indicating an increase in available parallelism due to the removal of the hash table as a bottleneck. The perf report profile with this patch shows that the time spent in sqfs_frag_table_find_tail_end has dropped from ~30% to 0.01%.

Output from ministat:

x before
+ after
    N          Min          Max       Median          Avg        Stddev
x  20        7.476        7.685       7.5725       7.5615   0.051254268
+  20         2.79        2.901        2.846      2.84475    0.03543842
Difference at 95.0% confidence
    -4.71675 +/- 0.0282015
    -62.3785% +/- 0.241477%
    (Student's t, pooled s = 0.0440618)

I imported only the bits of the hash table implementation that were needed for frag_table.c. Among the changes I made after importing are

Fixes: #40 Signed-off-by: Matt Turner mattst88@gmail.com

mattst88 commented 4 years ago

I should also mention that I didn't do any reformatting of the imported code. I'm happy to do it, but that decision is your prerogative.

AgentD commented 4 years ago

Thanks! I will also re-run my own parallelism benchmarks some time within the next few days and update the results.

Edit: regarding indentation style, as long as the changes are minor, I would leave it as is for now. This should make it easier to diff against the version in the mesa repository, if that should change.