AgentD / squashfs-tools-ng

A new set of tools and libraries for working with SquashFS images
Other
196 stars 30 forks source link

Hash table in frag_table.c limits performance #40

Closed mattst88 closed 4 years ago

mattst88 commented 4 years ago

perf record/perf report on tar2sqfs /tmp/gentoo.sqfs.tmp -q -c zstd -X level=19 < /tmp/gentoo.tar reports a startlingly high percentage of time spent in the function sqfs_frag_table_find_tail_end:

    29.67%  tar2sqfs  libsquashfs.so.0.0.0  [.] sqfs_frag_table_find_tail_end
     7.82%  tar2sqfs  [kernel.vmlinux]      [k] do_syscall_64
     5.76%  tar2sqfs  libc-2.29.so          [.] __memset_avx2_erms
     3.34%  tar2sqfs  libsquashfs.so.0.0.0  [.] sqfs_frag_table_add_tail_end

Doubling NUM_BUCKETS (currently 128) in frag_table.c cuts the percentage of time spent in this function in half up to 1024.

    11.11%  tar2sqfs  [kernel.vmlinux]      [k] do_syscall_64
     8.77%  tar2sqfs  libc-2.29.so          [.] __memset_avx2_erms
     6.83%  tar2sqfs  libsquashfs.so.0.0.0  [.] sqfs_frag_table_find_tail_end
     3.57%  tar2sqfs  [kernel.vmlinux]      [k] syscall_return_via_sysret

gentoo.tar in the example is generated from https://anongit.gentoo.org/git/repo/sync/gentoo.git with git -C gentoo.git archive --format=tar stable > /tmp/gentoo.tar

Presumably we don't want to quadruple the number of buckets. Perhaps we should consider a new hash table implementation? Mesa has a high-quality, MIT licensed, open-addressing, linear-reprobing hash table implementation in https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/util/hash_table.c that may be worth considering. I can try hacking that up if the idea sounds nice to you.

AgentD commented 4 years ago

Thanks for testing! This issue hasn't cropped up with the live DVD images that I uses so far.

Porting an existing hash table implementation would probably be worthwhile alone for replacing the ad-hoc implementations scattered throughout the code. I guess the one from mesa would be a good fit because the code is not overly long and it is already built around xxhash32.