Closed mattst88 closed 4 years ago
Thanks for testing! This issue hasn't cropped up with the live DVD images that I have used so far.
Porting an existing hash table implementation would probably be worthwhile on its own, if only to replace the ad-hoc implementations scattered throughout the code. I guess the one from Mesa would be a good fit because the code is not overly long and it is already built around xxhash32.
`perf record`/`perf report` on `tar2sqfs /tmp/gentoo.sqfs.tmp -q -c zstd -X level=19 < /tmp/gentoo.tar` reports a startlingly high percentage of time spent in the function `sqfs_frag_table_find_tail_end`.
Doubling `NUM_BUCKETS` (currently 128) in `frag_table.c` cuts the percentage of time spent in this function in half, up to 1024.
`gentoo.tar` in the example is generated from https://anongit.gentoo.org/git/repo/sync/gentoo.git with `git -C gentoo.git archive --format=tar stable > /tmp/gentoo.tar`
Presumably we don't want to quadruple the number of buckets. Perhaps we should consider a new hash table implementation? Mesa has a high-quality, MIT licensed, open-addressing, linear-reprobing hash table implementation in https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/util/hash_table.c that may be worth considering. I can try hacking that up if the idea sounds nice to you.
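To make the suggestion concrete, here is a minimal sketch of an open-addressing table with linear probing that doubles its capacity as it fills, so lookup cost does not depend on a fixed bucket count. This is not the Mesa code and not the squashfs-tools-ng API: all names (`table_t`, `table_insert`, and so on) are hypothetical, and it assumes the key is an already computed non-zero 32-bit hash, with 0 reserved to mark an empty slot.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch; names are not from Mesa or squashfs-tools-ng. */
typedef struct {
    uint32_t key;   /* 0 marks an empty slot, so keys must be non-zero */
    uint32_t value;
} slot_t;

typedef struct {
    slot_t *slots;
    size_t capacity; /* must be a non-zero power of two */
    size_t used;
} table_t;

static int table_init(table_t *t, size_t capacity)
{
    t->slots = calloc(capacity, sizeof(*t->slots));
    t->capacity = capacity;
    t->used = 0;
    return t->slots ? 0 : -1;
}

static void table_cleanup(table_t *t)
{
    free(t->slots);
    t->slots = NULL;
    t->capacity = t->used = 0;
}

/* Return the slot holding key, or the empty slot where it would go. */
static slot_t *table_probe(const table_t *t, uint32_t key)
{
    size_t mask = t->capacity - 1;
    size_t i = key & mask;

    while (t->slots[i].key != 0 && t->slots[i].key != key)
        i = (i + 1) & mask; /* linear probing: try the next slot */
    return &t->slots[i];
}

/* Double the capacity and reinsert every occupied slot. */
static int table_grow(table_t *t)
{
    table_t bigger;
    size_t i;

    if (table_init(&bigger, t->capacity * 2))
        return -1;
    for (i = 0; i < t->capacity; ++i) {
        if (t->slots[i].key != 0) {
            *table_probe(&bigger, t->slots[i].key) = t->slots[i];
            bigger.used++;
        }
    }
    free(t->slots);
    *t = bigger;
    return 0;
}

static int table_insert(table_t *t, uint32_t key, uint32_t value)
{
    slot_t *s;

    assert(key != 0);
    /* Keep the load factor at or below 1/2 so probing always terminates. */
    if ((t->used + 1) * 2 > t->capacity && table_grow(t))
        return -1;
    s = table_probe(t, key);
    if (s->key == 0) {
        s->key = key;
        t->used++;
    }
    s->value = value; /* an existing key is overwritten */
    return 0;
}

static int table_lookup(const table_t *t, uint32_t key, uint32_t *value)
{
    const slot_t *s;

    if (key == 0)
        return -1;
    s = table_probe(t, key);
    if (s->key != key)
        return -1;
    *value = s->value;
    return 0;
}
```

Because the table grows with the number of entries, the expected probe length stays short regardless of input size, which is the property a fixed `NUM_BUCKETS` cannot provide. A real port would also need deletion and would have to compare the actual data behind equal hashes, both of which this sketch leaves out.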