bazelbuild / sandboxfs

A virtual file system for sandboxing
Apache License 2.0
372 stars 39 forks source link

Use hashbrown for better performance #75

Closed jmmv closed 5 years ago

jmmv commented 5 years ago

Profiling sandboxfs during large builds showed a non-negligible (but not large) amount of time going into hashtable operations. Given that we don't care about "secure" hashing, we can afford to switch to a faster implementation like hashbrown, so do it here.

This yields a ~1% performance boost during the Bazel build of a large application using sandboxfs. The following numbers are based on 10 consecutive builds for each case and are in seconds:

std: mean 3429.56, median 3425.00, stddev 21.56 hashbrown: mean 3394.50, median 3378.50, stddev 36.31

djc commented 5 years ago

(You're probably aware of this already, but in case anyone else is wondering about this, you actually switched from std::collections::HashMap with SipHash 1-3 to hashbrown::HashMap with FxHash -- so it's unclear whether the performance came from the change of implementation, the change of hash function, or (most likely) both.)

jmmv commented 5 years ago

@djc Ah thanks. I noticed that I could make the standard HashMap use FxHash, but the numbers in the hashbrown page seemed convincing to switch both details at once :P I could measure again all these cases, but given that the performance differences are pretty minor and there is some noise in the metrics, it'd be hard to appreciate those.

djc commented 5 years ago

Yeah, it makes sense to switch both at the same time in this case -- it's just that your commit message saying "Given that we don't care about "secure" hashing, we can afford to switch to a faster implementation like hashbrown, so do it here." does not entirely make sense.

jmmv commented 5 years ago

Thanks @hlopko. Also updated the commit based on @djc's comment.