Open ahedberg opened 4 years ago
The only real use of this function seems to be at source/common/network/transport_socket_options_impl.cc, where it seems to be used to compute some kind of fingerprint. It may make sense to call djb2CaseInsensitiveHash directly and delete CaseInsensitiveHash to avoid use in containers.
This use also doesn't require stability across restarts, so we could just outright migrate to absl::Hash at the same time if that makes more sense.
The absl::Hash functor adds hashing support for a type which is controlled by us and hashing implies the type needs to be instantiated. I added hashing support to Envoy::Http::LowerCaseString
and added benchmarks to compare absl::Hash-based hahsing and the existing djb2-based functor in https://github.com/rojkov/envoy/pull/2.
The results for djb2-based functor:
BM_RemoveTokensLong/8 500 ns 499 ns 1398876
BM_RemoveTokensLong/64 3394 ns 3388 ns 206346
BM_RemoveTokensLong/512 30048 ns 29996 ns 22904
BM_RemoveTokensLong/4096 292963 ns 292404 ns 2328
BM_RemoveTokensLong/8192 728479 ns 727128 ns 964
BM_CreateCaseUnorderedSet/8 258 ns 258 ns 2707808
BM_CreateCaseUnorderedSet/64 2160 ns 2156 ns 325820
BM_CreateCaseUnorderedSet/512 27457 ns 27404 ns 25302
BM_CreateCaseUnorderedSet/4096 430956 ns 430180 ns 1620
BM_CreateCaseUnorderedSet/8192 1100068 ns 1097241 ns 642
For absl::Hash
BM_RemoveTokensLong/8 568 ns 567 ns 1091066
BM_RemoveTokensLong/64 3991 ns 3986 ns 176470
BM_RemoveTokensLong/512 32421 ns 32369 ns 21365
BM_RemoveTokensLong/4096 278556 ns 278081 ns 2534
BM_RemoveTokensLong/8192 563884 ns 562856 ns 1256
BM_CreateCaseUnorderedSet/8 317 ns 316 ns 2210124
BM_CreateCaseUnorderedSet/64 2600 ns 2596 ns 275066
BM_CreateCaseUnorderedSet/512 21555 ns 21517 ns 32737
BM_CreateCaseUnorderedSet/4096 231011 ns 230574 ns 3021
BM_CreateCaseUnorderedSet/8192 486659 ns 485269 ns 1443
The current implementation works faster if the number of elements is less than 600-1000 (as it doesn't need instantiation allegedly). When the number of elements is greater than that excessive probing starts to manifest.
Title: djb2CaseInsensitiveHash may cause excessive probing with Abseil hash tables
Description: My colleagues who implemented the Abseil containers (@fowles and @sbenzaquen) mentioned that djb2CaseInsensitiveHash will likely cause excessive probing with
absl::{flat,node}_hash_{set,map}
.StringUtil
uses this hash [1] in anabsl::flat_hash_set
[2]. Ideally this would be built on top of theabsl::Hash
framework [3] instead.Relevant Links: [1] https://github.com/envoyproxy/envoy/blob/02a526257a013e80bc508399446aac0234c3ec4e/source/common/common/utility.cc#L472 [2] https://github.com/envoyproxy/envoy/blob/02a526257a013e80bc508399446aac0234c3ec4e/source/common/common/utility.h#L170 [3] https://abseil.io/docs/cpp/guides/hash