envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
25.02k stars 4.82k forks source link

djb2CaseInsensitiveHash may cause excessive probing with Abseil hash tables #11720

Open ahedberg opened 4 years ago

ahedberg commented 4 years ago

Title: djb2CaseInsensitiveHash may cause excessive probing with Abseil hash tables

Description: My colleagues who implemented the Abseil containers (@fowles and @sbenzaquen) mentioned that djb2CaseInsensitiveHash will likely cause excessive probing with absl::{flat,node}_hash_{set,map}. StringUtil uses this hash [1] in an absl::flat_hash_set [2]. Ideally this would be built on top of the absl::Hash framework [3] instead.

Relevant Links: [1] https://github.com/envoyproxy/envoy/blob/02a526257a013e80bc508399446aac0234c3ec4e/source/common/common/utility.cc#L472 [2] https://github.com/envoyproxy/envoy/blob/02a526257a013e80bc508399446aac0234c3ec4e/source/common/common/utility.h#L170 [3] https://abseil.io/docs/cpp/guides/hash

antoniovicente commented 4 years ago

The only real use of this function seems to be at source/common/network/transport_socket_options_impl.cc, where it seems to be used to compute some kind of fingerprint. It may make sense to call djb2CaseInsensitiveHash directly and delete CaseInsensitiveHash to avoid use in containers.

mattklein123 commented 4 years ago

This use also doesn't require stability across restarts, so we could just outright migrate to absl::Hash at the same time if that makes more sense.

rojkov commented 3 years ago

The absl::Hash functor adds hashing support for a type which is controlled by us and hashing implies the type needs to be instantiated. I added hashing support to Envoy::Http::LowerCaseString and added benchmarks to compare absl::Hash-based hahsing and the existing djb2-based functor in https://github.com/rojkov/envoy/pull/2.

The results for djb2-based functor:

BM_RemoveTokensLong/8                                500 ns          499 ns      1398876
BM_RemoveTokensLong/64                              3394 ns         3388 ns       206346
BM_RemoveTokensLong/512                            30048 ns        29996 ns        22904
BM_RemoveTokensLong/4096                          292963 ns       292404 ns         2328
BM_RemoveTokensLong/8192                          728479 ns       727128 ns          964
BM_CreateCaseUnorderedSet/8                          258 ns          258 ns      2707808
BM_CreateCaseUnorderedSet/64                        2160 ns         2156 ns       325820
BM_CreateCaseUnorderedSet/512                      27457 ns        27404 ns        25302
BM_CreateCaseUnorderedSet/4096                    430956 ns       430180 ns         1620
BM_CreateCaseUnorderedSet/8192                   1100068 ns      1097241 ns          642

For absl::Hash

BM_RemoveTokensLong/8                     568 ns          567 ns      1091066
BM_RemoveTokensLong/64                   3991 ns         3986 ns       176470
BM_RemoveTokensLong/512                 32421 ns        32369 ns        21365
BM_RemoveTokensLong/4096               278556 ns       278081 ns         2534
BM_RemoveTokensLong/8192               563884 ns       562856 ns         1256
BM_CreateCaseUnorderedSet/8               317 ns          316 ns      2210124
BM_CreateCaseUnorderedSet/64             2600 ns         2596 ns       275066
BM_CreateCaseUnorderedSet/512           21555 ns        21517 ns        32737
BM_CreateCaseUnorderedSet/4096         231011 ns       230574 ns         3021
BM_CreateCaseUnorderedSet/8192         486659 ns       485269 ns         1443

The current implementation works faster if the number of elements is less than 600-1000 (as it doesn't need instantiation allegedly). When the number of elements is greater than that excessive probing starts to manifest.