Also, many of the regexes in source/common/config/well_known_names.cc have required prefixes (e.g. ^http, ^mongo, ^cluster). We could structure them as a map<string,vector<regex>> and apply only the regexes whose key matches the first token of the stat name. The regexes without a required prefix could go into a separate list that is applied universally.
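For illustration, such a prefix-keyed structure could look roughly like the sketch below. This is hypothetical, not Envoy code: the type and field names (`TagExtractorIndex`, `by_first_token`, `unprefixed`) are made up, and it assumes '.'-separated stat names with the prefix as the first token.

```cpp
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical sketch only -- not Envoy code. Regexes with a required leading
// token (e.g. "http", "mongo", "cluster") are grouped under that token, so a
// given stat name only pays for the regexes that could possibly apply to it.
struct TagExtractorIndex {
  std::map<std::string, std::vector<std::regex>> by_first_token;
  std::vector<std::regex> unprefixed;  // no known prefix; applied to everything.

  void apply(const std::string& stat_name) const {
    const std::string first_token = stat_name.substr(0, stat_name.find('.'));
    const auto it = by_first_token.find(first_token);
    if (it != by_first_token.end()) {
      for (const std::regex& re : it->second) {
        std::smatch match;
        if (std::regex_search(stat_name, match, re)) {
          // ... extract the tag from `match` ...
        }
      }
    }
    for (const std::regex& re : unprefixed) {
      std::smatch match;
      if (std::regex_search(stat_name, match, re)) {
        // ... extract the tag from `match` ...
      }
    }
  }
};
```

The point is just that only the regexes registered under the stat name's first token run, plus the unavoidable unprefixed list.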
@jmarantz I would recommend following up with @mrice32, who wrote all this code. If it looks like the regex parsing is now a large part of startup time, I'm guessing there are some low-hanging-fruit optimizations we could consider, such as the ones you mention.
will do, thanks!
We could definitely apply the map<string,vector<regex>> optimization.
It's also possible to turn tags off by setting use_all_default_tags to false and not including any custom tags. However, maybe people would prefer a more obvious way to disable tags? Turning off the regexes might also be a good way to measure the maximum perf improvement we could get by fixing this.
As for the canonical representation, I agree that the best solution would be to embed the structure of the tags in the instantiation of each stat if this is becoming a performance bottleneck. Let's chat offline about this @jmarantz.
I pushed an experimental commit that helps quite a bit.
https://github.com/jmarantz/envoy/commit/b1f25cf3dde8de38651df18e5f0a0021f37b126f
Filter regex lookups by prescanning the regex input string and quickly ruling out inputs that can't match.
This results in a ~3x speedup on the 10k cluster test from https://github.com/envoyproxy/envoy/issues/2063. It still takes too long to start up, though, IMO.
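The prescan idea is roughly what the sketch below shows. This is a simplified, hypothetical illustration rather than the code in the commit above; the struct and field names are made up. Each extractor precomputes a literal prefix and/or required substring, and the std::regex is skipped entirely when the input can't possibly match (the "prefix-discard" / "embedded-discard" cases in the table further down).

```cpp
#include <regex>
#include <string>

// Simplified sketch of the prescan idea (illustrative names; not the code in
// the commit above): precompute literals that any matching stat name must
// start with or contain, and consult them before running std::regex_search.
struct PrescannedExtractor {
  std::string required_prefix;  // e.g. "cluster."; empty if none is known.
  std::string required_substr;  // e.g. "grpc";     empty if none is known.
  std::regex regex;

  bool extract(const std::string& stat_name, std::smatch& match) const {
    if (!required_prefix.empty() &&
        stat_name.compare(0, required_prefix.size(), required_prefix) != 0) {
      return false;  // "prefix-discard": the name can't possibly match.
    }
    if (!required_substr.empty() &&
        stat_name.find(required_substr) == std::string::npos) {
      return false;  // "embedded-discard": a required substring is absent.
    }
    return std::regex_search(stat_name, match, regex);  // only pay here.
  }
};
```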
That commit shouldn't really help at all; I didn't convert to a map because I was unsure of what to enter into the map. I could do an upper-bound/lower-bound type search though. But the optimization in this commit is more of a peephole tweak and I'm surprised it helped at all. I wonder if this speaks to how suboptimal std::regex is for our application.
I have two higher-level thoughts:
And some data from the 1k-cluster example from bug 2063, run with callgrind, with exit(0) injected prior to the listener loop in main_common.cc:
- ScopeImpl::counter called 61k times
- ScopeImpl::gauge called 10k times
- Stats::extractTag called 1.56M times
- std::__detail::Executor<...some regex thing...> called 11M times
Note that in this run, extractTag skips the regex match if the prefix doesn't match.
I've done some more digging by instrumenting a map and some timers to see which regexes take a lot of time. The table below measures the microseconds consumed parsing specific regexes. The action column reflects whether the regex matched ("success"), was executed and failed ("miss"), or was skipped by the hack I described above ("prefix-discard", "embedded-discard").
These are semi-sorted by hand :)
| microseconds consumed | action | name |
| ---: | --- | --- |
| 2993691 | miss | envoy.grpc_bridge_method |
| 2920885 | miss | cipher_suite |
| 2903371 | miss | envoy.grpc_bridge_service |
| 1042328 | miss | envoy.http_conn_manager_prefix |
| 337193 | success | envoy.cluster_name |
| 182114 | miss | envoy.response_code |
| 134511 | miss | envoy.response_code_class |
| 35603 | embedded-discard | envoy.response_code |
| 35804 | embedded-discard | envoy.response_code_class |
| 28141 | prefix-discard | envoy.dyanmo_table |
| 28698 | prefix-discard | envoy.dynamo_operation |
| 34606 | prefix-discard | envoy.dynamo_partition_id |
| 33137 | prefix-discard | envoy.fault_downstream_cluster |
| 34684 | prefix-discard | envoy.http_user_agent |
| 28725 | prefix-discard | envoy.listener_address |
| 33139 | prefix-discard | envoy.mongo_callsite |
| 32518 | prefix-discard | envoy.mongo_cmd |
| 32695 | prefix-discard | envoy.mongo_collection |
| 31976 | prefix-discard | envoy.mongo_prefix |
| 33859 | prefix-discard | envoy.ratelimit_prefix |
| 28443 | prefix-discard | envoy.ssl_cipher |
| 29455 | prefix-discard | envoy.tcp_prefix |
| 33770 | prefix-discard | envoy.virtual_cluster |
| 33987 | prefix-discard | envoy.virtual_host |
| 33601 | prefix-discard | envoy.clientssl_prefix |
| 508 | miss | envoy.dyanmo_table |
| 508 | miss | envoy.dynamo_operation |
| 520 | miss | envoy.dynamo_partition_id |
| 490 | miss | envoy.fault_downstream_cluster |
| 511 | miss | envoy.http_user_agent |
| 104 | miss | envoy.ssl_cipher |
| 152 | success | envoy.http_conn_manager_prefix |
| 62 | miss | envoy.listener_address |
| 10 | prefix-discard | envoy.cluster_name |
| 10 | prefix-discard | cipher_suite |
| 4 | prefix-discard | envoy.grpc_bridge_method |
| 6 | prefix-discard | envoy.grpc_bridge_service |
| 9 | success | envoy.listener_address |
| 14 | success | envoy.response_code_class |
| 5 | miss | envoy.cluster_name |
My suspicion is that these slow regex lookups are due to regex backtracking (background: https://regular-expressions.mobi/catastrophic.html?wlr=1) and may be resolvable by making the regexes pickier, e.g. using "." less and using more specific character classes that exclude known separators.
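As a hedged illustration of what "pickier" could mean here (the patterns below are hypothetical, not the ones in well_known_names.cc): replacing ".*" with a character class that excludes the "." separator leaves the engine almost no room to backtrack, which matters most on inputs that do not match.

```cpp
#include <regex>
#include <string>

int main() {
  const std::string name = "cluster.some_cluster_name.upstream_rq_200";

  // Backtracking-prone: ".*" can swallow the whole string and then be forced
  // to give characters back until the rest of the pattern can match.
  std::regex greedy("^cluster\\.(.*)\\.upstream_rq_(\\d{3})$");

  // Pickier: "[^.]+" cannot cross the '.' separators, so the engine has far
  // fewer positions to retry before giving up.
  std::regex picky("^cluster\\.([^.]+)\\.upstream_rq_(\\d{3})$");

  std::smatch match;
  std::regex_search(name, match, greedy);  // both match this input; the cost
  std::regex_search(name, match, picky);   // difference shows up on misses.
  return 0;
}
```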
Progress update: I have some changes I'm testing now that get the startup time for the 10k-cluster config from 30 seconds to about 5 seconds, and I think I can trim that to about 2-3 seconds.
The changes entail:
- `$1`: throw away one token
- `$c1`: capture one token
- `$*`: throw away many tokens greedily
- `$c*`: capture many tokens greedily
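To make that concrete, here is a minimal, hypothetical interpreter for such token patterns. It is not the code on the prescan-regex branch: it assumes "."-separated stat names, assumes a `$*`/`$c*` element is followed only by single-token elements, and the helper names (`split`, `matchTokens`) are made up.

```cpp
#include <string>
#include <vector>

// Split a stat name like "cluster.foo.upstream_rq_200" into tokens on '.'.
std::vector<std::string> split(const std::string& s, char sep) {
  std::vector<std::string> out;
  size_t start = 0;
  while (true) {
    const size_t pos = s.find(sep, start);
    out.push_back(s.substr(start, pos - start));
    if (pos == std::string::npos) break;
    start = pos + 1;
  }
  return out;
}

// Hypothetical matcher for token patterns made of literals, "$1", "$c1",
// "$*", and "$c*". Simplifying assumption: a "$*"/"$c*" element is followed
// only by single-token elements, so "greedy" means "take all but the tail".
bool matchTokens(const std::vector<std::string>& pattern,
                 const std::vector<std::string>& tokens,
                 std::vector<std::string>* captures) {
  size_t t = 0;
  for (size_t p = 0; p < pattern.size(); ++p) {
    const std::string& elt = pattern[p];
    if (elt == "$*" || elt == "$c*") {
      const size_t tail = pattern.size() - p - 1;  // single-token elements left.
      if (tokens.size() < t + tail) return false;
      const size_t end = tokens.size() - tail;     // greedy: grab all but tail.
      if (elt == "$c*") {
        std::string joined;
        for (size_t i = t; i < end; ++i) {
          if (i > t) joined += '.';
          joined += tokens[i];
        }
        captures->push_back(joined);
      }
      t = end;
    } else {
      if (t >= tokens.size()) return false;
      if (elt == "$c1") captures->push_back(tokens[t]);
      else if (elt != "$1" && elt != tokens[t]) return false;
      ++t;
    }
  }
  return t == tokens.size();
}
```

With something of this shape, a pattern like `{"cluster", "$c1", "$*"}` can capture the cluster name out of `cluster.foo.upstream_rq_200` with no regex machinery at all.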
I have gotten this to start up in 2.5 seconds, and I think this is probably where I'll pause for now, though there is still opportunity for improvement. Even the changes I have so far will really require multiple PRs, which will be forthcoming as I clean them up.
The aggregated (and not-ready-for-review) changes are on https://github.com/jmarantz/envoy/commits/prescan-regex
@mattklein123 @htuch @alyssawilk can one of you assign this to me?
I have a more detailed implementation plan in a 2-page Google doc.
@jmarantz is there remaining work on this or should we resolve as complete (for now)?
Per discussion in the other perf issue, closing this as fixed. We can open new issues as needed. Awesome work!!!
Description: See https://github.com/envoyproxy/envoy/issues/2063 and the 10k testcase. In that testcase, with https://github.com/envoyproxy/envoy/pull/2358 applied, the 30-second startup time is dominated by regex code associated with tags.
I did a little debugging to try to understand the tags, and I still don't. But my naive observation is that we are applying a bunch of regexes to stat names to do this transform (in Counter& ThreadLocalStoreImpl::ScopeImpl::counter()).
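For readers without the code open, the per-stat work being described has roughly the following shape. This is a simplified sketch, not the actual ThreadLocalStoreImpl code; `Tag` and `extractTags` here are illustrative stand-ins.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Rough shape of the per-stat work described above (a simplification, not the
// actual Envoy code): every counter()/gauge() call runs every configured
// tag-extraction regex against the flat stat name.
struct Tag {
  std::string name;
  std::string value;
};

std::string extractTags(const std::string& stat_name,
                        const std::vector<std::pair<std::string, std::regex>>& extractors,
                        std::vector<Tag>* tags) {
  std::string tag_extracted_name = stat_name;
  for (const auto& [tag_name, re] : extractors) {
    std::smatch match;
    if (std::regex_search(tag_extracted_name, match, re) && match.size() > 1) {
      tags->push_back(Tag{tag_name, match[1].str()});
      // Strip the matched portion so what remains is the "tag-extracted" name,
      // mirroring the flatten-then-rediscover pattern described above.
      tag_extracted_name = match.prefix().str() + match.suffix().str();
    }
  }
  return tag_extracted_name;
}
```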
My feeling is that this tag extraction is a structural transformation that could be done much faster if the name structure were retained in the stat structures, rather than flattened into a string and then rediscovered. But I don't have enough of a feel for whether this alone would make startup fast, or whether the effort to fix it would be justified by instant startups on huge numbers of clusters.