envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

overload manager: overload signals based on number of downstream connections and active requests #12419

Open antoniovicente opened 4 years ago

antoniovicente commented 4 years ago

It would be helpful to add resource monitors to track the number of sockets used for (1) downstream connections, (2) active requests and (3) possibly upstream connections, in order to provide better protection against resource attacks against configured fd rlimits. Tracking these counts proxy-wide should be sufficient.

The motivations behind this enhancement include:

  • a desire for more consistent configuration of resource limits and of the actions we can take when approaching overload,
  • the recently introduced parameters that limit the max number of client connections (globally or per listener) to protect against fd rlimit attacks, and
  • the introduction of more graceful options to handle increases in resource usage, such as the adaptive HTTP request timeouts added in #11427.

Possible future enhancements:

antoniovicente commented 4 years ago

cc: @nezdolik

mattklein123 commented 4 years ago

@nezdolik once you accept the org invite we can assign the issue. Thank you for working on this!

nezdolik commented 4 years ago

Thanks @mattklein123 @antoniovicente, I believe issue assignment should work now.

akonradi commented 4 years ago

Do we have any established patterns for aggregating counts of objects across multiple workers efficiently? We probably have stats for most of the metrics we'd want to use but I don't know if the flush frequency is high enough for the overload manager to make use of them.

antoniovicente commented 4 years ago

It may make sense to keep an atomic counter with the aggregate number of active client connections across all listeners and clusters.

It may make sense to think of that number as separate from stats.

akonradi commented 4 years ago

What about having aggregate counts in the individual worker threads via a TLS object, then having the resource monitor aggregate those using an atomic counter when the overload manager initiates a poll?
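
As a rough illustration of that shape (the names here are made up; Envoy's actual ThreadLocal and resource monitor interfaces differ), each worker could own a per-thread counter that only it writes, with the monitor summing the counters when the overload manager polls:

// Sketch only: per-worker counters aggregated when the overload manager polls.
#include <atomic>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// One slot per worker thread; only the owning worker writes it, other threads
// only read it during a poll, so relaxed atomics are enough.
struct WorkerConnectionCount {
  std::atomic<uint64_t> active_connections{0};
};

class DownstreamConnectionsMonitor {
public:
  // Each worker registers its thread-local slot once at startup.
  void registerWorker(std::shared_ptr<WorkerConnectionCount> slot) {
    std::lock_guard<std::mutex> lock(mutex_);
    workers_.push_back(std::move(slot));
  }

  // Called from the overload manager's periodic poll on the main thread.
  uint64_t updateResourceUsage() {
    std::lock_guard<std::mutex> lock(mutex_);
    uint64_t total = 0;
    for (const auto& w : workers_) {
      total += w->active_connections.load(std::memory_order_relaxed);
    }
    aggregate_.store(total, std::memory_order_relaxed);
    return total;
  }

private:
  std::mutex mutex_;
  std::vector<std::shared_ptr<WorkerConnectionCount>> workers_;
  std::atomic<uint64_t> aggregate_{0};
};

The obvious limitation is freshness: the aggregate is only as current as the last poll.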

nezdolik commented 4 years ago

@mattklein123 @antoniovicente @akonradi please take a look: https://docs.google.com/document/d/109p2eWereP9i_0XbuiVhsIWoJ6oI_jUhsjG-rHCDjyg/edit?usp=sharing

mattklein123 commented 4 years ago

Thanks @nezdolik! I will take a look next week. OOO for a bit.

nezdolik commented 3 years ago

I think this task needs to be split into 3 parts:

  • Introduce reactive checks into overload manager
  • Plug OM into dispatcher
  • Write the actual resource monitor & migrate code from TCP Listener

antoniovicente commented 3 years ago

I think this task needs to be split into 3 parts:

  • Introduce reactive checks into overload manager
  • Plug OM into dispatcher
  • Write the actual resource monitor & migrate code from TCP Listener

Splitting sounds fine. I'm curious about the details behind each of the 3 parts you describe above.

nezdolik commented 3 years ago

  1. Introduce reactive checks into the overload manager: the OM currently only supports periodic recalculation of resource usage plus flushing of updates to action callbacks. For the case of global connection limit overflow this type of check may be too slow, so we should support an "immediate/reactive" check. Continuing with the example of global downstream connections: every time the TCP listener is about to accept a new connection, it fires an on-demand check to the OM for the number of global connections; the OM reactively recalculates the corresponding resource usage and triggers the callbacks for the overload action (if it changes state).
  2. Plug the OM into the dispatcher: if we continue with the example of global downstream connections and have reactive checks in place, then the TCP listener will need access to the OM instance to invoke a reactive check (with an incremented counter for the number of accepted connections). I did some prior investigation into the scope of work and it feels like a big task.
  3. Write the actual resource monitor & migrate code from the TCP listener: the last part is to move the code that tracks the number of accepted connections and compares it against the global maximum (set via runtime) from the TCP listener into the OM (and write the new monitor). The TCP listener should instead invoke reactive checks in the OM and register callbacks for the "stop accepting connections" overload action.

antoniovicente commented 3 years ago

  1. Introduce reactive checks into the overload manager: the OM currently only supports periodic recalculation of resource usage plus flushing of updates to action callbacks. For the case of global connection limit overflow this type of check may be too slow, so we should support an "immediate/reactive" check. Continuing with the example of global downstream connections: every time the TCP listener is about to accept a new connection, it fires an on-demand check to the OM for the number of global connections; the OM reactively recalculates the corresponding resource usage and triggers the callbacks for the overload action (if it changes state).
  2. Plug the OM into the dispatcher: if we continue with the example of global downstream connections and have reactive checks in place, then the TCP listener will need access to the OM instance to invoke a reactive check (with an incremented counter for the number of accepted connections). I did some prior investigation into the scope of work and it feels like a big task.

I wonder if the listener can get the thread local overload state from thread-local storage in some way. If that approach is not possible, we may want to consider plumbing the OM into dispatcher creation, similar to the plumbing done in https://github.com/envoyproxy/envoy/pull/14679 to add extra arguments to the factory method that creates dispatchers in worker threads.

  3. Write the actual resource monitor & migrate code from the TCP listener: the last part is to move the code that tracks the number of accepted connections and compares it against the global maximum (set via runtime) from the TCP listener into the OM (and write the new monitor). The TCP listener should instead invoke reactive checks in the OM and register callbacks for the "stop accepting connections" overload action.

nezdolik commented 3 years ago

@antoniovicente the listener would get the thread local overload state, but it should also be able to trigger a reactive check in the OM (which would recalculate the overload action state on demand and post it to all interested callbacks on worker threads), so the interaction is bidirectional. How about something like this:

overload_manager.updateResource(
      std::string resource_name, uint64_t local_increment);

The listener would register for the action in the overload manager via registerForAction(const std::string&, Event::Dispatcher&, OverloadActionCb) as usual. The OM would instantiate a reactive resource monitor (if configured) with a global atomic counter for the number of connections. The listener would invoke overload_manager.updateResource(...) on demand upon receiving a socket event, reporting its number of currently accepted sockets and asking the OM to update the reactive resource monitor for the global number of connections. The OM would immediately (on the next main dispatcher event loop run, i.e. not via the delayed periodic flushes) trigger a resource usage recalculation in the resource monitor, which would in turn invoke callbacks in the OM and reevaluate the overload actions tied to that resource. The OM would then post the updated overload action state (tied to that resource) to the worker threads via their dispatchers and invoke the registered callbacks.

This is how I envision reactive checks in the overload manager, but maybe there are better ways.
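
As a very rough sketch of that flow (illustrative types only, not the actual Envoy OverloadManager API): a worker reports a local increment, a monitor on the main thread bumps a global atomic counter, and the OM fans the resulting action state back out to registered callbacks.

// Sketch only: on-demand resource update driving overload action callbacks.
#include <atomic>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class OverloadActionState { Inactive, Active };
using OverloadActionCb = std::function<void(OverloadActionState)>;

// Lives inside the OM on the main thread; owns the global connection counter.
class ReactiveConnectionMonitor {
public:
  explicit ReactiveConnectionMonitor(uint64_t global_max) : global_max_(global_max) {}

  OverloadActionState onUpdate(uint64_t local_increment) {
    const uint64_t total =
        global_count_.fetch_add(local_increment, std::memory_order_acq_rel) + local_increment;
    return total >= global_max_ ? OverloadActionState::Active : OverloadActionState::Inactive;
  }

private:
  const uint64_t global_max_;
  std::atomic<uint64_t> global_count_{0};
};

class OverloadManagerSketch {
public:
  // Listeners register for the action as usual (dispatcher plumbing omitted).
  void registerForAction(const std::string& /*action*/, OverloadActionCb cb) {
    callbacks_.push_back(std::move(cb));
  }

  // Called by a worker when it accepts new sockets; in the real design the
  // resulting state would be posted to each worker via its dispatcher.
  void updateResource(const std::string& /*resource_name*/, uint64_t local_increment) {
    const OverloadActionState state = monitor_.onUpdate(local_increment);
    for (auto& cb : callbacks_) {
      cb(state);
    }
  }

private:
  ReactiveConnectionMonitor monitor_{/*global_max=*/50000};
  std::vector<OverloadActionCb> callbacks_;
};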

cc @akonradi

nezdolik commented 3 years ago

The approach proposed above requires an OM reference to be accessible in the TCP listener.

akonradi commented 3 years ago

which would recalculate overload action state on demand and post it to all interested callbacks on worker threads

If this is the only way that workers can learn that they shouldn't make new downstream connections (and if there's another way, I missed it), that's not going to be fast enough. If we want to strictly enforce a limit, we need a counter that is incremented and read atomically. Otherwise there's still the possibility of a race: suppose worker A calls updateResource(), incrementing the "envoy.overload.downstream_connection_count" counter to the configured max. This posts an update event to set all workers' thread-local state for the connection count. Worker B, already in the middle of processing a callback, checks its thread-local state, sees that the count is below the limit, and calls the same updateResource(), pushing the count past the limit.

Having a thread-local cached copy of these atomic counters sounds like a good way to get useful information in cases where we don't need perfect consistency, like if we wanted to reduce timeouts as the number of connections increased. But if there's a hard cap we want to honor, we need to pick some point along the line between "single shared atomic counter" and "allocate each worker its own quota" (with a midpoint being "workers pull from the shared counter into their local bucket then allocate from there").
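
The "workers pull from the shared counter into their local bucket" midpoint could look roughly like this (hypothetical names, not an existing interface): each worker reserves a small batch of permits from the shared atomic counter and serves accepts from its local bucket, so the hard cap is never overshot while the shared counter is only touched once per batch.

// Sketch only: batched allocation from a shared atomic limit.
#include <algorithm>
#include <atomic>
#include <cstdint>

class SharedConnectionBudget {
public:
  explicit SharedConnectionBudget(uint64_t max) : max_(max) {}

  // Atomically reserve up to `want` permits; returns how many were granted.
  uint64_t tryReserve(uint64_t want) {
    uint64_t current = used_.load(std::memory_order_relaxed);
    while (true) {
      const uint64_t available = current < max_ ? max_ - current : 0;
      const uint64_t grant = std::min(available, want);
      if (grant == 0) {
        return 0;
      }
      if (used_.compare_exchange_weak(current, current + grant,
                                      std::memory_order_acq_rel)) {
        return grant;
      }
    }
  }

  void release(uint64_t n) { used_.fetch_sub(n, std::memory_order_acq_rel); }

private:
  const uint64_t max_;
  std::atomic<uint64_t> used_{0};
};

// Per-worker bucket: only touches the shared counter when its batch runs out.
class WorkerConnectionBucket {
public:
  WorkerConnectionBucket(SharedConnectionBudget& budget, uint64_t batch)
      : budget_(budget), batch_(batch) {}

  bool tryAcceptConnection() {
    if (local_permits_ == 0) {
      local_permits_ = budget_.tryReserve(batch_);  // pull a new batch
    }
    if (local_permits_ == 0) {
      return false;  // global limit reached
    }
    --local_permits_;
    return true;
  }

  void onConnectionClosed() { budget_.release(1); }

private:
  SharedConnectionBudget& budget_;
  const uint64_t batch_;
  uint64_t local_permits_{0};
};

The trade-off is that permits parked in one worker's bucket still count against the global limit, so other workers may be refused slightly before the true total reaches the cap.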

nit: updateResource should take something other than a string, or should be called on some kind of handle object. String comparison (including as part of lookup or case matching) is unnecessarily expensive and we should avoid it as much as possible on the data path, including when establishing connections. Also, updateResource() needs to take a signed integer to be able to signal when a connection is closed, right?

antoniovicente commented 3 years ago

I think the name "reactive" threw me off. The check you're proposing is really a "proactive" check, rather than a reactive mechanism like the current one, which operates on the most recently propagated overload state information, computed periodically.

Regarding the interface, I think something like bool tryAllocateResource(), which atomically increments the number of active resource users if resource usage is below the limit, would do the trick. This method would return true if the allocation succeeded, else return false. The resource tracker would also need a releaseResource() method that can be used to decrement usage.

Ideally you'd query the overload manager for the proactive resource tracker during startup, as part of creating the listeners, and keep a reference to it in the listener so that each update to the resource tracker does not require a map lookup by name.
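
A minimal sketch of that interface, assuming a plain atomic compare-and-swap implementation (names are illustrative; the real tracker would live behind the overload manager):

// Sketch only: a proactive resource tracker with an atomic try-allocate.
#include <atomic>
#include <cstdint>

class ProactiveResourceTracker {
public:
  explicit ProactiveResourceTracker(uint64_t max) : max_(max) {}

  // Returns true and increments the usage count if we are below the limit;
  // returns false (without incrementing) otherwise.
  bool tryAllocateResource() {
    uint64_t current = current_.load(std::memory_order_relaxed);
    do {
      if (current >= max_) {
        return false;
      }
    } while (!current_.compare_exchange_weak(current, current + 1,
                                             std::memory_order_acq_rel));
    return true;
  }

  // Decrements the usage count when the resource (e.g. a connection) is released.
  void releaseResource() { current_.fetch_sub(1, std::memory_order_acq_rel); }

  uint64_t currentUsage() const { return current_.load(std::memory_order_relaxed); }

private:
  const uint64_t max_;
  std::atomic<uint64_t> current_{0};
};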

nezdolik commented 3 years ago

@akonradi I may not have provided enough detail on the suggested approach. Worker threads would not be aware of the global max or the global current counter; they only maintain local counters (e.g. per listener) and the thread local state for the overload action. The atomic global counter is stored in the resource monitor in the OM on the main thread. Upon trying to accept a new connection, the worker thread contacts the OM with an "increment"; the OM in turn propagates that increment to the resource monitor, which atomically increments the global counter. The resource monitor triggers recalculation of the overload actions tied to that resource, and the OM then propagates the new overload action state to the worker threads.

This approach is slower than doing an inline check that returns a boolean (what @antoniovicente mentioned), since the OM needs to trigger all related callbacks and update the overload action state for them. On the other hand, it aligns better with the existing OM model and the approach of OM client code registering for actions.

@antoniovicente thank you for looking at the PR, I did not want to link it until we agree on the approach. And yes, the term "proactive" suits better :) From past discussions it felt like we are optimising for speed (of checks), so going with boolean checks plus atomic counters in the OM sounds like the way to go. I will update the PR according to our discussions.
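
For illustration, listener-side usage of such an interface could look roughly like this (hypothetical types; the actual wiring into Envoy's TCP listener is exactly the open part of this work):

// Sketch only: a listener consulting a proactive resource on the accept path.
struct ProactiveResource {
  virtual ~ProactiveResource() = default;
  virtual bool tryAllocateResource() = 0;
  virtual void releaseResource() = 0;
};

class TcpListenerSketch {
public:
  // The tracker is looked up from the OM once at startup and kept by reference,
  // so the accept path never does a map lookup by name.
  explicit TcpListenerSketch(ProactiveResource& global_downstream_conns)
      : global_downstream_conns_(global_downstream_conns) {}

  // Called when a socket is ready to be accepted.
  bool onAccept() {
    if (!global_downstream_conns_.tryAllocateResource()) {
      // Over the global limit: refuse the connection ("stop accepting connections").
      return false;
    }
    // ... accept and service the connection ...
    return true;
  }

  // Called when a downstream connection closes.
  void onConnectionClosed() { global_downstream_conns_.releaseResource(); }

private:
  ProactiveResource& global_downstream_conns_;
};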

nezdolik commented 1 year ago

@KBaichoo this issue is not done yet. We are 60-70% done with one of the suggested monitors: we still need to plug the downstream connections monitor into the TCP listener and deprecate the existing mechanism that tracks downstream connections. The remaining monitors (active requests and upstream connections) are 0% done.

KBaichoo commented 1 year ago

Whoops sorry, this autoclosed when merging the PR. Reopen, thanks!

nezdolik commented 1 year ago

@kyessenov could you please reopen this issue? This is an umbrella ticket for multiple tasks in overload manager.

nezdolik commented 5 months ago

@KBaichoo @botengyao would it be beneficial to introduce resource monitor (3) from the original suggestion, to track & limit the number of sockets used for upstreams? It sounds useful to me.

KBaichoo commented 5 months ago

If you have a strong use case it could make sense, but I'm a bit more skeptical of the value of limiting the number of upstream connections globally, given it is not entirely controlled by an attacker, e.g. we'd have some connection reuse for various streams.