Open tonya11en opened 3 years ago
Immediate thoughts from me is that this sounds super interesting and potentially very useful! I can imagine this being great as an optional feature as long as we can implement it in such a way that it doesn't increase the resource overhead when it's not used.
One thought: it seems we're assuming that high RPS = a misbehaving client that should be throttled, but I can imagine scenarios where this isn't true. Imagine a service hosting read/write endpoints, with some clients making a large number of reads very quickly and others making a small number of write calls. If the writes start slowing down the data store, you would see overall latency increase and start throttling the reads, which would result in throttling the wrong traffic. That said, even if this is true, I think the worst case is that this kind of throttling might not be applicable to all kinds of traffic patterns.
I can actually just hack this up and run the same tests. I'm having to deal with this in my day-to-day, so it's definitely a common issue. Stand by for more graphs.
Agreed this sounds interesting and worth pursuing. Do you envision this as a new filter or built into one of the existing filters like adaptive concurrency or admission control?
Another question I have is around tenant definition. In the simplest form, I guess we could call each connection a tenant, but it seems like we might want a more complex/configurable notion of this?
Do you envision this as a new filter or built into one of the existing filters like adaptive concurrency or admission control?
I imagined this as a change to the active request circuit breakers. To keep the current default behavior, there would just be a queue timeout of 0ms. If that's not a good idea, then it should be its own filter to start. I'm not sure what kind of effect this would have on the admission control/adaptive concurrency control loops at the moment.
Another question I have is around tenant definition. In the simplest form, I guess we could call each connection a tenant, but it seems like we might want a more complex/configurable notion of this?
The way I defined tenants in the experiments was via a header in each request, since that allows for more advanced isolation domains. However, that would rely on the downstreams not exploiting the headers. It would certainly be more secure if the isolation domains were derived on each Envoy or somehow discovered via the control plane.
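As a rough sketch of what I mean (Python pseudocode; the header name x-tenant-id and the connection fallback are purely illustrative assumptions, not an existing Envoy API), deriving a tenant key might look like:

```python
# Illustrative only: the header name and fallback policy are assumptions,
# not an existing Envoy API. A real implementation would live in C++ and
# could also source the tenant mapping from the control plane.
def tenant_key(headers: dict, downstream_peer: str) -> str:
    # Prefer an explicit header when downstreams are trusted not to spoof it...
    tenant = headers.get("x-tenant-id")
    if tenant:
        return tenant
    # ...otherwise fall back to treating each downstream connection/peer
    # as its own tenant.
    return downstream_peer
```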
Any update on this? Is anyone pursuing it?
@two10 not at the moment. If you'd like to take this on, I'm happy to help or offer guidance.
While messing around with different load shedding mechanisms over the holidays, I hacked one up that may be used to perform fair load shedding (among other things) via the current circuit breakers. I'd love to start some discussion here around its viability.
TL;DR: Altering the concurrency control mechanisms (circuit breakers and adaptive concurrency) to allow queuing of excess requests, rather than immediately shedding them, can provide multi-tenant load shedding support. This can mitigate the effect of "noisy neighbors" and provide isolation between arbitrary downstream tenants.
Problem: Indiscriminate load shedding
Consider a scenario with 2 downstream tenants sending traffic to a single server fronted by an Envoy sidecar. If the Envoy has a well-configured circuit breaker or adaptive concurrency filter, it should be more-or-less protected from resource exhaustion, because the concurrency control mechanisms will begin shedding load when a threshold is reached. If one of the downstream tenants is sending the vast majority of the traffic, it's responsible for triggering the load shedding, but both tenants will have the same percentage of their requests 503'd. This is a canonical noisy neighbor problem, and to my knowledge the current concurrency control mechanisms don't have any mitigations for it beyond the circuit breaker's priority config parameter.
Queuing/buffering requests instead of dropping
One extension to the concurrency control mechanisms that may mitigate this problem is to queue requests instead of immediately dropping them. This would allow us to employ various active queue management techniques, while still being able to maintain the current circuit breaker behavior if one prefers it.
I prototyped the idea via a few different simulations I hacked up, similar to the ones used to evaluate adaptive concurrency. There were 2 different clients sending traffic to a single server. One client had a steady, low request rate. The other steadily ramped up traffic until the concurrency control mechanisms were engaged.
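The offered load looked roughly like the sketch below (the rates and ramp slope are illustrative stand-ins, not the actual simulation parameters):

```python
def offered_rps(t: float) -> dict:
    """Offered load at time t (seconds) for the two synthetic tenants.
    The numbers here are made up for illustration only."""
    steady_tenant = 50.0                            # constant, low request rate
    ramping_tenant = min(50.0 + 25.0 * t, 2000.0)   # ramps until throttling engages
    return {"steady": steady_tenant, "ramping": ramping_tenant}
```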
The queue management technique used was a simple queue timeout: if a request sat queued beyond some threshold, it was dropped. This can recreate the current circuit breaker behavior by timing a request out the moment it's inserted into the queue, which sheds it immediately instead of performing any queuing.
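Here's a minimal sketch of that mechanism (not the actual simulator code; the class and method names are made up for illustration). A queue timeout of 0 recreates today's immediate shedding, while an effectively infinite timeout gives the unbounded-queuing behavior described next:

```python
import time
from collections import deque

class TimeoutQueue:
    """Request buffer with a bounded queuing delay. queue_timeout_s == 0 mirrors
    the current circuit breaker (shed immediately); a huge timeout approximates
    having no circuit breaker at all (unbounded queuing delay)."""

    def __init__(self, queue_timeout_s: float):
        self.queue_timeout_s = queue_timeout_s
        self.pending = deque()

    def enqueue(self, request) -> bool:
        if self.queue_timeout_s == 0:
            return False  # shed immediately, like the existing behavior
        self.pending.append((time.monotonic() + self.queue_timeout_s, request))
        return True

    def dequeue(self):
        # Drop (503) anything that sat in the queue past its deadline.
        now = time.monotonic()
        while self.pending:
            deadline, request = self.pending.popleft()
            if deadline >= now:
                return request
        return None
```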
On the other hand, one can simply never time out requests in the queue. From the downstream perspective this mirrors the absence of any circuit breakers, with latencies climbing due to queuing delay until there's a congestion collapse and every request times out:
Notice above that once all of the requests hit some latency threshold, both downstream tenants got nothing but timeouts and the goodput dropped to zero. Goodput here counts a request only if it received a reply from the upstream that was not a 503 and didn't time out.
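In other words, the goodput accounting is roughly the following (the response fields are illustrative, not from the actual simulator):

```python
def counts_as_goodput(response) -> bool:
    # A request contributes to goodput only if it got an upstream reply
    # that was not a 503 and did not exceed the request timeout.
    return response is not None and response.status != 503 and not response.timed_out
```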
A more reasonable approach here would be to set the queue timeouts to some fraction of the request timeouts. Now you have a circuit breaker that is more tolerant of bursts and an upper-bound on the allowed queuing delay to prevent latencies from getting out of hand:
Notice we still get all the greatest hits of circuit breaking: reasonable latencies and nonzero goodput! The circuit breaker is doing its job and protecting the server's goodput; however, the only reason any load shedding needs to occur at all is the behavior of one of the tenants. Both tenants are receiving 503s, which is a bit... unfair.
Tenant isolation via fair queues
We can address the unfairness in load shedding observed above by offering more interesting queuing disciplines in the circuit breaker. If we swap out the FIFO queue used earlier with a classful fair queue, we now have isolation between the different tenants/classes. Here's the last example in the previous section, WITH queue timeouts, but with basic fair queuing between the tenants:
The throttling via 503s is occurring on ONLY the misbehaving (orange) tenant! The other tenant is totally unaffected and isolated.
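A minimal sketch of the classful fair queue idea, assuming one FIFO per tenant served round-robin (the simulations used a similarly basic discipline; the class and method names here are illustrative):

```python
from collections import deque, defaultdict

class FairQueue:
    """One FIFO per tenant, served round-robin, so a tenant flooding its own
    queue can't delay (or force shedding of) other tenants' requests."""

    def __init__(self):
        self.per_tenant = defaultdict(deque)
        self.service_order = deque()  # tenants that currently have a backlog

    def enqueue(self, tenant: str, request) -> None:
        if not self.per_tenant[tenant]:
            self.service_order.append(tenant)
        self.per_tenant[tenant].append(request)

    def dequeue(self):
        if not self.service_order:
            return None
        tenant = self.service_order.popleft()
        request = self.per_tenant[tenant].popleft()
        if self.per_tenant[tenant]:
            self.service_order.append(tenant)  # tenant still has work queued
        return tenant, request
```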
This even applies to the case without the queue timeouts. The only tenant experiencing the elevated latencies, timeouts, and goodput collapse is the one who is "misbehaving". The tenant in blue doesn't even notice anything because those elevated latencies are coming from the queuing delays:
It's also worth mentioning this approach isn't limited to fair queuing. Circuit breaker priorities can be implemented by using a priority queue, a weighted fair queue can be used to partition resources differently, or CoDel (or some other variant) can be used to keep latencies under control.
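For example, a crude weighted variant of the FairQueue sketch above might serve up to `weight` requests per tenant per pass (again, just an illustration of the kind of knob that could be exposed, not a concrete proposal):

```python
class WeightedFairQueue(FairQueue):  # builds on the FairQueue sketch above
    """Crude weighted round-robin: a tenant with weight N may have up to N
    requests served per pass, partitioning capacity unevenly on purpose."""

    def __init__(self, weights: dict):
        super().__init__()
        self.weights = weights  # e.g. {"batch-tenant": 1, "interactive-tenant": 4}

    def dequeue_batch(self):
        batch = []
        for tenant in list(self.service_order):
            quantum = self.weights.get(tenant, 1)
            q = self.per_tenant[tenant]
            while q and quantum > 0:
                batch.append((tenant, q.popleft()))
                quantum -= 1
        # Keep only tenants that still have a backlog in the service order.
        self.service_order = deque(t for t in self.service_order if self.per_tenant[t])
        return batch
```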
Conclusion
This seems promising to me based on the simulations I've hacked up, but I'm curious if anyone has thoughts on this approach to isolated circuit breaking. I think there may even be other features that we'd be able to get by simply using a fancier queuing discipline and exposing the relevant knobs.
Let me know what you all think!