separate ConnectionPool from Cluster definition

envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy

https://www.envoyproxy.io

Apache License 2.0


separate ConnectionPool from Cluster definition #11071

Open stevenzzzz opened 4 years ago

stevenzzzz commented 4 years ago

Sometimes we deploy multiple services on the same set of endpoints. We then define a set of clusters for these machines, and each cluster has its own pool connecting to the same set of endpoints.

Separating ConnectionPool from Cluster has many advantages:

The same set of backends may serve multiple services, so those services can share the same connections: better performance, a smaller RAM footprint, and less pool warming.
Different LB algorithms can run on the same set of connections.
It could introduce context awareness between services/clusters when their backends are the same.
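
To make the request concrete, here is a minimal sketch of the sharing idea, not Envoy code. `ConnectionPool`, `PoolRegistry`, and `Cluster` are hypothetical names invented for illustration. It only shows that two clusters resolving to the same endpoint set could be handed the same pool object.

```python
from dataclasses import dataclass


@dataclass
class ConnectionPool:
    """Hypothetical stand-in for a pool of upstream connections."""
    endpoints: frozenset
    connections_opened: int = 0

    def acquire(self, endpoint: str) -> str:
        # A real pool would reuse idle connections; this sketch only counts opens.
        if endpoint not in self.endpoints:
            raise ValueError(f"unknown endpoint: {endpoint}")
        self.connections_opened += 1
        return f"conn-to-{endpoint}"


class PoolRegistry:
    """Hypothetical registry that keeps one pool per distinct endpoint set."""

    def __init__(self):
        self._pools = {}  # maps frozenset of endpoints -> ConnectionPool

    def pool_for(self, endpoints) -> ConnectionPool:
        key = frozenset(endpoints)  # order of endpoints does not matter
        if key not in self._pools:
            self._pools[key] = ConnectionPool(key)
        return self._pools[key]

    def pool_count(self) -> int:
        return len(self._pools)


@dataclass
class Cluster:
    """Hypothetical cluster that references a pool instead of owning one."""
    name: str
    pool: ConnectionPool


registry = PoolRegistry()
backends = ["10.0.0.1:8080", "10.0.0.2:8080"]
cluster_a = Cluster("service_a", registry.pool_for(backends))
cluster_b = Cluster("service_b", registry.pool_for(reversed(backends)))
```

Both clusters end up referencing a single pool, so a connection opened on behalf of `service_a` is visible to `service_b`. Keying only on the endpoint set is a simplifying assumption of this sketch.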

stevenzzzz commented 4 years ago

/cc @antoniovicente

antoniovicente commented 4 years ago

An explicit goal would be to be able to share connections across clusters that share a ConnectionPool. Different LB algorithms may not be required. In fact, keeping a single copy of the LB data structures would reduce memory usage in cases where you have large numbers of clusters that share a ConnectionPool.

mattklein123 commented 4 years ago

See also https://github.com/envoyproxy/envoy/issues/8702. Depending on how this is implemented, it would be nice to also have a configuration/implementation in which we handle upstream connections in a true thread pool that is also shared between workers. This would also allow users to trade off some additional CPU/contention for lower connection counts and memory usage in certain deployments. Let's chat if someone is going to work on this.

stevenzzzz commented 4 years ago

/cc chaoqin-li1123

antoniovicente commented 3 years ago

I think that sharing LB structures between clusters is an explicit goal of this effort. #8702 would provide additional reductions in resource usage, but is no replacement for the changes requested in this issue.

lambdai commented 3 years ago

Ideally, different clusters would use different connection attributes, e.g. socket options or TLS context. That means that even though the endpoint sets are the same, the connection pools should not be shared.

I do agree that some fields vary among the clusters while the connection pools could still be shared. What is the major pain point? LB?
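
One way to reconcile the two points above is to share a pool only between clusters whose connection-level attributes match. The sketch below is an illustration under stated assumptions, not Envoy's implementation. `ConnectionAttributes`, `ClusterConfig`, `pool_key`, and `group_by_pool` are hypothetical names, and which fields belong in the key is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionAttributes:
    """Hypothetical subset of cluster settings that change the connection itself."""
    socket_options: tuple = ()
    tls_context: str = ""  # assumed to be a name or hash identifying TLS settings


@dataclass(frozen=True)
class ClusterConfig:
    name: str
    endpoints: frozenset
    attributes: ConnectionAttributes
    lb_policy: str = "ROUND_ROBIN"  # assumed not to affect how connections are made


def pool_key(cluster: ClusterConfig) -> tuple:
    # Only fields that affect how a connection is established go into the key.
    return (cluster.endpoints, cluster.attributes)


def group_by_pool(clusters) -> dict:
    """Group cluster names by the pool they could share."""
    groups = {}
    for cluster in clusters:
        groups.setdefault(pool_key(cluster), []).append(cluster.name)
    return groups


eps = frozenset({"10.0.0.1:8080", "10.0.0.2:8080"})
plain = ConnectionAttributes()
mtls = ConnectionAttributes(tls_context="mtls-internal")

clusters = [
    ClusterConfig("a", eps, plain, lb_policy="ROUND_ROBIN"),
    ClusterConfig("b", eps, plain, lb_policy="LEAST_REQUEST"),
    ClusterConfig("c", eps, mtls),
]
groups = group_by_pool(clusters)
```

Here `a` and `b` differ only in LB policy and fall into one group, while `c` has a different TLS context and gets its own.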

jnt0r commented 2 months ago

Is this still active? I stumbled upon this issue, and we would love to see it implemented. We have many clusters (200-500) that resolve to the same set of endpoints, so we open separate connections and connection pools for each cluster. To reduce all kinds of resource usage, we would like to reuse the connections to the endpoints across multiple clusters. We only need different load balancing per cluster, as we need the clusters only for different circuit-breaking behaviour.

stevenzzzz commented 2 months ago

ahh, I think it's still "alive". I saw you asked in a related issue (https://github.com/envoyproxy/envoy/issues/8702#issuecomment-2252536576) as well. :P

This feature would be really nice, but the blockers are always: system complexity, and folks' cycles.

OTOH, while there is no such feature and you have a real issue in prod, you could possibly dance around the Envoy config protos to make your Envoy cluster an "MT" cluster: route all the traffic to the same backend group, but differentiate the traffic for the previously "different clusters" using some header, path, authority, etc.

jnt0r commented 2 months ago

Hi @stevenzzzz, thanks for your reply. What do you mean by "MT" cluster? Do you have a link or documentation?

stevenzzzz commented 2 months ago

nah, just some wild thoughts. there are two dimensions in this story, right?

1. No. of clusters.
2. No. of workers.

If 1. can't be changed easily, then per-cluster connection pools for "many clusters" become a problem. Instead of "many clusters" (say cluster_[a,b,c...z]), you could pack them into one cluster (say cluster_A) and, on your server/backend side, differentiate the traffic toward cluster_a, cluster_b, ... using :host, :path or some internal header you have (basically adding a "virtual proxy layer" in front of cluster_[a...z]). This could possibly solve your scalability issue, but it may not be applicable to your product; it depends on how much control you have over the Envoy.
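
The packing idea above can be sketched as follows. This is a toy model, not Envoy configuration. The header name `x-logical-cluster`, the cluster name `cluster_A`, and the functions `route` and `dispatch` are all hypothetical and chosen only for illustration.

```python
SHARED_CLUSTER = "cluster_A"          # the single packed cluster (hypothetical name)
TENANT_HEADER = "x-logical-cluster"   # hypothetical internal header


def route(request_headers: dict, logical_cluster: str) -> tuple:
    """Proxy side: send everything to the shared cluster and tag the request."""
    headers = dict(request_headers)   # copy so the caller's dict is not mutated
    headers[TENANT_HEADER] = logical_cluster
    return SHARED_CLUSTER, headers


def dispatch(headers: dict, handlers: dict) -> str:
    """Backend side: pick the service based on the tag added by the proxy."""
    target = headers.get(TENANT_HEADER)
    if target not in handlers:
        raise KeyError(f"no handler for logical cluster {target!r}")
    return handlers[target](headers)


# Formerly separate clusters become handlers behind the shared backend group.
handlers = {
    "cluster_a": lambda h: "handled by service a",
    "cluster_b": lambda h: "handled by service b",
}

upstream, tagged = route({":path": "/items"}, "cluster_b")
result = dispatch(tagged, handlers)
```

All requests now share the connections of one cluster. The trade-off is that anything configured per cluster would apply to the packed cluster as a whole, so per-service differences have to be expressed some other way.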