envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Cluster-level hash policy for sticky routing #23060

Open agrawroh opened 1 year ago

agrawroh commented 1 year ago

Description

Currently, the hash policy is defined on a per-route basis using hash_policy.
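For reference, here is a minimal sketch of how the route-level setting is wired up today. The cluster name cache_backed_service and the header x-tenant-id are hypothetical and not taken from this issue; only the fragments relevant to hashing are shown.

```yaml
# Fragment of a route entry (inside a virtual host of an HTTP connection
# manager's route_config). Names are illustrative assumptions.
routes:
- match:
    prefix: "/"
  route:
    cluster: cache_backed_service
    hash_policy:
    - header:
        header_name: x-tenant-id   # hypothetical header to hash on

# Fragment of the matching cluster definition. The hash computed by the
# route is only used when the cluster has a hashing load balancer.
clusters:
- name: cache_backed_service
  lb_policy: RING_HASH             # MAGLEV is the other consistent-hash option
```

The hash key is chosen by the route and consumed by the cluster's load balancer, so a caller that does not go through a route table has no obvious place to set it.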

We have a use case where we want sticky routing for all incoming traffic to the external ExtAuthZ and RateLimit services, but there is no good way to achieve it.

We would benefit a lot from a consistent-hash load balancer such as Ring Hash or Maglev: hashing on one of the HTTP headers would send the same key to the same replica and let us leverage the in-memory cache that each replica of these upstream services keeps.

Is it possible to support an LB hash policy at the cluster level (for all routes)?

htuch commented 1 year ago

Yeah, there is support in the AsyncClient interface (via RequestOptions), but it is not configurable in a uniform way for things like ext_authz. One option would be to add this to the GrpcService.EnvoyGrpc config. This would avoid any major changes such as having to mix routing logic with ClusterManager. Does this work?

If not, I think the idea of having some cluster-wide control might have merit, but is a deeper discussion that would require @envoyproxy/api-shepherds and @mattklein123 to weigh in.

agrawroh commented 1 year ago

@htuch Thanks for chiming in. There was a similar request for mirroring traffic. Is it possible to identify some of the things that we currently only have on routes which would also make sense to be on the cluster-level and then think more on what would be the best place?

htuch commented 1 year ago

Yeah, there are others, e.g. fault injection. One thing I can offer here is a workaround: you can loop the ext_authz cluster back through a listener bound to localhost and have that listener apply a standard route table before hitting the real backend cluster. This is a total kludge, but if it helps your use case it might be worth considering.
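A rough sketch of what that loopback workaround could look like, assuming a gRPC ext_authz service. Every name here (ext_authz_loopback, ext_authz_backend, port 9191, the x-tenant-id header) is a hypothetical placeholder, and the definition of the real backend cluster's endpoints is left out.

```yaml
# 1. The ext_authz filter targets a loopback cluster, not the real backend.
http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
    grpc_service:
      envoy_grpc:
        cluster_name: ext_authz_loopback

# 2. The loopback cluster points at a listener on localhost (HTTP/2 for gRPC).
clusters:
- name: ext_authz_loopback
  type: STATIC
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicit_http_config:
        http2_protocol_options: {}
  load_assignment:
    cluster_name: ext_authz_loopback
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: 127.0.0.1, port_value: 9191 }
# The real backend cluster (ext_authz_backend) would set lb_policy: RING_HASH
# or MAGLEV; its endpoints are omitted here.

# 3. The localhost listener applies an ordinary route table with hash_policy.
listeners:
- name: ext_authz_loopback_listener
  address:
    socket_address: { address: 127.0.0.1, port_value: 9191 }
  filter_chains:
  - filters:
    - name: envoy.filters.network.http_connection_manager
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
        stat_prefix: ext_authz_loopback
        route_config:
          virtual_hosts:
          - name: ext_authz
            domains: ["*"]
            routes:
            - match: { prefix: "/" }
              route:
                cluster: ext_authz_backend
                hash_policy:
                - header: { header_name: x-tenant-id }
        http_filters:
        - name: envoy.filters.http.router
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

One caveat with this sketch: the header being hashed has to be present on the request that reaches the loopback listener. How it gets there depends on the ext_authz setup and is not shown.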

I think we should leave this issue open to gauge wider interest. The API design here would require some careful thought.

agrawroh commented 1 year ago

Thanks, @htuch! That's exactly what we are doing right now for mirroring & splitting the traffic :)

One more question, if you know the answer off the top of your head: if we have hash_policy defined on the routes, and the clusters to which the traffic is being mirrored/split use the RING_HASH or MAGLEV LB policy, would it do sticky routing? Or would it ignore the hash_policy, making the split completely random?

htuch commented 1 year ago

In both cases Envoy is using an independent per-cluster HTTP async client with its own pseudo-config, so I strongly suspect the answer is it will ignore the original route hash policy.

phamann commented 2 days ago

Just leaving a note here that we have the EXACT same use case as @agrawroh: we'd like to consistently hash on a request header at the cluster level, for a cluster we use as our ext_authz upstream, to improve the efficiency of an in-memory cache in the upstream nodes.

@agrawroh have you since found any other configurations that work, other than a whole new dedicated listener and filter chain as suggested on this thread?