envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

How can I make a custom load balancer policy? #32318

Closed. nikhilro closed this issue 5 months ago.

nikhilro commented 8 months ago

Hey there,

I'm looking to make a custom load balancer policy. I want to route the request to an available machine. Each request is a long-lived task and none of the built-in policies work for our exact use case.

I currently have a node script that queries for an available worker and then proxies to that IP. Would like to bring this "discovery" logic to Envoy.

I tried to look for examples in source/extensions/load_balancing_policies, but they all seem dense. E.g., the random one links to an impl outside the folder. I really just want to customize the chooseHost function that makes the request forwarding decision.

Would appreciate pointers on exactly what's needed!

P.S. I know I could potentially use health checks to hide some workers from Envoy but that doesn't work for another reason that's specific to us. I unfortunately do need a custom load balancer policy.

Thanks, Nikhil

kyessenov commented 8 months ago

I think you have two possible designs here:

  1. Have a "cluster", i.e. a pool of workers managed via CDS and EDS, which determines the lifecycle of the upstream machines. You can look at the stateful sessions HTTP filter to see how to override the upstream host selection from a downstream filter that contains your custom logic.

  2. Fully unmanaged upstream. You can look at the "original_dst" upstream load balancer, which simply doesn't load balance: a downstream filter can explicitly set the upstream destination. You have to rely on on-the-wire means to expire existing connections, e.g. max connection age or idle timeouts. (A rough config sketch for this option follows below.)
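For option 2, a minimal sketch of what the cluster side might look like (untested, names are illustrative), assuming the downstream filter or caller sets the x-envoy-original-dst-host header to the chosen worker's "ip:port":

```yaml
static_resources:
  clusters:
  - name: workers                      # illustrative name
    connect_timeout: 1s
    type: ORIGINAL_DST                 # no endpoints; the destination is supplied per request
    lb_policy: CLUSTER_PROVIDED        # required for ORIGINAL_DST clusters
    original_dst_lb_config:
      use_http_header: true            # honor x-envoy-original-dst-host: "ip:port"
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http_protocol_options: {}    # HTTP/1.1, which WebSocket upgrades require
        common_http_protocol_options:
          idle_timeout: 300s           # the on-the-wire expiry mentioned above
```

The header can be set by your own downstream filter or by whatever sits in front of Envoy; Envoy itself does no host selection in this mode.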

CC @tonya11en

tonya11en commented 8 months ago

It's unclear what you actually need the host selection behavior to be. Can you elaborate on the specifics here, so we can understand why the existing LB policies won't work?

If the existing policies truly don't work, I'll endorse option 2 above.

holooooo commented 7 months ago

Maybe you can add an Envoy instance in front of your backends as a load balancer, and have that load balancer use least-request load balancing? Then, when your requests pass through the load balancer, they will be sent to the least busy backend. (A minimal cluster sketch follows.)
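Something like this cluster sketch (untested; names and addresses are made up):

```yaml
clusters:
- name: gpu_workers                    # made-up name
  connect_timeout: 1s
  type: STRICT_DNS
  lb_policy: LEAST_REQUEST             # prefer the host with the fewest active requests
  least_request_lb_config:
    choice_count: 2                    # power-of-two-choices sampling (the default)
  load_assignment:
    cluster_name: gpu_workers
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: workers.internal, port_value: 8080 }
```

Note that least-request only biases toward idle hosts; by itself it doesn't guarantee that a busy machine never receives a second call.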

nikhilro commented 7 months ago

Happy to elaborate. I ended up finding a bandaid solution for now, hence the delayed response. Would still like to use Envoy though.

We make voicebots that people can put behind phone numbers. For a new call, Twilio sends us a websocket request and we direct that call to one of the available machines. There is a strict constraint that each machine can only handle 1 call, since each call needs 1 GPU. We've thought about making the whole system stateless but unfortunately that adds latencies that we can't afford. So, for now, that one call is pinned to that one machine and that machine should reject any new calls.

I would like to use Envoy to do the routing of incoming websocket connections to one of the available machines. The way this discovery happens right now is: I put the incoming request's uuid (generated) on a queue, one of the machines picks it up and announces to the load balancer that it will take it, and the load balancer starts proxying. I'd like to embed this logic into Envoy.
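For context, this is the listener-side shape I have in mind (a rough, untested sketch with made-up names); the main pieces are enabling WebSocket upgrades in the HttpConnectionManager and disabling the route timeout for long-lived calls:

```yaml
# Inside the HttpConnectionManager config of the listener's filter chain:
upgrade_configs:
- upgrade_type: websocket              # pass "Upgrade: websocket" requests through
route_config:
  name: calls                          # made-up name
  virtual_hosts:
  - name: calls
    domains: ["*"]
    routes:
    - match: { prefix: "/" }
      route:
        cluster: workers               # made-up cluster name
        timeout: 0s                    # disable the route timeout for long-lived streams
```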

Does that make sense?

nikhilro commented 7 months ago

Quick bump if anyone has thoughts :)

tonya11en commented 7 months ago

> We make voicebots that people can put behind phone numbers. For a new call, Twilio sends us a websocket request and we direct that call to one of the available machines. There is a strict constraint that each machine can only handle 1 call, since each call needs 1 GPU. We've thought about making the whole system stateless but unfortunately that adds latencies that we can't afford. So, for now, that one call is pinned to that one machine and that machine should reject any new calls.

I'm not sure that a custom LB policy is the place to do this. Load balancing algorithms generally assume that there are multiple load balancers forwarding traffic to the backends, and they aren't really cut out for the kind of state tracking you are trying to do.

It sounds like a custom filter is the way to go for what you're trying to do.

nikhilro commented 7 months ago

I'm not expecting the LB to track state. I'm just expecting to pass it an IP for each websocket request that comes in. The state tracking will live outside of the LB.

Could you link me to the custom filter docs? Nothing immediately came up, and what I was guided to felt like it was about filtering incoming requests.

tonya11en commented 7 months ago

@nikhilro here are some links worth investigating.

The existing stateful session filter, which lets you plug in custom session-state extensions. This may provide all the functionality you need: https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/stateful_session_filter
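As a rough sketch of the filter wiring, based on the cookie-based session state shown in those docs (cookie name and ttl here are placeholders, untested):

```yaml
http_filters:
- name: envoy.filters.http.stateful_session
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.stateful_session.v3.StatefulSession
    session_state:
      name: envoy.http.stateful_session.cookie
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.http.stateful_session.cookie.v3.CookieBasedSessionState
        cookie:
          name: session-id             # placeholder cookie name
          path: /
          ttl: 120s
- name: envoy.filters.http.router
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

As I understand it, the cookie encodes the selected upstream address, so subsequent requests carrying it are routed back to the same host; the session_state extension point is where custom selection/override logic can live.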

The Envoy filter example repo, which has basic examples of custom native L4 and L7 filters: https://github.com/envoyproxy/envoy-filter-example

nikhilro commented 7 months ago

Thanks team, will take a look. It seems hard enough that it's not on the immediate roadmap, but soon -- our shitty implementation with node http-proxy will work for the next 3 weeks. A lot more support than I would've expected for a free open-source project 😄

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] commented 5 months ago

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.