ginepro does not load 'balance'

conradludgate commented 1 year ago

Bug description

ginepro indirectly uses tower::Balance which makes use a best-of-two random load balancing strategy.

Unfortunately, tonic has no built in mechanism for determining load, so this is hard coded to be 0 always.

https://docs.rs/tonic/0.8.3/src/tonic/transport/service/connection.rs.html#114-120

This means we have randomise distribution. This is still a balancing technique but is known to not be very good

conradludgate commented 1 year ago

We don't need to use tonic's Channel abstraction since Grpc client makes use of tower services directly. That let's us implement this logic ourselves.

Now for how we communicate load. An idea I had is to have a well-known header in responses for how much load the server is currently experiencing.

This can be cached by the client. In high load situations this will be accurate. In low load situations it will be less accurate but arguably less important.

We would need to decide on a good header for this.

An alternative is to have an alternative well known load look up RPC. We can call that on a regular basis to update load values

If the load endpoint or header doesn't exist, we can stop checking for it and fallback on the original strategy of random or round robin. This can be configured dynamically

ThomWright commented 1 year ago

Have you seen this which seems to be what Linkerd use?

conradludgate commented 1 year ago

Using latency is clever, ~~but I'm not sure exactly how the pending calculation works. I'm probably missing something.~~

Ahh I see. It's not 'pending', it's 'in flight requests'.

So load=latency*in_flight.

I have no objections to this, since it makes it only a client side approach.

xmakro commented 1 year ago

It would be good to mention this in the docs. Currently ginepro doc says it's using using The Power of Two Choices, which is misleading.

TrueLayer / ginepro

ginepro does not load 'balance' #39

Bug description