cloudflare / pingora

A library for building fast, reliable and evolvable network services.

Request for `AsyncLoadBalancer` or a built-in Least Connection Load Balancing Implementation #236

Closed: ChieloNewctle closed this issue 1 month ago

ChieloNewctle commented 1 month ago

What is the problem your feature solves, or the need it fulfills?

When implementing least connection load balancing, it's necessary to have internal mutability to record connections, which can be achieved using RwLock or similar constructs.

However, the methods of BackendSelection and BackendIter are defined as non-async, so implementations can only wait for locks by blocking, without being able to yield to the async runtime.

Describe the solution you'd like

It would be really helpful to have AsyncLoadBalancer or a built-in least connection load balancing implementation.

Describe alternatives you've considered

To work around this limitation, here are some alternatives:

eaufavor commented 1 month ago

Could you tell us more about the use case for async lock? Async lock is only needed if the logic needs to hold the lock across .await.

std::sync::RwLock and atomic variables should be enough to trace the counts of connections.

Reference: https://tokio.rs/tokio/tutorial/shared-state

ChieloNewctle commented 1 month ago

> Could you tell us more about the use case for async lock? Async lock is only needed if the logic needs to hold the lock across .await.
>
> std::sync::RwLock and atomic variables should be enough to trace the counts of connections.
>
> Reference: https://tokio.rs/tokio/tutorial/shared-state

Well, async locks are not only useful when the lock is held across .await.

Using async locks means only one running task at a time proceeds while holding the lock, and the other tasks that are .awaiting the lock won't block the threads of the async task executor.

With std::sync::RwLock and similar sync locks, however, tasks that are waiting for the lock will actually block the threads of the async task executor.

For example, calling std::sync::RwLock::read blocks the calling thread:

> The calling thread will be blocked until there are no more writers which hold the lock. There may be other readers currently inside the lock when this method returns. This method does not provide any guarantees with respect to the ordering of whether contentious readers or writers will acquire the lock first.
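To make the difference concrete, here is a minimal sketch (not Pingora's API; the ConnCounts alias and the function names are invented for illustration, and tokio is assumed as the runtime):

```rust
use std::collections::HashMap;

// Hypothetical alias for illustration: active-connection count per backend key.
type ConnCounts = HashMap<String, usize>;

// std::sync::RwLock: read() blocks the calling OS thread while a writer holds
// the lock, so that executor thread cannot run any other task in the meantime.
fn pick_least_loaded_sync(counts: &std::sync::RwLock<ConnCounts>) -> Option<String> {
    let guard = counts.read().unwrap(); // may block this worker thread
    guard.iter().min_by_key(|(_, c)| **c).map(|(k, _)| k.clone())
}

// tokio::sync::RwLock: read().await suspends only this task; the worker
// thread stays free to run other tasks until the lock becomes available.
async fn pick_least_loaded_async(counts: &tokio::sync::RwLock<ConnCounts>) -> Option<String> {
    let guard = counts.read().await; // yields to the executor instead of blocking
    guard.iter().min_by_key(|(_, c)| **c).map(|(k, _)| k.clone())
}
```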


As for atomics, it's true that we can use persistent data structures together with ArcSwap.
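A rough sketch of that direction, assuming the arc_swap crate and using a cloned std HashMap as a stand-in for a real persistent map (e.g. one from the im crate); the type and method names here are made up:

```rust
use std::collections::HashMap;

use arc_swap::ArcSwap;

// Immutable snapshot of connection counts per backend key. Every update swaps
// in a new snapshot instead of mutating in place, so readers never lock.
type Snapshot = HashMap<String, usize>;

struct LeastConnState {
    counts: ArcSwap<Snapshot>,
}

impl LeastConnState {
    fn new() -> Self {
        LeastConnState { counts: ArcSwap::from_pointee(Snapshot::new()) }
    }

    fn record_connect(&self, backend: &str) {
        // rcu: clone the current snapshot, bump the count, publish the result.
        // With a real persistent map the clone would be cheap structural sharing.
        self.counts.rcu(|cur| {
            let mut next = Snapshot::clone(cur);
            *next.entry(backend.to_string()).or_insert(0) += 1;
            next
        });
    }

    fn least_loaded(&self) -> Option<String> {
        let snap = self.counts.load();
        snap.iter().min_by_key(|(_, c)| **c).map(|(k, _)| k.clone())
    }
}
```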


The use case is fairly simple: routing requests to the upstream with the fewest active connections. It would be great to have a built-in least connection load balancing implementation.

As for other use cases, rather than adding an AsyncLoadBalancer, implementing customized load balancers without the BackendSelection trait may be a better choice.
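Whichever trait the selection logic hangs off, the per-connection bookkeeping itself can be a small RAII guard. The sketch below is hypothetical (ConnGuard is not a Pingora type) and only illustrates the counting part:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical guard: bumps the chosen backend's active-connection counter on
// selection and decrements it again when the guard is dropped, i.e. when the
// proxied connection or request finishes.
struct ConnGuard {
    counter: Arc<AtomicUsize>,
}

impl ConnGuard {
    fn new(counter: Arc<AtomicUsize>) -> Self {
        counter.fetch_add(1, Ordering::Relaxed);
        ConnGuard { counter }
    }
}

impl Drop for ConnGuard {
    fn drop(&mut self) {
        self.counter.fetch_sub(1, Ordering::Relaxed);
    }
}
```

Holding the guard for the lifetime of the proxied connection keeps the counter consistent even on error paths.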

eaufavor commented 1 month ago

Note that from https://tokio.rs/tokio/tutorial/shared-state:

> As a rule of thumb, using a synchronous mutex from within asynchronous code is fine as long as contention remains low and the lock is not held across calls to .await.

In practice we have no problem using sync locks following that guidance.

Regarding least connection load balancing, I think that it is possible to use ArcSwap<[AtomicUsize; SIZE]> to track all the connections without any locking involved at all.
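As a hedged illustration of that shape (assuming the arc_swap crate and a fixed number of backends; the LeastConn type below is invented for the sketch and is not an implementation of Pingora's BackendSelection trait):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

use arc_swap::ArcSwap;

const SIZE: usize = 4; // number of backends, fixed in this sketch

// One atomic counter of active connections per backend. Counts are updated
// with fetch_add/fetch_sub, so the selection path never takes a lock.
struct LeastConn {
    active: ArcSwap<[AtomicUsize; SIZE]>,
}

impl LeastConn {
    fn new() -> Self {
        LeastConn {
            active: ArcSwap::from_pointee(std::array::from_fn(|_| AtomicUsize::new(0))),
        }
    }

    // Pick the backend index with the fewest active connections and claim it.
    // Best-effort: concurrent selections may race, which is usually acceptable.
    fn select(&self) -> usize {
        let counts = self.active.load();
        let (idx, _) = counts
            .iter()
            .enumerate()
            .min_by_key(|(_, c)| c.load(Ordering::Relaxed))
            .expect("SIZE > 0");
        counts[idx].fetch_add(1, Ordering::Relaxed);
        idx
    }

    // Call when the connection to backend `idx` finishes.
    fn release(&self, idx: usize) {
        self.active.load()[idx].fetch_sub(1, Ordering::Relaxed);
    }
}
```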

Meanwhile, we do acknowledge the need for a least connection load balancing algorithm to be implemented.

ChieloNewctle commented 1 month ago

> Note that from https://tokio.rs/tokio/tutorial/shared-state:
>
> > As a rule of thumb, using a synchronous mutex from within asynchronous code is fine as long as contention remains low and the lock is not held across calls to .await.
>
> In practice we have no problem using sync locks following that guidance.
>
> Regarding least connection load balancing, I think that it is possible to use ArcSwap<[AtomicUsize; SIZE]> to track all the connections without any locking involved at all.
>
> Meanwhile, we do acknowledge the need for a least connection load balancing algorithm to be implemented.

Using a synchronous mutex from within asynchronous code is fine with respect to deadlocks when contention remains low. For performance, though, I would recommend benchmarking under several different scenarios.

And yes, in cases where the upstreams are static, ArcSwap<[AtomicUsize; SIZE]> is good enough. But there are scenarios where the upstreams are dynamic.
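For the dynamic case, one possible sketch (again assuming arc_swap; the DynamicCounts name and its methods are hypothetical) is to key the counters by backend address and publish a new table whenever the upstream set changes, carrying over the counters of backends that remain:

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::atomic::AtomicUsize;
use std::sync::Arc;

use arc_swap::ArcSwap;

// Counters keyed by backend address. The whole table is replaced atomically
// when the upstream set changes, while individual counts keep mutating in
// place through the shared Arc<AtomicUsize>.
type CounterTable = HashMap<SocketAddr, Arc<AtomicUsize>>;

struct DynamicCounts {
    table: ArcSwap<CounterTable>,
}

impl DynamicCounts {
    fn new() -> Self {
        DynamicCounts { table: ArcSwap::from_pointee(CounterTable::new()) }
    }

    // Publish a new upstream set, keeping the existing counters of backends
    // that are still present so in-flight connections stay accounted for.
    fn update_backends(&self, new_set: &[SocketAddr]) {
        let old = self.table.load();
        let mut next = CounterTable::new();
        for addr in new_set {
            let counter = old
                .get(addr)
                .cloned()
                .unwrap_or_else(|| Arc::new(AtomicUsize::new(0)));
            next.insert(*addr, counter);
        }
        self.table.store(Arc::new(next));
    }
}
```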

Hope to have least connection load balancing in pingora soon. I'll close this issue.