etcdv3 / etcd-client

An etcd v3 API client
Apache License 2.0

Tolerate (partial) connection failures in endpoints in the Balancer Client #39

Open rrichardson opened 2 years ago

rrichardson commented 2 years ago

Motivation:

We connect to a quorum of etcd servers spread across regions (not the recommended architecture, but it works quite well).

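For various reasons, a small subset of the nodes may be unavailable at any given time. Currently, a single unreachable endpoint causes the client to fail.
The client should instead tolerate such failures and adjust the endpoint pool accordingly, if that is what the consumer of the API wants.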

This behavior lives in tower::balance and tonic::transport::service. The discovery mechanism behind balance_channel connects lazily, when it receives its first requests. It appears to connect to all endpoints, but if any one of those connections fails, the entire operation fails.
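To make the failure mode concrete, here is a small std-only model of it. It does not use tower or tonic, and `Conn`, `connect`, and `connect_all` are hypothetical names: `connect` simply pretends that addresses listed in `down` are unreachable. Collecting per-endpoint results into a `Result<Vec<_>, _>` short-circuits on the first error, which reproduces the all-or-nothing outcome described above; whether tower's discovery is literally implemented this way is an assumption.

```rust
/// A connected endpoint (hypothetical stand-in for a tonic `Channel`).
#[derive(Debug, Clone, PartialEq)]
struct Conn {
    addr: String,
}

/// Hypothetical connector: treats every address listed in `down` as unreachable.
fn connect(addr: &str, down: &[&str]) -> Result<Conn, String> {
    if down.contains(&addr) {
        Err(format!("connection refused: {}", addr))
    } else {
        Ok(Conn { addr: addr.to_string() })
    }
}

/// All-or-nothing: collecting into `Result<Vec<_>, _>` stops at the first `Err`,
/// so one unreachable endpoint fails the whole operation.
fn connect_all(endpoints: &[&str], down: &[&str]) -> Result<Vec<Conn>, String> {
    endpoints.iter().map(|addr| connect(addr, down)).collect()
}

fn main() {
    let endpoints = ["us-east:2379", "eu-west:2379", "ap-south:2379"];
    // Two of three regions are reachable, yet the request still fails.
    match connect_all(&endpoints, &["ap-south:2379"]) {
        Ok(conns) => println!("connected to {} endpoints", conns.len()),
        Err(e) => println!("request failed: {}", e),
    }
}
```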

The only option here seems to be working with the Tower team to add a partial-success path. That is preferable not only because it is the right behavior for the initial connection, but also because it should give the correct behavior on an ongoing basis, as endpoints fail or recover after startup.
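The ongoing half of this can be sketched without tower at all. `EndpointPool`, `report`, `ready`, and `pick` below are hypothetical names, not part of etcd-client, tower, or tonic. The pool keeps every endpoint, marks one unhealthy when a connect attempt or probe fails, routes round-robin over the healthy ones, and returns `None` only when nothing is usable. In the real stack, the closest mechanism is probably sending `Change::Insert` / `Change::Remove` into the sender returned by tonic's `balance_channel`; how health results would drive those changes is an assumption here.

```rust
use std::collections::BTreeMap;

/// Hypothetical pool that tracks per-endpoint health over time.
struct EndpointPool {
    healthy: BTreeMap<String, bool>,
    next: usize,
}

impl EndpointPool {
    fn new(endpoints: &[&str]) -> Self {
        // Start optimistic; a failed connect or probe flips the flag later.
        let healthy = endpoints.iter().map(|e| (e.to_string(), true)).collect();
        EndpointPool { healthy, next: 0 }
    }

    /// Record the outcome of a connect attempt or health probe.
    fn report(&mut self, endpoint: &str, ok: bool) {
        if let Some(h) = self.healthy.get_mut(endpoint) {
            *h = ok;
        }
    }

    /// Endpoints currently eligible for requests, in sorted order.
    fn ready(&self) -> Vec<&str> {
        self.healthy
            .iter()
            .filter(|(_, ok)| **ok)
            .map(|(e, _)| e.as_str())
            .collect()
    }

    /// Round-robin over healthy endpoints; `None` only if every endpoint is down.
    fn pick(&mut self) -> Option<String> {
        let chosen = {
            let ready = self.ready();
            if ready.is_empty() {
                return None;
            }
            ready[self.next % ready.len()].to_string()
        };
        self.next = self.next.wrapping_add(1);
        Some(chosen)
    }
}

fn main() {
    let mut pool = EndpointPool::new(&["ap-south:2379", "eu-west:2379", "us-east:2379"]);
    pool.report("ap-south:2379", false); // one region is unreachable
    for _ in 0..3 {
        println!("{:?}", pool.pick()); // traffic keeps flowing to the other two
    }
    pool.report("ap-south:2379", true); // it recovers and rejoins the rotation
    println!("{:?}", pool.ready());
}
```

The point of the design is that a failure shrinks the set of candidates instead of failing the request, and a recovery grows it again without reconnecting the whole client.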

I will continue to pursue this approach, but I'd like to leave this issue open, because the etcd client will likely need some (hopefully non-breaking) changes to optionally use the partial-success behavior.
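One hypothetical shape for an opt-in, non-breaking change is a connection policy whose default preserves today's all-or-nothing behavior. `ConnectPolicy`, `Outcome`, `try_connect`, and `connect_with_policy` are illustrative names, not existing etcd-client API, and the "at least N endpoints" threshold is an assumption about what a consumer might want (for example, a minimum equal to the quorum size).

```rust
/// Hypothetical opt-in policy. Defaulting to `AllOrNothing` keeps current
/// behavior, so existing callers would not be affected.
#[derive(Debug, Clone, Copy)]
enum ConnectPolicy {
    AllOrNothing,
    /// Succeed if at least this many endpoints connect (treated as at least 1).
    AtLeast(usize),
}

impl Default for ConnectPolicy {
    fn default() -> Self {
        ConnectPolicy::AllOrNothing
    }
}

/// Which endpoints connected and which failed (with the error message).
#[derive(Debug)]
struct Outcome {
    connected: Vec<String>,
    failed: Vec<(String, String)>,
}

/// Hypothetical connector: treats every address listed in `down` as unreachable.
fn try_connect(addr: &str, down: &[&str]) -> Result<String, String> {
    if down.contains(&addr) {
        Err(format!("connection refused: {}", addr))
    } else {
        Ok(addr.to_string())
    }
}

/// Attempt every endpoint, then apply the policy to decide overall success.
/// Both arms return the full `Outcome` so callers can log what was skipped.
fn connect_with_policy(
    endpoints: &[&str],
    down: &[&str],
    policy: ConnectPolicy,
) -> Result<Outcome, Outcome> {
    let mut out = Outcome { connected: Vec::new(), failed: Vec::new() };
    for &addr in endpoints {
        match try_connect(addr, down) {
            Ok(conn) => out.connected.push(conn),
            Err(e) => out.failed.push((addr.to_string(), e)),
        }
    }
    let required = match policy {
        ConnectPolicy::AllOrNothing => endpoints.len(),
        ConnectPolicy::AtLeast(n) => n.max(1),
    };
    if out.connected.len() >= required {
        Ok(out)
    } else {
        Err(out)
    }
}

fn main() {
    let endpoints = ["us-east:2379", "eu-west:2379", "ap-south:2379"];
    let down = ["ap-south:2379"];
    match connect_with_policy(&endpoints, &down, ConnectPolicy::AtLeast(2)) {
        Ok(o) => println!("serving via {:?}; skipped {:?}", o.connected, o.failed),
        Err(o) => println!("not enough endpoints: {:?}", o.failed),
    }
}
```

Returning the failures in both the `Ok` and `Err` cases is a deliberate choice in this sketch: a consumer that opts into partial success still gets to see, and alert on, the endpoints it is currently running without.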

sylzd commented 6 months ago

Ditto! I think this is an important issue.