freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Analyze ES load balancer #2909

Closed mlissner closed 1 year ago

mlissner commented 1 year ago

(Lifted from https://github.com/freelawproject/courtlistener/issues/2867#issuecomment-1636640313.)

A load balancer might not be required.

I thought a load balancer might be required because the documentation suggested it:

You should also avoid sending client requests to just one of the other two nodes. If you do, and this node fails, then any requests will not receive responses, even if the remaining nodes form a healthy cluster. Ideally, you should balance your client requests across both of the non-tiebreaker nodes. You can do this by specifying the address of both nodes when configuring your client to connect to your cluster. Alternatively, you can use a resilient load balancer to balance client requests across the appropriate nodes in your cluster. The Elastic Cloud service provides such a load balancer.

However, a load balancer is mainly useful when the Elasticsearch client cannot send requests to every node in the cluster. If the client is tied to a single node and that node fails, the client's requests go unanswered, even if the remaining nodes still form a healthy cluster.

If the client can send requests to multiple nodes and handle retries and timeouts, an external load balancer might not be necessary, at least from the Elasticsearch perspective. The Python Elasticsearch library supports this: https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/config.html#node-pool

By default, it chooses a node from the pool using a round_robin approach, and it can also handle retries and timeouts.

Therefore, if a node fails, the request can be retried and sent to a different node.
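The retry-and-failover behavior described above can be illustrated with a small, self-contained toy model. This is not the elasticsearch-py or elastic-transport code: `RoundRobinPool`, `NodeUnavailable`, and the node names `es-1`/`es-2` are hypothetical, and the real transport additionally marks failed nodes as dead and skips them for a back-off period, which this sketch does not do.

```python
from itertools import cycle
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


class NodeUnavailable(Exception):
    """Hypothetical error a simulated node raises when it is down."""


class RoundRobinPool:
    """Toy client-side node pool: round-robin selection with failover retries.

    A simplified model of the idea discussed above, not the real
    elasticsearch-py / elastic-transport implementation.
    """

    def __init__(self, nodes: Iterable[str], max_retries: int = 3) -> None:
        self.nodes = list(nodes)
        if not self.nodes:
            raise ValueError("need at least one node")
        self.max_retries = max_retries
        self._next_node = cycle(self.nodes)

    def perform(self, send: Callable[[str], T]) -> T:
        last_error: Optional[Exception] = None
        # One initial attempt plus up to max_retries retries; every attempt
        # moves on to the next node in round-robin order.
        for _ in range(self.max_retries + 1):
            node = next(self._next_node)
            try:
                return send(node)
            except (NodeUnavailable, TimeoutError) as exc:
                # Treat node failures and timeouts alike: try the next node.
                last_error = exc
        raise RuntimeError("all attempts failed") from last_error


# Usage: two data nodes, one of which is down.
down = {"es-1"}


def send(node: str) -> str:
    if node in down:
        raise NodeUnavailable(node)
    return f"answered by {node}"


pool = RoundRobinPool(["es-1", "es-2"], max_retries=1)
print(pool.perform(send))  # answered by es-2 (es-1 failed, retried on es-2)
```

With the real Python client, the equivalent setup is to list both non-tiebreaker nodes when constructing the client, e.g. `Elasticsearch(["http://es-1:9200", "http://es-2:9200"], retry_on_timeout=True, max_retries=3)` (hostnames hypothetical), and let its node pool handle selection and retries.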

Internally, the Elasticsearch node that receives a request already acts as a load balancer: every node can serve as a coordinating node, routing the request to the nodes best placed to retrieve the results.
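The coordinating role can be pictured as a scatter/gather step. The sketch below is a toy model of the query phase only, assuming each shard has copies on several nodes; `coordinate_search`, the shard layout, and the scores are hypothetical. Real Elasticsearch also runs a fetch phase and ranks shard copies (for example with adaptive replica selection) instead of taking the first healthy one.

```python
import heapq
from typing import Dict, List, Set, Tuple

Hit = Tuple[float, str]  # (score, doc_id)


def coordinate_search(
    shard_copies: Dict[int, List[str]],
    shard_hits: Dict[int, List[Hit]],
    down: Set[str],
    k: int,
) -> Tuple[List[Hit], Dict[int, str]]:
    """Toy coordinating node: scatter to one copy per shard, then gather top-k."""
    partial: List[Hit] = []
    routing: Dict[int, str] = {}
    for shard, copies in shard_copies.items():
        # Simplification: take the first healthy copy of the shard.
        node = next((n for n in copies if n not in down), None)
        if node is None:
            raise RuntimeError(f"no healthy copy of shard {shard}")
        routing[shard] = node
        # Each shard copy returns only its local top-k hits.
        partial.extend(heapq.nlargest(k, shard_hits[shard]))
    # The coordinator merges the partial results into the global top-k.
    return heapq.nlargest(k, partial), routing


shard_copies = {0: ["es-1", "es-2"], 1: ["es-2", "es-1"]}
shard_hits = {0: [(0.9, "a"), (0.4, "b")], 1: [(0.7, "c"), (0.95, "d")]}
hits, routing = coordinate_search(shard_copies, shard_hits, down={"es-1"}, k=2)
print(hits)     # [(0.95, 'd'), (0.9, 'a')]
print(routing)  # {0: 'es-2', 1: 'es-2'}
```

Even with `es-1` down, the coordinator still answers from the surviving copies, which is why spreading client requests across nodes covers much of what an external load balancer would do.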

mlissner commented 1 year ago

This is interesting. AWS load balancers are pretty easy to set up and pretty cheap, so I'm not too opposed to using them.

I think we could get some nice features from them, like:

mlissner commented 1 year ago

If we're using k8s, it comes with load balancers, so that's that. Great.