Handing out topology information to CrateDB clients

amotl commented 3 years ago

Hi there,

at https://github.com/orchestracities/ngsi-timeseries-api/issues/452, we are having a nice discussion about how to properly populate the list of CrateDB database URIs to connect to when using the HTTP protocol.

I already discussed this with @mfussenegger and @seut, and they told me that a round-robin-like distribution mechanism is implemented in crate-python. However, the same thing can also be implemented using K8s ingress technologies, or by using a dedicated HAProxy (containerized or not) as a more sophisticated HTTP load balancer that offers more advanced balancing algorithms [1].
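For illustration, here is a minimal sketch of client-side round-robin over a list of database URIs, using only the Python standard library. As far as we know, crate-python accepts a list of servers, e.g. `client.connect(["http://crate-0:4200", "http://crate-1:4200"])`, and spreads requests across them. The class below is not its implementation, and the host names are hypothetical.

```python
import itertools


class RoundRobinPool:
    """Hand out database URIs in round-robin order (illustrative only)."""

    def __init__(self, uris):
        if not uris:
            raise ValueError("at least one URI is required")
        self._uris = list(uris)
        # itertools.cycle repeats the list endlessly, wrapping at the end.
        self._cycle = itertools.cycle(self._uris)

    def next_uri(self):
        """Return the URI to use for the next request."""
        return next(self._cycle)


# Hypothetical node URIs.
pool = RoundRobinPool([
    "http://crate-0:4200",
    "http://crate-1:4200",
    "http://crate-2:4200",
])
picked = [pool.next_uri() for _ in range(4)]
```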

So, I wanted to take the chance to bring up this topic here and ask whether you see a way to bring that functionality to crate-operator (or an appropriate extension), or whether that wouldn't align with the role of crate-operator at all.

I am imagining that crate-operator might be able to provide cluster topology information to clients (or HAProxy instances) so they can populate their list of database URIs to connect to. I have to admit that I don't know much about the scope of crate-operator yet, but I will be happy to learn more about it.
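To make the idea more concrete, here is a sketch of how topology information could be rendered into an HAProxy backend section. It assumes the topology is available as a list of `(node_name, address)` pairs handed out by crate-operator or an extension. No such interface exists today, and the function name, backend name and addresses are hypothetical.

```python
def render_haproxy_backend(backend_name, nodes, port=4200):
    """Render an HAProxy backend from hypothetical (node_name, address) pairs."""
    lines = [
        f"backend {backend_name}",
        "    balance roundrobin",
        # Use CrateDB's HTTP root endpoint as a simple health check.
        "    option httpchk GET /",
    ]
    for node_name, address in nodes:
        # "check" enables HAProxy's periodic health checks for this server.
        lines.append(f"    server {node_name} {address}:{port} check")
    return "\n".join(lines)


config = render_haproxy_backend(
    "cratedb_http",
    [("crate-0", "10.0.0.10"), ("crate-1", "10.0.0.11")],
)
print(config)
```

`balance roundrobin` could be swapped for another algorithm from [1], such as `leastconn`. That flexibility is where HAProxy would go beyond a plain round-robin distribution.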

With kind regards, Andreas.

P.S.: This topic is obviously not limited to load balancing HTTP connections. When targeting the PostgreSQL interface, the same topology information could be used to seed PgBouncer and similar tools.

[1] http://cbonte.github.io/haproxy-dconv/2.3/configuration.html#4-balance

SStorm commented 3 years ago

Hi @amotl,

There is currently a k8s load balancer in front of all the CrateDB nodes, so round-robin is already happening in that sense - although it is L4 so would balance on the TCP connection level only. Are you imagining we do smarter routing on a per-request basis (i.e. maybe based on the URL)? HAProxy in front would be possible, but it's a very non-trivial thing to do (as we get the current LB for free from k8s, but would need to configure HAProxy ourselves). Do we see some scalability challenge in the future with this setup?
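The following toy model illustrates the L4 vs. per-request distinction. It assumes round-robin node selection for simplicity, although a real Kubernetes load balancer may pick nodes differently. With L4 balancing, the node is chosen once per TCP connection, so all requests on a long-lived connection hit the same node. An L7 proxy, by contrast, could balance each request individually. All names are hypothetical.

```python
import itertools

NODES = ["crate-0", "crate-1", "crate-2"]


def l4_assign(connections, nodes):
    """TCP-level balancing: a node is chosen once per connection."""
    cycle = itertools.cycle(nodes)
    assignment = {}
    for conn_id, request_count in connections:
        node = next(cycle)  # picked when the connection is opened
        assignment[conn_id] = [node] * request_count
    return assignment


def l7_assign(connections, nodes):
    """Request-level balancing: every request is routed on its own."""
    cycle = itertools.cycle(nodes)
    assignment = {}
    for conn_id, request_count in connections:
        assignment[conn_id] = [next(cycle) for _ in range(request_count)]
    return assignment


# Connection "a" sends 3 requests, connection "b" sends 1.
connections = [("a", 3), ("b", 1)]
per_connection = l4_assign(connections, NODES)
per_request = l7_assign(connections, NODES)
```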

Regards, Roman

WalBeh commented 3 years ago

Not sure what we would gain from switching to HAProxy on k8s. It would certainly be an option for on-premises setups.

Exposing each node via its own public IP possibly makes us bad neighbors in a cloud, as we would allocate a lot of IPs. Also, we would give up the health check on the backend service.

SStorm commented 3 years ago

We could technically expose each node on a different port of the LB (mycluster.aks.westeurope.azure.cratedb.cloud:4200, mycluster.aks.westeurope.azure.cratedb.cloud:4201, mycluster.aks.westeurope.azure.cratedb.cloud:4202, etc.). Of course, that opens its own can of worms with health checks, as you said, Walter. It is also not clear how clients would handle nodes temporarily disconnecting (e.g. during a rolling upgrade).
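Below is a sketch of the port-per-node idea, together with one way a client might cope with nodes temporarily dropping out, e.g. during a rolling upgrade. Node `i` is assumed to be exposed on port `4200 + i`. A failed node is skipped until a retry interval has passed. `per_node_urls`, `FailoverPool` and the 30-second default are hypothetical choices, not an existing client API.

```python
import time


def per_node_urls(lb_host, node_count, base_port=4200):
    """One URL per node, assuming node i is exposed on base_port + i."""
    return [f"http://{lb_host}:{base_port + i}" for i in range(node_count)]


class FailoverPool:
    """Round-robin over URLs, skipping nodes that failed recently."""

    def __init__(self, urls, retry_interval=30.0, clock=time.monotonic):
        self._urls = list(urls)
        self._retry_interval = retry_interval
        self._clock = clock
        self._failed_at = {}  # url -> time of the last observed failure
        self._next = 0

    def mark_failed(self, url):
        """Record a failure, e.g. a refused connection during an upgrade."""
        self._failed_at[url] = self._clock()

    def next_url(self):
        """Return the next URL whose node is not (or no longer) marked failed."""
        now = self._clock()
        for _ in range(len(self._urls)):
            url = self._urls[self._next]
            self._next = (self._next + 1) % len(self._urls)
            failed = self._failed_at.get(url)
            if failed is None or now - failed >= self._retry_interval:
                return url
        raise RuntimeError("all nodes are currently marked as failed")


# Usage with a fake clock so the example is deterministic.
now = [0.0]
host = "mycluster.aks.westeurope.azure.cratedb.cloud"
pool = FailoverPool(per_node_urls(host, 3), clock=lambda: now[0])
pool.mark_failed(f"http://{host}:4201")
during_upgrade = [pool.next_url() for _ in range(3)]  # skips port 4201
now[0] = 31.0
after_retry_interval = pool.next_url()  # port 4201 is tried again
```

Every client would have to implement this kind of bookkeeping itself. A single LB endpoint with backend health checks avoids that.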

amotl commented 3 years ago

Hi Romanas and Walter,

thanks for your answers and thoughts about this. I created this issue to add to the topic @SBlechmann, @c0c0n3 and @chicco785 are discussing at https://github.com/orchestracities/ngsi-timeseries-api/issues/452.

As I am not very familiar with the details of efficiently operating a CrateDB cluster within a Kubernetes environment, I decided to reach out to you and ask for your opinion on this.

> There is currently a k8s load balancer in front of all the CrateDB nodes, so round-robin is already happening in that sense - although it is L4 so would balance on the TCP connection level only.

This doesn't sound bad at all, as it will probably also be able to handle both HTTP connections to port 4200 and PostgreSQL wire protocol connections to port 5432?

> Are you imagining we do smarter routing on a per-request basis (i.e. maybe based on the URL)? Do we see some scalability challenge in the future with this setup?

No, not at all. We just shared some thoughts with @mfussenegger and @seut the other day and concluded that, if there were demand for it, HAProxy would be able to apply more sophisticated balancing algorithms.

> HAProxy in front would be possible, but it's a very non-trivial thing to do.

I completely understand that.

> Exposing each node via its own public IP possibly makes us bad neighbors in a cloud, as we would allocate a lot of IPs.

I hear you. So, the conclusion is that all load balancing should already be handled by the cluster infrastructure, and the CrateDB client will just communicate with a single endpoint, right?

With kind regards, Andreas.

amotl commented 3 years ago

> So, the conclusion is that all load balancing should already be handled by the cluster infrastructure, and the CrateDB client will just communicate with a single endpoint, right?

So, when imagining a CrateDB cluster composed of nodes with different roles (e.g. read-only nodes vs. nodes with equally shared roles), the K8s load balancer in front of all the CrateDB nodes will have to be made aware of where to distribute the requests to, right? That might be only a subset of all CrateDB nodes, right?

I am just curious about this topic: Will crate-operator have any responsibility for sharing this kind of topology information, together with the configuration needed to properly set up the K8s proxy (automatically), or is this something the human Kubernetes operator has to do?

SStorm commented 3 years ago

Hey Andreas,

> So, when imagining a CrateDB cluster composed of nodes with different roles (e.g. read-only nodes vs. nodes with equally shared roles), the K8s load balancer in front of all the CrateDB nodes will have to be made aware of where to distribute the requests to, right? That might be only a subset of all CrateDB nodes, right?

Yes, it would, but that's not really possible with the current setup, as we don't offer different kinds of nodes.
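For reference, a Kubernetes Service picks its backend pods through `spec.selector`. This is an equality-based label selector in which every key/value pair must match. A role-aware setup could therefore use one Service per role, each with a role-specific selector. The sketch below models that matching. The `role` labels are hypothetical, since, as noted, crate-operator doesn't offer different kinds of nodes.

```python
def selector_matches(selector, labels):
    """Equality-based label selector: all key/value pairs must be present."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical pod labels for a cluster with role-based nodes.
pods = {
    "crate-0": {"app": "crate", "role": "read-only"},
    "crate-1": {"app": "crate", "role": "read-only"},
    "crate-2": {"app": "crate", "role": "ingest"},
}

# A Service meant for read traffic would only select the read-only pods.
read_only_selector = {"app": "crate", "role": "read-only"}
backends = sorted(
    name for name, labels in pods.items()
    if selector_matches(read_only_selector, labels)
)
```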

> I am just curious about this topic: Will crate-operator have any responsibility for sharing this kind of topology information, together with the configuration needed to properly set up the K8s proxy (automatically), or is this something the human Kubernetes operator has to do?

The crate-operator itself only configures things in Kubernetes and doesn't do anything once the cluster is set up. When a new CrateDB resource is created in k8s, all the operator does is (asynchronously) create all the other things required: the LB, the actual pods, the required users, the backup cronjobs, etc. So if we wanted to have a smarter LB, we would have to have the operator involved in creating it, yes.
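Here is a framework-free sketch of the one-shot behavior described above, using `asyncio`. Dependent objects are created concurrently when a new CrateDB resource appears, and nothing happens afterwards. The `create_*` and `bootstrap_users` coroutines are hypothetical stand-ins for Kubernetes API calls. They do not reflect crate-operator's actual code or object names.

```python
import asyncio


# Hypothetical stand-ins for Kubernetes API calls; each returns a label for
# the object it would have created.
async def create_load_balancer(name):
    await asyncio.sleep(0)  # placeholder for an API round trip
    return f"service/{name}"


async def create_pods(name, replicas):
    await asyncio.sleep(0)
    return f"statefulset/{name} (replicas={replicas})"


async def create_backup_cronjob(name):
    await asyncio.sleep(0)
    return f"cronjob/{name}-backup"


async def bootstrap_users(name):
    await asyncio.sleep(0)
    return f"users/{name}"


async def on_cratedb_created(name, replicas):
    """Run once when a new CrateDB resource appears, then do nothing more."""
    # Independent objects can be created concurrently.
    created = await asyncio.gather(
        create_load_balancer(name),
        create_pods(name, replicas),
        create_backup_cronjob(name),
    )
    # Assumption: users can only be set up once the nodes are running.
    created.append(await bootstrap_users(name))
    return created


result = asyncio.run(on_cratedb_created("mycluster", 3))
```

Because the handler runs only on creation, a smarter LB would have to be one more object created in this step.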

> This doesn't sound bad at all, as it will probably also be able to handle both HTTP connections to port 4200 and PostgreSQL wire protocol connections to port 5432?

Correct. In fact, it already does: all CrateDB k8s clusters are reachable on both ports via the same LB. You will be assigned to one of the nodes in round-robin fashion for the duration of your connection.
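In Kubernetes terms, a single L4 load balancer serving both protocols amounts to one Service of type `LoadBalancer` with two ports. The sketch below builds such a manifest as a Python dict. The Service name, labels and port names are hypothetical and not necessarily what crate-operator creates.

```python
import json


def cratedb_service(cluster_name):
    """Build a LoadBalancer Service exposing HTTP (4200) and PostgreSQL (5432)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"crate-{cluster_name}"},  # hypothetical name
        "spec": {
            "type": "LoadBalancer",
            # Hypothetical labels; traffic goes to every pod that matches them.
            "selector": {"app": "crate", "cluster": cluster_name},
            "ports": [
                {"name": "http", "port": 4200, "targetPort": 4200, "protocol": "TCP"},
                {"name": "postgres", "port": 5432, "targetPort": 5432, "protocol": "TCP"},
            ],
        },
    }


manifest = json.dumps(cratedb_service("mycluster"), indent=2)
```

Piping `manifest` into `kubectl apply -f -` would work, since kubectl accepts JSON as well as YAML manifests.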

> I hear you. So, the conclusion is that all load balancing should already be handled by the cluster infrastructure, and the CrateDB client will just communicate with a single endpoint, right?

I wasn't around when this was being built, but that's my understanding. I think there is a lot of merit in revisiting this in the future, especially if we can take it further and make the client itself aware of the cluster topology. For example, I would think that writing directly to the node holding the primary shard for the data you're inserting would be more efficient?
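Below is a rough sketch of the topology-aware client idea. Rows are grouped by the node holding the primary shard for their routing value, so that each batch can be sent to that node. The shard-to-node map would come from the cluster. The `sys.shards` query in the comment is an assumption about how it could be fetched. `zlib.crc32` is a placeholder hash: CrateDB uses its own routing function, so this sketch does not reproduce real shard placement. Table, node and column names are hypothetical.

```python
import zlib

# Assumed source of this map in a real client, e.g.:
#   SELECT id, node['name'] FROM sys.shards
#   WHERE table_name = 'metrics' AND "primary" = true
# Hard-coded here: primary shard id -> node name (hypothetical).
PRIMARY_SHARD_NODES = {0: "crate-0", 1: "crate-1", 2: "crate-2"}


def shard_for(routing_value, number_of_shards):
    """Map a routing value to a shard id (placeholder hash, NOT CrateDB's)."""
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards


def node_for(routing_value, shard_nodes=PRIMARY_SHARD_NODES):
    """Return the node that holds the primary shard for this routing value."""
    return shard_nodes[shard_for(routing_value, len(shard_nodes))]


def group_by_node(rows, routing_column):
    """Batch rows per target node so each batch can be written to one node."""
    batches = {}
    for row in rows:
        batches.setdefault(node_for(row[routing_column]), []).append(row)
    return batches


rows = [
    {"sensor": "sensor-1", "value": 1.0},
    {"sensor": "sensor-2", "value": 2.0},
    {"sensor": "sensor-1", "value": 3.0},
]
batches = group_by_node(rows, "sensor")
```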

Cheers, Roman

amotl commented 3 years ago

Hi Romanas,

thanks again for sharing more insights about this topic.

> When a new CrateDB resource is created in k8s, all the operator does is (asynchronously) create all the other things required: the LB, the actual pods, the required users, the backup cronjobs, etc.

I believe that is perfect and exactly what @SBlechmann and @c0c0n3 were discussing at https://github.com/orchestracities/ngsi-timeseries-api/issues/452#issuecomment-779861307 and the following comments.

If I get you right, every necessary step is already automated, and a CrateDB client will just have to connect to a single endpoint (the LB) to have its requests distributed among the cluster nodes. I believe that is all I primarily wanted to get out of this discussion.

Regarding my detour into "role-based" cluster nodes, where the cluster topology is more advanced, I completely understand that this is currently beyond the scope of crate-operator and might be revisited in the future.

With kind regards, Andreas.

c0c0n3 commented 3 years ago

> The crate-operator itself only configures things in Kubernetes and doesn't do anything once the cluster is set up. When a new CrateDB resource is created in k8s, all the operator does is (asynchronously) create all the other things required: the LB, the actual pods, the required users, the backup cronjobs, etc.

That's one heck of a job though :-) I can't wait to give crate-operator a try in our clusters. Not sure when, but hopefully in the not-so-distant future...

amotl commented 3 years ago

Hi @c0c0n3,

thanks for already noticing the addendum I posted at https://github.com/orchestracities/ngsi-timeseries-api/issues/452#issuecomment-783360773. I just wanted to make clear that crate-operator doesn't offer that convenience in all situations yet. Regarding LB auto-provisioning, it currently does this job only in IaaS environments on Azure and AWS.

With kind regards, Andreas.

crate / crate-operator

Handing out topology information to CrateDB clients #170