
:memo: Fix/Add High Availability documentation for Kubernetes and etcd #4817

Open · hyperbolic2346 opened 1 year ago

hyperbolic2346 commented 1 year ago

What type of changes will we make to the documentation?

Usage of New Feature

Affects Version(s)

master

Improving the documentation

I was looking into how to deploy Kyuubi with HA via the Helm chart, and on the surface it looks like the solution is to deploy zookeeper and configure that. Upon further digging, I found options for etcd and work such as #1392 that seems to indicate HA is possible via etcd in a Kubernetes deployment. It seems the documentation is not up to date with the capabilities Kyuubi provides. I don't know what would be required to support HA in Kubernetes as the deployment has 2 replicas, but I don't see anything in the configmap to indicate HA. Is Kyuubi coordinating this itself somehow, or would one need to point Kyuubi at either the k8s etcd or an external cluster, similar to a zookeeper setup?

Are you willing to submit PR?

github-actions[bot] commented 1 year ago

Hello @hyperbolic2346, Thanks for finding the time to report the issue! We really appreciate the community's efforts to improve Apache Kyuubi.

hddong commented 1 year ago

@hyperbolic2346: K8s itself provides HA capabilities based on the 2 replicas (Deployment). We can access the Kyuubi server pods through a SVC (such as LoadBalancer or NodePort).

I don't know what would be required to support HA in Kubernetes as the deployment has 2 replicas, but I don't see anything in the configmap to indicate HA.
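(A rough, hedged illustration of the Service-based access described above; the deployment and service names here are made-up placeholders, and 10009 is assumed to be the Thrift binary port in use.)

$ kubectl expose deployment kyuubi --name=kyuubi-thrift --type=LoadBalancer --port=10009   # puts the Kyuubi pods behind a LoadBalancer Service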

hyperbolic2346 commented 1 year ago

@hddong I know that k8s gives us load balancing and that we can still reach Kyuubi through the service if a pod fails, but is that truly HA? What I mean is: what happens to existing connections if a pod fails? I understood that something like zookeeper was designed to store the state necessary for recovery in those situations. Is that incorrect?

hyperbolic2346 commented 1 year ago

To keep this thread updated, I attempted to verify that multiple pods would provide HA, but only one of the pods knew about the session.

$ curl -v 127.0.0.1:10099/api/v1/sessions
*   Trying 127.0.0.1:10099...
* Connected to 127.0.0.1 (127.0.0.1) port 10099 (#0)
> GET /api/v1/sessions HTTP/1.1
> Host: 127.0.0.1:10099
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Thu, 01 Jun 2023 05:11:19 GMT
< Content-Type: application/json
< Content-Length: 3990
< Server: Jetty(9.4.50.v20221201)
< 
* Connection #0 to host 127.0.0.1 left intact
[{"identifier":"7eb813ea-014b-4b1c-a0c8-35f4064a0303","user":"anonymous","ipAddr":"192.168.7.32","conf":{"kyuubi.session.real.user":"anonymous","spark.kubernetes.namespace":"default","sessionType":"INTERACTIVE","kyuubi.session.name":"test_cluster"
...

and the other pod

$ curl -v 127.0.0.1:10099/api/v1/sessions
*   Trying 127.0.0.1:10099...
* Connected to 127.0.0.1 (127.0.0.1) port 10099 (#0)
> GET /api/v1/sessions HTTP/1.1
> Host: 127.0.0.1:10099
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Thu, 01 Jun 2023 05:11:36 GMT
< Content-Type: application/json
< Content-Length: 2
< Server: Jetty(9.4.50.v20221201)
< 
* Connection #0 to host 127.0.0.1 left intact

So it seems that zookeeper, or now etcd if I am reading the PR from @hddong correctly, is required for HA.

pan3793 commented 1 year ago

@hyperbolic2346 currently, only batch session state is shared across Kyuubi instances; interactive session state is not.

Zookeeper/etcd-based HA is client-side load balancing. It is explained in the FAQ:

How does Kyuubi HA work?

Basically, it's client-side load balancing mode. After the Kyuubi Server is started, it will register itself in Zookeeper, and the client can select one of the Kyuubi Servers to connect to.
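(As a concrete sketch of that client-side load balancing: a JDBC client connects to the ZooKeeper ensemble rather than to a specific Kyuubi server. The hosts and namespace below are placeholders, not values from this thread.)

$ beeline -u 'jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi'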

hyperbolic2346 commented 1 year ago

Thanks for the pointer, @pan3793 . Am I correct to interpret that as saying that Kyuubi is currently unable to handle a pod failure seamlessly?

pan3793 commented 1 year ago

For interactive sessions, if a Pod crashes, all session state will be lost.

In most cases, it can be overcome by client-side retry, but if a user runs:

SET spark.sql.xxxx=xxxx;

SELECT ...; -- the configuration takes effect

-- Kyuubi Pod crashes
--   1. some editors like HUE will report errors to user, and guide user to recreate a new session
--   2. some clients automatically retry to create a new session without tips

SELECT ...; -- the configuration does NOT take effect

hyperbolic2346 commented 1 year ago

Reading the HA documentation again, I get the impression that zookeeper is only used to load balance between the kyuubi servers. Is that all the HA provided or is there session information stored in zookeeper that allows another instance to handle the client when a server is lost? In short, is zookeeper only a load balancer?

pan3793 commented 1 year ago

Is that all the HA provided or is there session information stored in zookeeper that allows another instance to handle the client when a server is lost?

No.

In short, is zookeeper only a load balancer?

For Kyuubi Server, yes. It is also used for engine discovery.

hyperbolic2346 commented 1 year ago

So in a Kubernetes deployment, zookeeper is simply unnecessary, correct?

pan3793 commented 1 year ago

No. You ignored my additional reply.

It is also used for engine discovery.

It's required so that the engine can register itself after bootstrap and the Kyuubi Server knows how to connect to it.

hyperbolic2346 commented 1 year ago

I apologize; my intent was not to ignore but to learn. I don't understand the component makeup of Kyuubi, and as such I don't know the difference between the engine and the Kyuubi Server. I am a Kubernetes person attempting to deploy Kyuubi correctly, and with my lack of Kyuubi or even Spark background I am having trouble understanding how to do that.

That is the intent of this issue: to decode the requirements and get them into the documentation to help others in the Kubernetes camp who are attempting to deploy this.

pan3793 commented 1 year ago

@hyperbolic2346 Sorry, I don't mean to blame anyone. Limited by my English skills, my words may be blunt.

Basically, Kyuubi is composed of a Server and a Computing Engine. The Server is a daemon Java process exposing Thrift/REST/etc. APIs.


When a connection from the client comes in, the Server will try to find a suitable Engine from ServiceDiscovery (Zookeeper or etcd); this is a kind of routing mechanism (the rules are based on your configuration; you can learn more from engine share level). If there is no candidate, it creates a new engine by running spark-submit (taking the Spark engine as an example) to launch a Spark application. After the engine has bootstrapped, it registers its RPC server host:port, along with some additional information, in ServiceDiscovery, so that the Kyuubi Server knows how to connect to the engine.


So basically, the Kyuubi engine is created lazily, and Zookeeper/etcd is required for engine discovery. After all connections are closed, the engine will self-terminate after the idle timeout to save resources.
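(For reference, a hedged sketch of the server-side settings involved in the behavior described above; the property names come from the Kyuubi configuration reference, while the addresses and values are placeholders, not recommendations.)

# kyuubi-defaults.conf (sketch)
kyuubi.ha.addresses                  zk1:2181,zk2:2181,zk3:2181
kyuubi.ha.namespace                  kyuubi
kyuubi.engine.share.level            USER
kyuubi.session.engine.idle.timeout   PT30M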

The docs are written by contributors who are familiar with the Hadoop/Spark ecosystem, thus some general knowledge of the Hadoop/Spark ecosystem may be omitted.

We appreciate docs improvements from contributors who have different knowledge backgrounds :)

hyperbolic2346 commented 1 year ago

@hyperbolic2346 Sorry, I don't mean to blame anyone. Limited by my English skills, my words may be blunt.

I always assume as much; please don't worry about it. I didn't take offense; I just want to be clear about my ignorance. I'm working on it, but aren't we all? Thank you very much for this detailed explanation. It is most helpful.

When a connection from the client comes in, the Server will try to find a suitable Engine from ServiceDiscovery(Zookeeper or etcd)

Ok, so we have Kyuubi listening on a port, and the engine is the underlying Spark. Spark clusters (servers) are lazily created and registered in either the internal or an external shared zookeeper. This tells me that if you don't have zookeeper running then only the Kyuubi instance that made the server would know about it. Does this mean it is also leaked if that Kyuubi instance is destroyed via OOM kill or node crashing?

This is coming together for me now, I believe. I will read over these docs a few more times. The big question I have now is with regard to etcd. Does Kyuubi use the etcd from Kubernetes automatically or is it expected to either give that etcd address or spin up another etcd cluster?

The docs are written by contributors who are familiar with the Hadoop/Spark ecosystem, thus some general knowledge of the Hadoop/Spark ecosystem may be omitted.

We appreciate docs improvements from contributors who have different knowledge backgrounds :)

That is my goal here, to try and look at this from an outsider's perspective and get my thoughts down before I gain too much domain knowledge.

pan3793 commented 1 year ago

... This tells me that if you don't have zookeeper running then only the Kyuubi instance that made the server would know about it.

If you try starting Kyuubi on your local notebook (macOS or Linux), an embedded ZooKeeper server will be started inside the Kyuubi process if no external ZooKeeper is configured; this can be observed via the logs. In such cases, ZooKeeper is only used for engine discovery. Note that this is only for testing, not for production usage.
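(A minimal sketch of such a local check, assuming a stock binary distribution; the log file name may differ by version.)

$ ./bin/kyuubi start
$ tail -f logs/kyuubi-*.out   # look for the embedded ZooKeeper startup messages mentioned above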

Does this mean it is also leaked if that Kyuubi instance is destroyed via OOM kill or node crashing?

It was explained in my last reply. But in some corner cases, the Kyuubi Server will try to kill the Spark application (in K8s, by invoking the K8s API server to delete the Pod).

(from my last reply) After all connections are closed, the engine will self-terminate after the idle timeout to save resources.

Does Kyuubi use the etcd from Kubernetes automatically or is it expected to either give that etcd address or spin up another etcd cluster?

It is expected to give that etcd address.
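(A hedged sketch of what pointing Kyuubi at an external etcd might look like, based on the etcd discovery support referenced in #1392; the discovery client class name and endpoint format should be verified against the configuration docs for the version in use, and the address is a placeholder.)

# kyuubi-defaults.conf (sketch)
kyuubi.ha.client.class   org.apache.kyuubi.ha.client.etcd.EtcdDiscoveryClient
kyuubi.ha.addresses      http://etcd.example.svc:2379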

hyperbolic2346 commented 1 year ago

(from my last reply) After all connections are closed, the engine will self-terminate after the idle timeout to save resources.

Oh, the engine handles this idle timeout. Ok. I get that.

Does Kyuubi use the etcd from Kubernetes automatically or is it expected to either give that etcd address or spin up another etcd cluster?

It is expected to give that etcd address.

Ok, so in a Kubernetes production world you would hand Kyuubi the etcd address and let it handle service discovery there. The expectation is not to spin up a zookeeper for it.

Thank you again for all your help.

tgravescs commented 1 year ago

So the above client service discovery load balancing makes sense on k8s for JDBC, but I guess it doesn't work for the new REST API, where you have a notebook running and sending operations to a running session, correct? That session is only running on one of the Kyuubi servers, so if you are using the k8s load balancer, it's just going to pick one of the Kyuubi servers, and it might not be the one with your session on it. I assume there is nothing built into Kyuubi for this?

So the only option would be an external load balancer that somehow knows how to route them.