apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] The engine selection is not supported when HA is enabled and kyuubi.engine.share.level is set to GROUP #6724

Closed A-little-bit-of-data closed 1 month ago

A-little-bit-of-data commented 1 month ago

Describe the bug

After configuring HA in kyuubi-defaults.conf, setting kyuubi.engine.share.level to GROUP, and configuring a Spark user, the entire kyuubi-defaults.conf is as follows:

## Helm chart provided Kyuubi configurations

#kyuubi.engine.type=SPARK_SQL, FLINK_SQL, CHAT, TRINO, HIVE_SQL, JDBC
#kyuubi.engine.type=TRINO
kyuubi.kubernetes.namespace=XXXX
kyuubi.frontend.connection.url.use.hostname=false
kyuubi.frontend.thrift.binary.bind.port=10009
kyuubi.frontend.thrift.http.bind.port=10010
kyuubi.frontend.rest.bind.port=10099
kyuubi.frontend.mysql.bind.port=3309
kyuubi.frontend.protocols=REST,THRIFT_BINARY
kyuubi.session.engine.check.interval=PT1M
kyuubi.session.engine.idle.timeout=PT2M
kyuubi.kubernetes.terminatedApplicationRetainPeriod=PT1M
kyuubi.engine.share.level=GROUP
kyuubi.authentication=JDBC
kyuubi.authentication.jdbc.driver.class = com.mysql.cj.jdbc.Driver
kyuubi.authentication.jdbc.url = jdbc:mysql://XXXX:3306/kyuubi_auth_db
kyuubi.authentication.jdbc.user = XXXX
kyuubi.authentication.jdbc.password = XXXX
kyuubi.authentication.jdbc.query = SELECT 1 FROM kyuubi_auth_db.kyuubi_user WHERE user=${user} AND password=MD5(CONCAT('kyuubi',${password}))

####HA
kyuubi.ha.addresses=XXXX:2181,XXXX:2181,XXXX:2181
kyuubi.ha.namespace=kyuubi
kyuubi.ha.zookeeper.session.timeout=600000

## Kyuubi Metrics

kyuubi.metrics.enabled=true
kyuubi.metrics.reporters=

##trino
kyuubi.session.engine.trino.connection.url=http://trino.XXXX:8080
kyuubi.session.engine.trino.connection.catalog=xxxx

##spark
kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.kind=ALL

## User provided Kyuubi configurations
___batch3___.spark.app.name=batch3
___batch3___.spark.executor.instances=3
___batch3___.spark.driver.cores=1
___batch3___.spark.executor.cores=5
___batch3___.spark.kubernetes.driver.limit.cores=1
___batch3___.spark.kubernetes.executor.limit.cores=5
___batch3___.spark.driver.memory=1g
___batch3___.spark.executor.memory=20g

Then connect through ZooKeeper: /opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:2181/;kyuubi.engine.type=SPARK_SQL;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi' -n batch3 -p

/opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:2181/;kyuubi.engine.type=TRINO;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi' -n evm -p

Both of the above start the Spark SQL engine. Since no evm-specific Spark configuration is set, startup fails with:

Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://kubernetes.default.svc:443/api/v1/namespaces/XXXX/pods. Message: Pod "kyuubi-group-spark-sql-evm-default-4749a463-d023-40a0-9f29-c2a20294f86d-driver" is invalid: spec.containers[0].resources.requests: Invalid value: "1": must be less than or equal to cpu limit. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.containers[0].resources.requests, message=Invalid value: "1": must be less than or equal to cpu limit, reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Pod, name=kyuubi-group-spark-sql-evm-default-4749a463-d023-40a0-9f29-c2a20294f86d-driver, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Pod "kyuubi-group-spark-sql-evm-default-4749a463-d023-40a0-9f29-c2a20294f86d-driver" is invalid: spec.containers[0].resources.requests: Invalid value: "1": must be less than or equal to cpu limit, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
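
The error itself is a Kubernetes resource-validation failure rather than an engine-selection problem: the evm driver's CPU request (1) exceeds its CPU limit because evm has no per-user overrides the way batch3 does. A hedged sketch of per-user settings for evm in kyuubi-defaults.conf; the values are placeholders that mirror the batch3 block above, and the only real requirement is that each request does not exceed its limit:

```properties
# Hypothetical per-user overrides for the evm user (placeholder values,
# mirroring the ___batch3___ block): keep cores <= limit.cores.
___evm___.spark.driver.cores=1
___evm___.spark.kubernetes.driver.limit.cores=1
___evm___.spark.executor.cores=5
___evm___.spark.kubernetes.executor.limit.cores=5
```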

The following is the query in ZooKeeper after executing /opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:2181/;kyuubi.engine.type=TRINO;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi' -n evm -p. It is obvious that kyuubi.engine.type=TRINO did not take effect.

However, when connecting through the Kyuubi 10009 port instead: /opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:10009/?kyuubi.engine.type=SPARK_SQL' -n batch3 -p

/opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:10009/?kyuubi.engine.type=TRINO' -n evm -p

each connection starts the specified engine, and from the log you can see the connection is still initialized from ZooKeeper.

Starting the Spark SQL engine:

LaunchEngine
2024-10-15 09:03:21.573 INFO KyuubiSessionManager-exec-pool: Thread-451 org.apache.kyuubi.shaded.curator.framework.imps.CuratorFrameworkImpl: Starting 
2024-10-15 09:03:21.573 INFO KyuubiSessionManager-exec-pool: Thread-451 org.apache.kyuubi.shaded.zookeeper.ZooKeeper: Initiating client connection, connectString=XXXX:2181,XXXX:2181,XXXX:2181 sessionTimeout=600000 watcher=org.apache.kyuubi.shaded.curator.ConnectionState@43c5f42e
2024-10-15 09:03:21.575 INFO KyuubiSessionManager-exec-pool: Thread-451-SendThread(XXXX:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Opening socket connection to server XXXX/XXXX:2181. Will not attempt to authenticate using SASL (unknown error)
2024-10-15 09:03:21.579 INFO KyuubiSessionManager-exec-pool: Thread-451-SendThread(XXXX:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Socket connection established to XXXX/XXXX:2181, initiating session 
2024-10-15 09:03:21.599 INFO KyuubiSessionManager-exec-pool: Thread-451-SendThread(XXXX:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Session establishment complete on server XXXX/XXXX:2181, sessionid = 0x2a000000874400ae, negotiated timeout = 40000 
2024-10-15 09:03:21.621 WARN KyuubiSessionManager-exec-pool: Thread-451 org.apache.kyuubi.session.HadoopGroupProvider: There is no group for batch3, use the client user name as group directly 
2024-10-15 09:03:21.667 INFO KyuubiSessionManager-exec-pool: Thread-451 org.apache.kyuubi.engine.ProcBuilder: Creating batch3's working directory at /opt/kyuubi/work/batch3
2024-10-15 09:03:21.677 INFO KyuubiSessionManager-exec-pool: Thread-451 org.apache.kyuubi.Utils: Loading Kyuubi properties from /opt/spark/conf/spark-defaults.conf 
2024-10-15 09:03:21.701 INFO KyuubiSessionManager-exec-pool: Thread-451 org.apache.kyuubi.engine.ProcBuilder: Logging to /opt/kyuubi/work/batch3/kyuubi-spark-sql-engine.log.0
2024-10-15 09:03:21.711 INFO KyuubiSessionManager-exec-pool: Thread-451 org.apache.kyuubi.engine.EngineRef: Launching engine: /opt/kyuubi/externals/spark-3.5.1-bin-hadoop3/bin/spark-submit \

Starting the Trino engine:
LaunchEngine
2024-10-15 09:03:00.289 INFO KyuubiSessionManager-exec-pool: Thread-472 org.apache.kyuubi.shaded.curator.framework.imps.CuratorFrameworkImpl: Starting 
2024-10-15 09:03:00.289 INFO KyuubiSessionManager-exec-pool: Thread-472 org.apache.kyuubi.shaded.zookeeper.ZooKeeper: Initiating client connection, connectString=XXXX:2181,XXXX:2181,XXXX:2181 sessionTimeout=600000 watcher=org.apache.kyuubi.shaded.curator.ConnectionState@73268d32
2024-10-15 09:03:00.298 WARN KyuubiSessionManager-exec-pool: Thread-472 org.apache.kyuubi.session.HadoopGroupProvider: There is no group for evm, use the client user name as group directly 
2024-10-15 09:03:00.318 INFO KyuubiSessionManager-exec-pool: Thread-472-SendThread(XXXX:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Opening socket connection to server XXXX/XXXX:2181. Will not attempt to authenticate using SASL (unknown error) 
2024-10-15 09:03:00.326 INFO KyuubiSessionManager-exec-pool: Thread-472-SendThread(XXXX:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Socket connection established to XXXX/XXXX:2181, initiating session 
2024-10-15 09:03:00.330 INFO KyuubiSessionManager-exec-pool: Thread-472-SendThread(XXXX:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Session establishment complete on server XXXX/XXXX:2181, sessionid = 0x2a000000874400ad, negotiated timeout = 40000
2024-10-15 09:03:00.331 INFO KyuubiSessionManager-exec-pool: Thread-472-EventThread org.apache.kyuubi.shaded.curator.framework.state.ConnectionStateManager: State change: CONNECTED 
2024-10-15 09:03:00.371 INFO KyuubiSessionManager-exec-pool: Thread-472 org.apache.kyuubi.engine.ProcBuilder: Creating evm's working directory at /opt/kyuubi/work/evm
2024-10-15 09:03:00.385 INFO KyuubiSessionManager-exec-pool: Thread-472 org.apache.kyuubi.engine.EngineRef: Launching engine: /opt/java/openjdk/bin/java \ -Xmx1g \ -cp /opt/kyuubi/externals/engines/trino/kyuubi-trino-engine_2.12-1.9.1.jar:/opt/kyuubi/externals/engines/trino/* org.apache.kyuubi.engine.trino.TrinoSqlEngine \

I don't know whether this is a configuration error or a problem with how I start the connection; following the documentation's ZooKeeper connection method simply doesn't work.

Also, for an engine started through port 10009: if the corresponding Kyuubi pod crashes while a query is running, the running SQL fails and is not transferred to another server. Yet from the log above the session is connected through ZooKeeper (Session establishment complete on server XXXX/XXXX:2181), which feels strange. How can I specify the query engine through the JDBC URL and still achieve HA with kyuubi.engine.share.level=GROUP set?

After deleting the metadata in ZooKeeper, I used /opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:2181/;kyuubi.engine.type=SPARK_SQL;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi' -n batch3 -p to start the engine, which landed on the kyuubi-0 pod. There were two pods in total, the other being kyuubi-1. While a SQL statement was running I killed kyuubi-0; the task was not transferred to kyuubi-1 but failed. I don't understand how HA is supposed to take effect.

LaunchEngine
2024-10-15 11:02:00.029 INFO KyuubiSessionManager-exec-pool: Thread-41 org.apache.kyuubi.shaded.curator.framework.imps.CuratorFrameworkImpl: Starting
2024-10-15 11:02:00.030 INFO KyuubiSessionManager-exec-pool: Thread-41 org.apache.kyuubi.shaded.zookeeper.ZooKeeper: Initiating client connection, connectString=XXXX:2181,XXXX:2181,XXXX:2181 sessionTimeout=600000 watcher=org.apache.kyuubi.shaded.curator.ConnectionState@f05c18d
2024-10-15 11:02:00.032 INFO KyuubiSessionManager-exec-pool: Thread-41-SendThread(XXXX:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Opening socket connection to server XXXX/XXXX:2181. Will not attempt to authenticate using SASL (unknown error)
2024-10-15 11:02:00.035 INFO KyuubiSessionManager-exec-pool: Thread-41-SendThread(XXXX:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Socket connection established to XXXX/XXXX:2181, initiating session
2024-10-15 11:02:00.039 INFO KyuubiSessionManager-exec-pool: Thread-41-SendThread(XXXX:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Session establishment complete on server XXXX/XXXX:2181, sessionid = 0x2a000000874400b4, negotiated timeout = 40000
2024-10-15 11:02:00.040 INFO KyuubiSessionManager-exec-pool: Thread-41-EventThread org.apache.kyuubi.shaded.curator.framework.state.ConnectionStateManager: State change: CONNECTED
2024-10-15 11:02:00.055 WARN KyuubiSessionManager-exec-pool: Thread-41 org.apache.hadoop.security.ShellBasedUnixGroupsMapping: unable to return groups for user batch3
PartialGroupNameException The user name 'batch3' is not found. id: ‘batch3’: no such user
id: ‘batch3’: no such user

        at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.resolvePartialGroupNames(ShellBasedUnixGroupsMapping.java:294)
        at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:207)
        at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:97)
        at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.getGroups(JniBasedUnixGroupsMappingWithFallback.java:51)
        at org.apache.hadoop.security.Groups$GroupCacheLoader.fetchGroupList(Groups.java:387)
        at org.apache.hadoop.security.Groups$GroupCacheLoader.load(Groups.java:321)
        at org.apache.hadoop.security.Groups$GroupCacheLoader.load(Groups.java:270)
        at org.apache.hadoop.thirdparty.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3529)
        at org.apache.hadoop.thirdparty.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2278)
        at org.apache.hadoop.thirdparty.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2155)
        at org.apache.hadoop.thirdparty.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2045)
        at org.apache.hadoop.thirdparty.com.google.common.cache.LocalCache.get(LocalCache.java:3962)
        at org.apache.hadoop.thirdparty.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3985)
        at org.apache.hadoop.thirdparty.com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4946)
        at org.apache.hadoop.security.Groups.getGroups(Groups.java:228)
        at org.apache.hadoop.security.UserGroupInformation.getGroups(UserGroupInformation.java:1755)
        at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1743)
        at org.apache.kyuubi.session.HadoopGroupProvider.groups(HadoopGroupProvider.scala:36)
        at org.apache.kyuubi.session.HadoopGroupProvider.primaryGroup(HadoopGroupProvider.scala:33)
        at org.apache.kyuubi.engine.EngineRef.<init>(EngineRef.scala:97)
        at org.apache.kyuubi.session.KyuubiSessionImpl.engine$lzycompute(KyuubiSessionImpl.scala:85)
        at org.apache.kyuubi.session.KyuubiSessionImpl.engine(KyuubiSessionImpl.scala:78)
        at org.apache.kyuubi.session.KyuubiSessionImpl.renewEngineCredentials(KyuubiSessionImpl.scala:269)
        at org.apache.kyuubi.session.KyuubiSessionImpl.engineCredentials$lzycompute(KyuubiSessionImpl.scala:76)
        at org.apache.kyuubi.session.KyuubiSessionImpl.engineCredentials(KyuubiSessionImpl.scala:76)
        at org.apache.kyuubi.session.KyuubiSessionImpl.$anonfun$openEngineSession$2(KyuubiSessionImpl.scala:136)
        at org.apache.kyuubi.session.KyuubiSessionImpl.$anonfun$openEngineSession$2$adapted(KyuubiSessionImpl.scala:133)
        at org.apache.kyuubi.ha.client.DiscoveryClientProvider$.withDiscoveryClient(DiscoveryClientProvider.scala:36)
        at org.apache.kyuubi.session.KyuubiSessionImpl.$anonfun$openEngineSession$1(KyuubiSessionImpl.scala:133)
        at org.apache.kyuubi.session.KyuubiSession.handleSessionException(KyuubiSession.scala:49)
        at org.apache.kyuubi.session.KyuubiSessionImpl.openEngineSession(KyuubiSessionImpl.scala:133)
        at org.apache.kyuubi.operation.LaunchEngine.$anonfun$runInternal$1(LaunchEngine.scala:60)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
2024-10-15 11:02:00.060 WARN KyuubiSessionManager-exec-pool: Thread-41 org.apache.kyuubi.session.HadoopGroupProvider: There is no group for batch3, use the client user name as group directly
2024-10-15 11:02:00.106 WARN KyuubiSessionManager-exec-pool: Thread-41 org.apache.kyuubi.shaded.curator.utils.ZKPaths: The version of ZooKeeper being used doesn't support Container nodes. CreateMode.PERSISTENT will be used instead.
2024-10-15 11:02:00.168 INFO KyuubiSessionManager-exec-pool: Thread-41 org.apache.kyuubi.engine.ProcBuilder: Creating batch3's working directory at /opt/kyuubi/work/batch3
2024-10-15 11:02:00.178 INFO KyuubiSessionManager-exec-pool: Thread-41 org.apache.kyuubi.Utils: Loading Kyuubi properties from /opt/spark/conf/spark-defaults.conf
2024-10-15 11:02:00.185 INFO KyuubiSessionManager-exec-pool: Thread-41 org.apache.kyuubi.engine.ProcBuilder: Logging to /opt/kyuubi/work/batch3/kyuubi-spark-sql-engine.log.0
2024-10-15 11:02:00.195 INFO KyuubiSessionManager-exec-pool: Thread-41 org.apache.kyuubi.engine.EngineRef: Launching engine:
/opt/kyuubi/externals/spark-3.5.1-bin-hadoop3/bin/spark-submit \
        --class org.apache.kyuubi.engine.spark.SparkSQLEngine \

2024-10-15 10:55:07.877 INFO KyuubiSessionManager-exec-pool: Thread-50 org.apache.kyuubi.operation.ExecuteStatement: Query[1ed9cfe3-d749-4cae-a4ef-b55b97a20e7b] in RUNNING_STATE

 2024-10-15 10:55:12.883 INFO KyuubiSessionManager-exec-pool: Thread-50 org.apache.kyuubi.operation.ExecuteStatement: Query[1ed9cfe3-d749-4cae-a4ef-b55b97a20e7b] in RUNNING_STATE

 2024-10-15 10:55:17.885 INFO KyuubiSessionManager-exec-pool: Thread-50 org.apache.kyuubi.operation.ExecuteStatement: Query[1ed9cfe3-d749-4cae-a4ef-b55b97a20e7b] in RUNNING_STATE

 2024-10-15 10:55:22.888 INFO KyuubiSessionManager-exec-pool: Thread-50 org.apache.kyuubi.operation.ExecuteStatement: Query[1ed9cfe3-d749-4cae-a4ef-b55b97a20e7b] in RUNNING_STATE

 2024-10-15 10:55:27.891 INFO KyuubiSessionManager-exec-pool: Thread-50 org.apache.kyuubi.operation.ExecuteStatement: Query[1ed9cfe3-d749-4cae-a4ef-b55b97a20e7b] in RUNNING_STATE

 2024-10-15 10:55:32.893 INFO KyuubiSessionManager-exec-pool: Thread-50 org.apache.kyuubi.operation.ExecuteStatement: Query[1ed9cfe3-d749-4cae-a4ef-b55b97a20e7b] in RUNNING_STATE

 2024-10-15 10:55:37.723 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-40 org.apache.kyuubi.server.KyuubiTBinaryFrontendService: Session [SessionHandle [21f855e3-bd5d-4c38-8c3c-e2c9348bff6b]] disconnected without closing properly, close it now

 2024-10-15 10:55:37.724 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-40 org.apache.kyuubi.session.KyuubiSessionManager: batch3's KyuubiSessionImpl with SessionHandle [21f855e3-bd5d-4c38-8c3c-e2c9348bff6b] is closed, current opening sessions 0

 2024-10-15 10:55:37.725 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-40 org.apache.kyuubi.operation.LaunchEngine: Processing batch3's query[7734f7e2-fffd-4ebb-b382-0fa65f500c9f]: FINISHED_STATE -> CLOSED_STATE, time taken: 134.324 seconds

 2024-10-15 10:55:37.728 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-40 org.apache.kyuubi.operation.ExecuteStatement: Processing batch3's query[1ed9cfe3-d749-4cae-a4ef-b55b97a20e7b]: RUNNING_STATE -> CLOSED_STATE, time taken: 35.028 seconds

 2024-10-15 10:55:37.919 INFO KyuubiTBinaryFrontendHandler-Pool: Thread-40 org.apache.kyuubi.client.KyuubiSyncThriftClient: TCloseOperationReq(operationHandle:TOperationHandle(operationId:THandleIdentifier(guid:1E D9 CF E3 D7 49 4C AE A4 EF B5 5B 97 A2 0E 7B, secret:C2 EE 5B 97 3E A0 41 FC AC 16 9B D7 08 ED 8F 38), operationType:EXECUTE_STATEMENT, hasResultSet:true)) succeed on engine side

Affects Version(s)

1.9.1

Kyuubi Server Log Output

LaunchEngine
2024-10-14 16:30:51.009 INFO KyuubiSessionManager-exec-pool: Thread-57 org.apache.kyuubi.shaded.curator.framework.imps.CuratorFrameworkImpl: Starting
2024-10-14 16:30:51.010 INFO KyuubiSessionManager-exec-pool: Thread-57 org.apache.kyuubi.shaded.zookeeper.ZooKeeper: Initiating client connection, connectString=XXXX:2181,XXXX:2181,XXXX:2181 sessionTimeout=600000 watcher=org.apache.kyuubi.shaded.curator.ConnectionState@2b29b5ff
2024-10-14 16:30:51.012 INFO KyuubiSessionManager-exec-pool: Thread-57-SendThread(XXXX:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Opening socket connection to server XXXX/XXXX:2181. Will not attempt to authenticate using SASL (unknown error)
2024-10-14 16:30:51.019 INFO KyuubiSessionManager-exec-pool: Thread-57-SendThread(XXXX:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Socket connection established to XXXX/XXXX:2181, initiating session
2024-10-14 16:30:51.024 INFO KyuubiSessionManager-exec-pool: Thread-57-SendThread(XXXX:2181) org.apache.kyuubi.shaded.zookeeper.ClientCnxn: Session establishment complete on server XXXX/XXXX:2181, sessionid = 0x2a0000008744009e, negotiated timeout = 40000
2024-10-14 16:30:51.024 INFO KyuubiSessionManager-exec-pool: Thread-57-EventThread org.apache.kyuubi.shaded.curator.framework.state.ConnectionStateManager: State change: CONNECTED
2024-10-14 16:30:51.062 INFO KyuubiSessionManager-exec-pool: Thread-57 org.apache.kyuubi.Utils: Loading Kyuubi properties from /opt/spark/conf/spark-defaults.conf
2024-10-14 16:30:51.064 INFO KyuubiSessionManager-exec-pool: Thread-57 org.apache.kyuubi.engine.ProcBuilder: Logging to /opt/kyuubi/work/evm/kyuubi-spark-sql-engine.log.1
2024-10-14 16:30:51.065 INFO KyuubiSessionManager-exec-pool: Thread-57 org.apache.kyuubi.engine.EngineRef: Launching engine:
/opt/kyuubi/externals/spark-3.5.1-bin-hadoop3/bin/spark-submit \
        --class org.apache.kyuubi.engine.spark.SparkSQLEngine \

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

pan3793 commented 1 month ago

That's the advantage of Spark cluster mode: launching the Spark engine (driver) process in a dedicated pod, instead of in the pod that runs the spark-submit command. Unfortunately, cluster mode is only applicable to the Spark engine so far.

If you set spark.submit.deployMode=cluster, by modifying either spark-defaults.conf or kyuubi-defaults.conf, and then run beeline to trigger a spark-submit, a new pod for the Spark driver will be launched; deleting the Kyuubi pod then won't terminate the Spark engine.
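
For reference, a minimal sketch of the setting; either file works since spark.* keys in kyuubi-defaults.conf are passed through to spark-submit:

```properties
# In spark-defaults.conf or kyuubi-defaults.conf:
# run the Spark driver in its own pod rather than inside the Kyuubi pod.
spark.submit.deployMode=cluster
```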

A-little-bit-of-data commented 1 month ago

If you set spark.submit.deployMode=cluster, by modifying either spark-defaults.conf or kyuubi-defaults.conf, and then run beeline to trigger a spark-submit, a new pod for the Spark driver will be launched; deleting the Kyuubi pod then won't terminate the Spark engine.

I set spark.submit.deployMode=cluster through the command. It is true that after deleting the Kyuubi pod the Spark engine is not terminated, but the running SQL still fails. My understanding of HA is that when a Kyuubi pod goes down, the running SQL should be transferred to another Kyuubi pod so that it is not affected; otherwise, how can it be called HA?

In addition, when submitting tasks by connecting through ZooKeeper as in the documentation, /opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:2181/;kyuubi.engine.type=SPARK_SQL;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi' -n batch3 -p, the engine cannot be specified, and only the default configured in kyuubi-defaults.conf is used.

Also, with JDBC authentication enabled, the same user can connect through the Kyuubi beeline client, but the Spark and CDH beeline clients under DolphinScheduler cannot connect, and keep reporting: 15:09:41.002 [main] ERROR org.apache.hive.jdbc.Utils - Unable to read HiveServer2 configs from ZooKeeper Unknown HS2 problem when communicating with Thrift server. Error: Could not open client transport for any of the Server URI's in ZooKeeper: Peer indicated failure: Error validating the login (state=08S01,code=0)

pan3793 commented 1 month ago

My understanding of HA is that when a Kyuubi pod goes down, the running SQL can be transferred to another Kyuubi pod so that it is not affected.

Not exactly. In such cases, all sessions associated with the dead Kyuubi pod are marked invalid and all their queries are canceled; the client needs to catch the exception and re-run the failed queries.
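
As a hedged client-side sketch of that catch-and-re-run pattern: run_query here is a hypothetical callable standing in for whatever JDBC/REST call the client makes, not a Kyuubi API.

```python
import time

# Sketch of client-side retry: re-run a statement when the Kyuubi server
# holding the session dies mid-query. `run_query` is a hypothetical callable
# (e.g. one that opens a fresh connection per attempt and executes the SQL).
def run_with_retry(run_query, statement, max_attempts=3, backoff_s=2.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query(statement)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(backoff_s * attempt)  # back off, then reconnect and retry
```

Kyuubi itself does not do this transfer; the retry must live in the client or in a scheduler wrapping it.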

pan3793 commented 1 month ago

kyuubi.engine.type=SPARK_SQL should be put after # or ?

A-little-bit-of-data commented 1 month ago

My understanding of HA is that when a Kyuubi pod goes down, the running SQL can be transferred to another Kyuubi pod so that it is not affected.

Not exactly. In such cases, all sessions associated with the dead Kyuubi pod are marked invalid and all their queries are canceled; the client needs to catch the exception and re-run the failed queries.

Does it mean that even if HA is enabled through zk, if one of the kyuubi pods fails for any reason, the SQL being executed will fail?

pan3793 commented 1 month ago

@A-little-bit-of-data mixing code and words makes your comments hard to read; please use Markdown syntax to format your comments.

pan3793 commented 1 month ago

Does it mean that even if HA is enabled through zk, if one of the kyuubi pods fails for any reason, the SQL being executed will fail?

Yes, but in practice Kyuubi is a pretty stable service, given that we shift most of the load to the engine side.

A-little-bit-of-data commented 1 month ago

kyuubi.engine.type=SPARK_SQL should be put after # or ?

In the old version (1.7.0), kyuubi.engine.type=SPARK_SQL, FLINK_SQL, TRINO could be used like this. But it no longer works in 1.9.1, so I commented those lines out, which is equivalent to using the default SPARK_SQL.

pan3793 commented 1 month ago

it is no longer possible in 1.9.1

can you elaborate more? do you mean content after # will be silently ignored? it may be caused by your shell (bash or sh); try quoting the JDBC URL with ' or ".

A-little-bit-of-data commented 1 month ago

it is no longer possible in 1.9.1

can you elaborate more? do you mean content after # will be silently ignored? it may be caused by your shell (bash or sh); try quoting the JDBC URL with ' or ".

Yes, the content after # is treated as commented out. The JDBC URL I use is like this: /opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:2181/;kyuubi.engine.type=SPARK_SQL;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi' -n batch3 -p. Is there any problem with it?

pan3793 commented 1 month ago

this should not work.

jdbc:hive2://XXXX:2181/;kyuubi.engine.type=SPARK_SQL;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi

this should work.

jdbc:hive2://XXXX:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi#kyuubi.engine.type=SPARK_SQL
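
For illustration only (this is not the Hive driver's actual parser): in the jdbc:hive2 URL format, the ;key=value pairs before # are driver session variables, while the fragment after # carries the variables that Kyuubi picks up as session configs such as kyuubi.engine.type. A rough Python sketch of that split shows why the first form loses the setting:

```python
# Illustrative sketch only — not the real Hive JDBC driver parser.
# jdbc:hive2://host:port/db;sess_vars...#var_list: ';' pairs before '#'
# are driver session variables; the '#' fragment reaches Kyuubi.
def split_hive_jdbc_url(url: str):
    head, _, fragment = url.partition("#")
    session_part = head.split("/;", 1)[1] if "/;" in head else ""
    session_vars = dict(kv.split("=", 1) for kv in session_part.split(";") if kv)
    frag_vars = dict(kv.split("=", 1) for kv in fragment.split(";") if kv)
    return session_vars, frag_vars

# Engine type in the fragment -> visible to Kyuubi as a session config.
_, frag = split_hive_jdbc_url(
    "jdbc:hive2://host:2181/;serviceDiscoveryMode=zooKeeper;"
    "zooKeeperNamespace=kyuubi#kyuubi.engine.type=TRINO")
# frag == {"kyuubi.engine.type": "TRINO"}

# Engine type among the ';' pairs -> swallowed as a driver session variable.
sess, frag2 = split_hive_jdbc_url(
    "jdbc:hive2://host:2181/;kyuubi.engine.type=TRINO;"
    "serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi")
# frag2 == {} and kyuubi.engine.type lands in sess instead
```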

A-little-bit-of-data commented 1 month ago

this should not work.

jdbc:hive2://XXXX:2181/;kyuubi.engine.type=SPARK_SQL;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi

this should work.

jdbc:hive2://XXXX:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi#kyuubi.engine.type=SPARK_SQL

You are right. I changed it back according to the documentation. However, when using jdbc:hive2://XXXX:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi#kyuubi.engine.type=TRINO, it still uses the Spark SQL engine by default and does not start a Trino engine.

kyuubi-defaults.conf
## Helm chart provided Kyuubi configurations
#kyuubi.engine.type=SPARK_SQL, FLINK_SQL, CHAT, TRINO, HIVE_SQL, JDBC
#kyuubi.engine.type=TRINO
kyuubi.kubernetes.namespace=dfmsjzt-test
kyuubi.frontend.connection.url.use.hostname=false
kyuubi.frontend.thrift.binary.bind.port=10009
kyuubi.frontend.thrift.http.bind.port=10010
kyuubi.frontend.rest.bind.port=10099
kyuubi.frontend.mysql.bind.port=3309
kyuubi.frontend.protocols=REST,THRIFT_BINARY
kyuubi.session.engine.check.interval=PT1M
kyuubi.session.engine.idle.timeout=PT30M
kyuubi.kubernetes.terminatedApplicationRetainPeriod=PT1M
kyuubi.engine.share.level=GROUP
kyuubi.authentication=JDBC

pan3793 commented 1 month ago

have you tried quoting the JDBC URL with exactly ", not '?

A-little-bit-of-data commented 1 month ago

have you tried quoting the JDBC URL with exactly ", not '?

Thank you very much for your patient answer. I just realized the difference between the two:

jdbc:hive2://XXXX:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi;kyuubi.engine.type=SPARK_SQL

jdbc:hive2://XXXX:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi#kyuubi.engine.type=SPARK_SQL

Now I can switch engines through #kyuubi.engine.type=SPARK_SQL, which is great. But I still want HA to be implemented: even if a Kyuubi pod fails for any reason, the running SQL task should be transferred to another pod through HA, rather than the SQL failing outright while the Spark SQL engine itself keeps running.

pan3793 commented 1 month ago

I understand, but this is quite a big story; we would need to store the session/operation state in external storage, e.g. Redis, MySQL, or ZooKeeper, instead of an in-memory hash map, to achieve a real "distributed session".
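
A toy sketch of the idea (plain Python, nothing from the Kyuubi codebase): the same session-store interface backed by process memory versus a shared external backend; only the latter lets a second server see state written by the first after a pod dies.

```python
# Toy illustration of in-memory vs externally-backed session state.
# The "external" backend here is just a shared dict standing in for
# Redis/MySQL/ZooKeeper; the point is only who owns the data's lifetime.
class InMemorySessionStore:
    def __init__(self):
        self._sessions = {}          # dies with the server process
    def put(self, handle, state):
        self._sessions[handle] = state
    def get(self, handle):
        return self._sessions.get(handle)

class ExternalSessionStore:
    def __init__(self, backend):
        self._backend = backend      # shared storage, survives any one server
    def put(self, handle, state):
        self._backend[handle] = state
    def get(self, handle):
        return self._backend.get(handle)
```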

A-little-bit-of-data commented 1 month ago

I understand, but this is quite a big story; we would need to store the session/operation state in external storage, e.g. Redis, MySQL, or ZooKeeper, instead of an in-memory hash map, to achieve a real "distributed session".

Thank you very much for your patience. Of course, this is a very large project, and I look forward to updates in subsequent versions.