Closed: A-little-bit-of-data closed this issue 1 month ago.
That's the advantage of Spark cluster mode: the Spark engine (driver) process is launched in a dedicated Pod, instead of the Pod that runs the `spark-submit` command. Unfortunately, cluster mode is only applicable to the Spark engine so far.
If you set `spark.submit.deployMode=cluster` by modifying either `spark-defaults.conf` or `kyuubi-defaults.conf`, then run a `beeline` command to trigger a `spark-submit`, a new Pod for the Spark driver will be launched, and deleting the Kyuubi Pod won't terminate the Spark engine.
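The config change above can be sketched as follows; this is a minimal illustration using a throwaway file path, not the real location of `spark-defaults.conf` (or `kyuubi-defaults.conf`) inside a deployed image or Helm chart:

```shell
# A minimal sketch: append the cluster-mode setting to a conf file.
# The /tmp path is illustrative only.
CONF=/tmp/spark-defaults.conf
echo 'spark.submit.deployMode=cluster' >> "$CONF"
# Confirm the line landed in the file:
grep -q '^spark.submit.deployMode=cluster$' "$CONF" && echo "cluster mode enabled"
```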
I set `spark.submit.deployMode=cluster` through the command. It is true that after deleting the Kyuubi Pod the Spark engine is not terminated, but the running SQL still fails. My understanding of HA is that when a Kyuubi Pod goes down, the running SQL should be transferred to another Kyuubi Pod so that normal SQL operation is not affected; otherwise, how can it be called HA?

In addition, when submitting tasks by connecting through ZooKeeper as in the documentation, with `/opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:2181/;kyuubi.engine.type=SPARK_SQL;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi' -n batch3 -p`, the engine cannot be specified, and only the default one configured in `kyuubi-defaults.conf` is used.

Also, when JDBC authentication is enabled, the same user can connect through the Kyuubi beeline client, but the Spark and CDH beeline clients under DolphinScheduler cannot connect and keep reporting errors:

```
15:09:41.002 [main] ERROR org.apache.hive.jdbc.Utils - Unable to read HiveServer2 configs from ZooKeeper
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport for any of the Server URI's in ZooKeeper: Peer indicated failure: Error validating the login (state=08S01,code=0)
```
> My understanding of HA should be that when this kyuubi pod hangs up, the running SQL can be transferred to another kyuubi pod so as not to affect the normal operation of SQL.
Not exactly. In such cases, all sessions associated with the dead Kyuubi Pod will be marked as invalid and all queries will be canceled; the client needs to catch the exception and re-run the failed queries.
`kyuubi.engine.type=SPARK_SQL` should be put after `#` or `?`.
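For illustration, the part of a Hive JDBC URL after `#` is the session-variable section that gets handed to the server, while the part before it carries the connection and service-discovery settings. A quick shell sketch of how such a URL splits (the ZooKeeper address is a placeholder):

```shell
# Placeholder ZooKeeper address; only the URL structure matters here.
URL='jdbc:hive2://XXXX:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi#kyuubi.engine.type=SPARK_SQL'
# The part after '#' is the session-variable section:
echo "${URL#*#}"
# The part before '#' is the connection / service-discovery section:
echo "${URL%%#*}"
```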
Does it mean that even if HA is enabled through zk, if one of the kyuubi pods fails for any reason, the SQL being executed will fail?
@A-little-bit-of-data Mixing code and words makes your comments hard to read; please use Markdown syntax to format your comments.
> Does it mean that even if HA is enabled through zk, if one of the kyuubi pods fails for any reason, the SQL being executed will fail?
Yes, but in practice Kyuubi is a pretty stable service, given that we shift most of the load to the engine side.
In the old version (1.7.0), `kyuubi.engine.type=SPARK_SQL, FLINK_SQL,TRINO` was used like this. But it no longer works in 1.9.1, so I commented them out, which is equivalent to using the default SPARK_SQL.
> it is no longer possible in 1.9.1
Can you elaborate more? Do you mean the content after `#` will be silently ignored? It may be caused by your shell (`bash` or `sh`); try quoting the JDBC URL with `'` or `"`.
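To see why quoting matters, a small sketch (the URL is a placeholder): unquoted, the shell treats each `;` as a command separator, so `beeline` would only receive the fragment up to the first `;` and the shell would try to run the remaining pieces as separate commands; quoted, the URL survives as a single argument.

```shell
URL='jdbc:hive2://XXXX:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi#kyuubi.engine.type=SPARK_SQL'
# Quoted: the whole URL is passed as one argument.
printf '%s\n' "$URL"
# Unquoted, each ';' would terminate the command, so a client would only see
# 'jdbc:hive2://XXXX:2181/' and the rest would be run as separate commands.
```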
Yes, the content after `#` is commented out. The JDBC URL I use is similar to this: `/opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:2181/;kyuubi.engine.type=SPARK_SQL;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi' -n batch3 -p`. Is there any problem?
This should not work:

`jdbc:hive2://XXXX:2181/;kyuubi.engine.type=SPARK_SQL;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi`

This should work:

`jdbc:hive2://XXXX:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi#kyuubi.engine.type=SPARK_SQL`
You are right. I changed it back according to the document. However, when using `jdbc:hive2://XXXX:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi#kyuubi.engine.type=TRINO`, it still uses the Spark SQL engine by default and does not start a Trino engine.
`kyuubi-defaults.conf`:

```properties
## Helm chart provided Kyuubi configurations
#kyuubi.engine.type=SPARK_SQL, FLINK_SQL, CHAT, TRINO, HIVE_SQL, JDBC
#kyuubi.engine.type=TRINO
kyuubi.kubernetes.namespace=dfmsjzt-test
kyuubi.frontend.connection.url.use.hostname=false
kyuubi.frontend.thrift.binary.bind.port=10009
kyuubi.frontend.thrift.http.bind.port=10010
kyuubi.frontend.rest.bind.port=10099
kyuubi.frontend.mysql.bind.port=3309
kyuubi.frontend.protocols=REST,THRIFT_BINARY
kyuubi.session.engine.check.interval=PT1M
kyuubi.session.engine.idle.timeout=PT30M
kyuubi.kubernetes.terminatedApplicationRetainPeriod=PT1M
kyuubi.engine.share.level=GROUP
kyuubi.authentication=JDBC
```
Have you tried quoting the JDBC URL with exactly `"`, not `'`?
Thank you very much for your patient answer. I just realized the difference between the two:

`jdbc:hive2://XXXX:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi;kyuubi.engine.type=SPARK_SQL`

`jdbc:hive2://XXXX:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi#kyuubi.engine.type=SPARK_SQL`
Currently, you can switch engines through `#kyuubi.engine.type=SPARK_SQL`, which is great, but I still want HA to be implemented: even if a Kyuubi Pod fails for any reason, the running SQL task should be transferred to another Pod through HA, rather than failing directly while the Spark SQL engine is still there.
I understand, but this is quite a big story. We may need to store the session/operation state in external storage, e.g. Redis, MySQL, or ZooKeeper, instead of an in-memory hash map, to achieve a real "distributed session".
Thank you very much for your patience. Of course, this is a very large project and I look forward to the updates of subsequent versions.
Describe the bug
After configuring HA in `kyuubi-defaults.conf`, setting `kyuubi.engine.share.level` to `GROUP`, and setting a Spark user, the entire `kyuubi-defaults.conf` configuration is as follows:

Helm chart provided Kyuubi configurations

Kyuubi Metrics
Then connect through ZooKeeper:

```
/opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:2181/;kyuubi.engine.type=SPARK_SQL;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi' -n batch3 -p
/opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:2181/;kyuubi.engine.type=TRINO;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi' -n evm -p
```

Both of the above start the Spark SQL engine. Since the evm-related Spark configuration is not set, the startup reports an error:
The following is the query in ZooKeeper after executing `/opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:2181/;kyuubi.engine.type=TRINO;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi' -n evm -p`. It is obvious that `kyuubi.engine.type=TRINO` does not take effect.

However, when connecting through the Kyuubi 10009 port:

```
/opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:10009/?kyuubi.engine.type=SPARK_SQL' -n batch3 -p
/opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:10009/?kyuubi.engine.type=TRINO' -n evm -p
```

this way, the specified engine is started, and from the log you can see that the connection is initialized from ZooKeeper to start the Spark SQL engine:
I don't know if there is a configuration error or a problem with the startup method; starting through ZooKeeper as documented doesn't work.

Also, for the engine started through port 10009, if the corresponding Kyuubi Pod crashes during operation, the running SQL also fails and is not transferred. From the above log, the session is connected from ZooKeeper (`Session establishment complete on server XXXX/XXXX:2181`), which feels weird. I don't know how to specify the query engine through the JDBC URL and achieve HA with `kyuubi.engine.share.level=GROUP` set.
When I deleted the metadata information in ZooKeeper, I used `/opt/kyuubi/bin/beeline -u 'jdbc:hive2://XXXX:2181/;kyuubi.engine.type=SPARK_SQL;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi' -n batch3 -p` to start the engine in the `kyuubi-0` Pod. There were two Pods in total; the other was `kyuubi-1`. When I ran a SQL query and killed `kyuubi-0`, the task was not transferred to `kyuubi-1` but failed. I don't know how HA is supposed to take effect.

Affects Version(s)
1.9.1
Kyuubi Server Log Output
Kyuubi Engine Log Output
No response
Kyuubi Server Configurations
No response
Kyuubi Engine Configurations
No response
Additional context
No response
Are you willing to submit PR?