apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Bug] kyuubi-1.7 can't use multiple metastores #5181

Open tomfans opened 1 year ago

tomfans commented 1 year ago

Describe the bug

After configuring an Iceberg catalog whose metastore is different from the Hive metastore (i.e. I have two different Hive metastores), connecting to the Iceberg catalog fails: the metastore cannot be reached because of delegation token expiration errors.

Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token expired or does not exist: HIVE_DELEGATION_TOKEN owner=hive, renewer=hive, realUser=hive/hostxxxxxxxxxxxx, issueDate=1692543549564, maxDate=1693148349564, sequenceNumber=67, masterKeyId=1
        at org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:114)
        at org.apache.hadoop.hive.thrift.TokenStoreDelegationTokenSecretManager.retrievePassword(TokenStoreDelegationTokenSecretManager.java:56)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.getPassword(HadoopThriftAuthBridge.java:565)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$SaslDigestCallbackHandler.handle(HadoopThriftAuthBridge.java:596)
        at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)

I consider this a bug because the same setup works fine with the plain spark-sql command.

Here is an example:

spark-sql (default)> 
                   > use hive_prod;
spark-sql (default)> 
                   > 
                   > show databases;
default
Time taken: 0.689 seconds, Fetched 1 row(s)
spark-sql (default)> 

But the same statements fail on Kyuubi 1.7.

Affects Version(s)

1.7

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

pan3793 commented 1 year ago

The difference may come from Spark's client vs. cluster deploy mode.

spark-sql and spark-shell only support client mode, while Kyuubi supports both client and cluster mode, and the two modes behave differently on a kerberized cluster.

By default, Kyuubi uses --proxy-user instead of --principal and --keytab, so on

pan3793 commented 1 year ago

Related to this issue: Kyuubi implements a DSv2-based Hive connector (a.k.a. KSHC).

And in #4560 we:

... make Kyuubi Spark Hive Connector (KSHC) support kerberized HMS in cluster mode without a keytab (which is the typical use case in Kyuubi) by implementing a HadoopDelegationTokenProvider.

There are some notable caveats:

  1. spark-sql initializes the HiveClient inconsistently, so its behavior can be misleading when you use it for testing. Jar-based Spark applications, spark-shell, and beeline + Kyuubi work well.
  2. We must set a different hive.metastore.token.signature for each HMS to distinguish their delegation tokens; otherwise the latter token overwrites the former. Since #4560, KSHC uses the metastore URI as the signature for a KSHC catalog if hive.metastore.token.signature is not set explicitly. So, technically, to let Iceberg use a different kerberized HMS, you can register an additional KSHC catalog for that HMS and make sure it and the Iceberg catalog use the same metastore URI and signature, so that they share the delegation token.
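To make caveat 2 concrete, here is a minimal Python sketch under stated assumptions. Credentials is a hypothetical toy store (not Hadoop's API) that keeps one token per signature, which shows how a second token registered under the same signature replaces the first. kshc_signature mirrors the fallback described for #4560 (explicit signature, else the metastore URI). catalog_confs is a hypothetical helper that emits --conf pairs pointing a KSHC catalog and an Iceberg catalog at the same HMS with the same signature. The class names org.apache.kyuubi.spark.connector.hive.HiveTableCatalog and org.apache.iceberg.spark.SparkCatalog, and Iceberg's spark.sql.catalog.<name>.hadoop.* per-catalog override prefix, are assumptions to verify against your versions; the catalog names, URIs, and tokens are placeholders.

```python
# Toy model (hypothetical, NOT Hadoop or Kyuubi code) of delegation tokens
# keyed by hive.metastore.token.signature, plus a helper that builds Spark
# --conf arguments so a KSHC catalog and an Iceberg catalog share one HMS token.
from typing import Dict, List, Optional


def kshc_signature(metastore_uri: str, configured: Optional[str]) -> str:
    """Signature KSHC would use, per #4560: the explicit value, else the URI."""
    return configured if configured else metastore_uri


class Credentials:
    """Hypothetical token store: exactly one token per signature."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}

    def add_token(self, signature: str, token: str) -> None:
        # Same signature -> the later token silently replaces the earlier one.
        self._tokens[signature] = token

    def get_token(self, signature: str) -> Optional[str]:
        return self._tokens.get(signature)


def catalog_confs(kshc: str, iceberg: str, uri: str, signature: str) -> List[str]:
    """Return ["--conf", "key=value", ...] pointing both catalogs at one HMS.

    Option names are assumptions; verify them for your KSHC/Iceberg versions.
    """
    confs = {
        f"spark.sql.catalog.{kshc}":
            "org.apache.kyuubi.spark.connector.hive.HiveTableCatalog",
        f"spark.sql.catalog.{kshc}.hive.metastore.uris": uri,
        f"spark.sql.catalog.{kshc}.hive.metastore.token.signature": signature,
        f"spark.sql.catalog.{iceberg}": "org.apache.iceberg.spark.SparkCatalog",
        f"spark.sql.catalog.{iceberg}.type": "hive",
        f"spark.sql.catalog.{iceberg}.uri": uri,
        # Assumed: Iceberg applies spark.sql.catalog.<name>.hadoop.* per catalog.
        f"spark.sql.catalog.{iceberg}.hadoop.hive.metastore.token.signature": signature,
    }
    args: List[str] = []
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Two HMSs sharing one signature (e.g. an empty one): the second token wins.
    creds = Credentials()
    creds.add_token("", "token-from-hms-a")
    creds.add_token("", "token-from-hms-b")
    print(creds.get_token(""))  # token-from-hms-b

    # Distinct signatures (KSHC's URI fallback) keep both tokens.
    creds = Credentials()
    creds.add_token(kshc_signature("thrift://hms-a:9083", None), "token-from-hms-a")
    creds.add_token(kshc_signature("thrift://hms-b:9083", None), "token-from-hms-b")
    print(creds.get_token("thrift://hms-a:9083"))  # token-from-hms-a

    uri_b = "thrift://hms-b:9083"
    print(" ".join(catalog_confs("hive_b", "iceberg_b", uri_b, uri_b)))
```

Passing the URI as the signature in the last call matches the KSHC default described above, so the Iceberg catalog would look up the same token that the KSHC catalog registers for that HMS.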