apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://gravitino.apache.org
Apache License 2.0

[Improvement] Spark Connector Need DelegationTokenProvider for k8s deployment #3297

Open theoryxu opened 5 months ago

theoryxu commented 5 months ago

What would you like to be improved?

When a Spark application is deployed on k8s (cluster mode) and connects to multiple Hive Metastores (HMS), Spark needs a DelegationTokenProvider to obtain delegation tokens from each HMS at submission time and store them in the UserGroupInformation, so that the Spark driver can communicate with every HMS.

For example, KyuubiHiveConnector contains the KyuubiHiveConnectorDelegationTokenProvider to deal with this case.

Currently, the Gravitino Spark Connector depends on the KyuubiHiveConnector. However, the KyuubiHiveConnectorDelegationTokenProvider filters catalogs by their implementation, so it does not work in the case above. In addition, it only covers Hive catalogs, not Iceberg catalogs.

The Gravitino Spark Connector needs its own DelegationTokenProvider to handle this case and to ensure it works with both Hive and Iceberg catalogs in a Kerberos environment.
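
To make the request more concrete, here is a rough sketch of the catalog-selection step such a provider might need: collecting the distinct HMS endpoints across both Hive catalogs and Hive-backed Iceberg catalogs, rather than filtering on one catalog implementation. Everything in it is an assumption for illustration: `TokenTargets` and `CatalogInfo` are hypothetical names, and the property keys `metastore.uris`, `catalog-backend`, and `uri` are assumed rather than taken from this issue. Actually fetching the tokens and adding them to the credentials is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Gravitino or Kyuubi.
class TokenTargets {

    /** Hypothetical, minimal description of one catalog. */
    static final class CatalogInfo {
        final String name;
        final String provider;              // assumed labels: "hive" or "iceberg"
        final Map<String, String> props;    // catalog properties

        CatalogInfo(String name, String provider, Map<String, String> props) {
            this.name = name;
            this.provider = provider;
            this.props = props;
        }
    }

    // Assumed property keys; the real keys may differ.
    static final String HIVE_URI_KEY = "metastore.uris";
    static final String ICEBERG_BACKEND_KEY = "catalog-backend";
    static final String ICEBERG_URI_KEY = "uri";

    /**
     * Returns the distinct metastore URIs that would need a delegation token,
     * in first-seen order. Hive catalogs and Iceberg catalogs with a Hive
     * backend are both included; other Iceberg backends are skipped.
     */
    static List<String> metastoreUris(List<CatalogInfo> catalogs) {
        Set<String> uris = new LinkedHashSet<>();
        for (CatalogInfo c : catalogs) {
            String uri = null;
            if ("hive".equals(c.provider)) {
                uri = c.props.get(HIVE_URI_KEY);
            } else if ("iceberg".equals(c.provider)
                    && "hive".equals(c.props.get(ICEBERG_BACKEND_KEY))) {
                uri = c.props.get(ICEBERG_URI_KEY);
            }
            if (uri != null && !uri.trim().isEmpty()) {
                uris.add(uri.trim());       // the set drops duplicate endpoints
            }
        }
        return new ArrayList<>(uris);
    }
}
```

A provider built along these lines would then request one token per returned URI at submission time, so two catalogs pointing at the same HMS do not trigger two requests.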

REF: https://github.com/apache/kyuubi/blob/master/extensions/spark/kyuubi-spark-connector-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/KyuubiHiveConnectorDelegationTokenProvider.scala

https://github.com/apache/kyuubi/pull/4560

How should we improve?

No response

FANNG1 commented 5 months ago

cc @danhuawang, could our test cluster cover this scenario?