apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Support hadoop proxy user #1310

Closed: wg1026688210 closed this issue 1 year ago

wg1026688210 commented 1 year ago

Search before asking

Motivation

In our production environment, our real-time platform has only one Flink account for accessing Hadoop, and Hadoop uses Kerberos for authentication. When a user needs to write data under a different account, we have the Flink account act as a proxy for that account, so the user can write to their own Hadoop directory.

Solution

We can implement proxy-user support in Paimon through a catalog option, for both HiveCatalog and FileSystemCatalog.

For FileSystemCatalog: we only need to modify HadoopFileIO.
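For FileSystemCatalog, the idea is that every filesystem call made by HadoopFileIO runs inside `ugi.doAs`. A minimal sketch of that decorator pattern follows, under stated assumptions. It is not Paimon's actual HadoopFileIO: `SimpleFileIO`, `DoAsRunner`, `ThreadLocalUserRunner`, `InMemoryFileIO` and `ProxyUserFileIO` are hypothetical names, and the thread-local runner only stands in for Hadoop's `UserGroupInformation` so the example runs without Hadoop on the classpath.

```java
import java.io.IOException;
import java.security.PrivilegedExceptionAction;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for UserGroupInformation#doAs, so the sketch runs without Hadoop.
interface DoAsRunner {
    <T> T doAs(PrivilegedExceptionAction<T> action) throws IOException, InterruptedException;
}

// Demo runner: binds a user name to the current thread while the action runs.
final class ThreadLocalUserRunner implements DoAsRunner {
    private static final ThreadLocal<String> CURRENT = ThreadLocal.withInitial(() -> "flink");
    private final String user;

    ThreadLocalUserRunner(String user) {
        this.user = user;
    }

    static String currentUser() {
        return CURRENT.get();
    }

    @Override
    public <T> T doAs(PrivilegedExceptionAction<T> action) throws IOException, InterruptedException {
        String previous = CURRENT.get();
        CURRENT.set(user);
        try {
            return action.run();
        } catch (IOException | InterruptedException | RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IOException(e); // wrap any other checked exception
        } finally {
            CURRENT.set(previous); // restore the login user afterwards
        }
    }
}

// Hypothetical, heavily trimmed file IO interface (not Paimon's real FileIO).
interface SimpleFileIO {
    boolean exists(String path) throws IOException;
    void mkdirs(String path) throws IOException;
}

// Demo backend that records which user created each directory.
final class InMemoryFileIO implements SimpleFileIO {
    final Map<String, String> owners = new HashMap<>();

    @Override
    public boolean exists(String path) {
        return owners.containsKey(path);
    }

    @Override
    public void mkdirs(String path) {
        owners.putIfAbsent(path, ThreadLocalUserRunner.currentUser());
    }
}

// Decorator: every call on the delegate runs inside doAs, i.e. as the proxied user.
final class ProxyUserFileIO implements SimpleFileIO {
    private final SimpleFileIO delegate;
    private final DoAsRunner runner;

    ProxyUserFileIO(SimpleFileIO delegate, DoAsRunner runner) {
        this.delegate = delegate;
        this.runner = runner;
    }

    @Override
    public boolean exists(String path) throws IOException {
        return call(() -> delegate.exists(path));
    }

    @Override
    public void mkdirs(String path) throws IOException {
        call(() -> {
            delegate.mkdirs(path);
            return null;
        });
    }

    // Runs the action as the proxied user; maps InterruptedException to IOException.
    private <T> T call(PrivilegedExceptionAction<T> action) throws IOException {
        try {
            return runner.doAs(action);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while acting as proxy user", e);
        }
    }
}
```

With Hadoop available, the runner would instead be a proxy UGI obtained from `UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser())`, whose `doAs` has the same shape as `DoAsRunner#doAs`. Impersonation must also be permitted on the cluster side through the `hadoop.proxyuser.<superuser>.hosts` and `hadoop.proxyuser.<superuser>.groups` settings.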

For HiveCatalog: we can use a dynamic proxy to wrap every method of HiveClient in ugi.doAs.
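The Hive client exposes many methods, so a `java.lang.reflect.Proxy` can apply `doAs` to all of them through a single `InvocationHandler` instead of a hand-written wrapper per method. The sketch below is illustrative only: `MetastoreClient` is a hypothetical, trimmed interface in place of the real Hive client, and `ProxyUser` is a hypothetical class standing in for `ugi.doAs`.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical stand-in for ugi.doAs: binds a user name to the thread while the action runs.
final class ProxyUser {
    private static final ThreadLocal<String> CURRENT = ThreadLocal.withInitial(() -> "flink");
    private final String name;

    ProxyUser(String name) {
        this.name = name;
    }

    static String current() {
        return CURRENT.get();
    }

    <T> T doAs(Callable<T> action) throws Exception {
        String previous = CURRENT.get();
        CURRENT.set(name);
        try {
            return action.call();
        } finally {
            CURRENT.set(previous);
        }
    }
}

// Hypothetical, trimmed metastore client interface (the real Hive client has many more methods).
interface MetastoreClient {
    void createDatabase(String name);
    List<String> getAllDatabases();
}

// Demo client that records which user created each database.
final class InMemoryMetastoreClient implements MetastoreClient {
    private final Map<String, String> owners = new LinkedHashMap<>();

    @Override
    public void createDatabase(String name) {
        owners.put(name, ProxyUser.current());
    }

    @Override
    public List<String> getAllDatabases() {
        return new ArrayList<>(owners.keySet());
    }

    String ownerOf(String name) {
        return owners.get(name);
    }
}

// Routes every interface call through doAs before reaching the real client.
final class DoAsInvocationHandler implements InvocationHandler {
    private final Object delegate;
    private final ProxyUser user;

    DoAsInvocationHandler(Object delegate, ProxyUser user) {
        this.delegate = delegate;
        this.user = user;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        try {
            return user.doAs(() -> method.invoke(delegate, args));
        } catch (InvocationTargetException e) {
            throw e.getCause(); // surface the client's own exception, not the reflection wrapper
        }
    }

    static <C> C wrap(Class<C> iface, C delegate, ProxyUser user) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, new DoAsInvocationHandler(delegate, user));
        return iface.cast(proxy);
    }
}
```

Unwrapping `InvocationTargetException` means callers see the same exceptions they would get from the unwrapped client, so existing catalog error handling would not need to change.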

Anything else?

No response

Are you willing to submit a PR?

wg1026688210 commented 1 year ago

I'm willing to submit a PR

wg1026688210 commented 1 year ago

Proxy-user support should be provided by the compute engine. Perhaps the only advantage of implementing it in Paimon is that the identity used for table writes could be kept separate from the permission to write savepoints. If anyone has similar requirements, we can reopen the issue and continue the discussion.