[Feature] Support Hadoop proxy user

wg1026688210 commented 1 year ago

Search before asking

[X] I searched in the issues and found nothing similar.

Motivation

In our production environment, the real-time platform has only one Flink account for accessing Hadoop, and Hadoop uses Kerberos for authentication. When a user needs to write data under a different account, we have the Flink account act as a proxy for that account, so the user can write to their own Hadoop directory.

Solution

We can implement proxy user support in Paimon as a catalog option, for both HiveCatalog and FileSystemCatalog.

For FileSystemCatalog: we only need to modify HadoopFileIO.

For HiveCatalog: we can use a dynamic proxy to wrap all methods of HiveClient with ugi.doAs.
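
A minimal sketch of the dynamic-proxy idea, using only the JDK so that it runs without Hadoop on the classpath. The names `DoAs`, `MetaClient`, `RecordingDoAs`, and `wrap` are hypothetical and exist only for illustration; they are not Paimon or Hive APIs. `DoAs` stands in for the `doAs` method of Hadoop's `UserGroupInformation`; with Hadoop available, it could presumably be backed by `UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser())`.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.security.PrivilegedExceptionAction;
import java.util.Collections;
import java.util.List;

class ProxyUserSketch {

    /** Hypothetical stand-in for UserGroupInformation.doAs (assumption, not a Hadoop type). */
    interface DoAs {
        <T> T doAs(PrivilegedExceptionAction<T> action) throws Exception;
    }

    /** Hypothetical, trimmed-down client interface; a real Hive client has many more methods. */
    interface MetaClient {
        List<String> listTables(String database) throws Exception;
    }

    /** Returns a proxy for iface whose every method call runs inside ugi.doAs. */
    @SuppressWarnings("unchecked")
    static <C> C wrap(Class<C> iface, C delegate, DoAs ugi) {
        InvocationHandler handler = (proxy, method, args) -> {
            PrivilegedExceptionAction<Object> action = () -> {
                try {
                    return method.invoke(delegate, args);
                } catch (InvocationTargetException e) {
                    // Rethrow the delegate's own exception instead of the reflection wrapper.
                    Throwable cause = e.getCause();
                    if (cause instanceof Exception) {
                        throw (Exception) cause;
                    }
                    throw e;
                }
            };
            return ugi.doAs(action);
        };
        return (C) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    /** Fake DoAs that records which user is "active" while the action runs. */
    static final class RecordingDoAs implements DoAs {
        static final ThreadLocal<String> CURRENT_USER = ThreadLocal.withInitial(() -> "flink");
        private final String proxyUser;

        RecordingDoAs(String proxyUser) {
            this.proxyUser = proxyUser;
        }

        @Override
        public <T> T doAs(PrivilegedExceptionAction<T> action) throws Exception {
            String previous = CURRENT_USER.get();
            CURRENT_USER.set(proxyUser);
            try {
                return action.run();
            } finally {
                // Restore the login user once the call has finished.
                CURRENT_USER.set(previous);
            }
        }
    }

    /** Calls a wrapped client and reports which user the underlying call observed. */
    static String demo(String proxyUser) {
        MetaClient raw = database -> Collections.singletonList(
                database + " listed as " + RecordingDoAs.CURRENT_USER.get());
        MetaClient wrapped = wrap(MetaClient.class, raw, new RecordingDoAs(proxyUser));
        try {
            return wrapped.listTables("db1").get(0);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("alice")); // prints: db1 listed as alice
    }
}
```

The same `doAs` wrapping could in principle be applied where HadoopFileIO obtains its file system, so that file operations also run as the proxy user; that part is not shown here because it depends on Hadoop classes.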

Anything else?

No response

Are you willing to submit a PR?

[X] I'm willing to submit a PR!

wg1026688210 commented 1 year ago

I'm willing to submit a PR

wg1026688210 commented 1 year ago

The proxy user feature should be supported by the compute engine. Perhaps the only advantage of handling it in Paimon is that the permission to write data could be kept separate from the permission to write savepoints. If similar requirements come up, we can reopen the issue and continue the discussion.

apache / paimon