Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

HDFS under filesystem does not support impersonation #14867

Open gjhkael opened 2 years ago

gjhkael commented 2 years ago

Currently, Alluxio accesses HDFS as the user it logged in as via Kerberos. This leads to an HDFS permission leak, because Alluxio always uses a superuser to access HDFS. If the HDFS under filesystem supported impersonation, that would seem to solve the problem.

apc999 commented 2 years ago

@gjhkael Can you provide more details on this ticket? Also, will this blog post help? https://www.alluxio.io/blog/alluxio-developer-tip-why-am-i-seeing-the-error-user-yarn-is-not-configured-for-any-impersonation-impersonationuser-foo/

gjhkael commented 2 years ago

@apc999 Thanks for your answer. Following the blog's guide, we can successfully pass the UGI user from the Hadoop client to the Alluxio server. But the Alluxio server always accesses the HDFS under filesystem as the user configured by the property alluxio.master.principal, and that principal must be an HDFS superuser. Since the Alluxio server can now get the real user from the Hadoop client, we could pass that user on to HDFS with some modifications.

apc999 commented 2 years ago

Are you trying to use Kerberized HDFS as the Alluxio UFS? This is fully supported in Alluxio Enterprise. I'm not sure whether it works as expected with the Alluxio open source edition.

gjhkael commented 2 years ago

#14919

beinan commented 2 years ago

@gjhkael thank you for the PR!
It looks like you're asking Alluxio to treat the user defined in alluxio.master.principal as a proxy user, so that for any operation from Alluxio to HDFS, the Alluxio server impersonates the UGI user (the user passed in by the HDFS client).
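
For readers unfamiliar with the proxy-user pattern described above, here is a minimal plain-Java sketch of the authorization decision involved. It is a toy model, not the Hadoop or Alluxio API: the class `ProxyUserSketch`, its rule table, and the user names `alluxio`, `foo`, and `bar` are all hypothetical. In Hadoop itself, the rule table corresponds roughly to the `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups` settings, and the impersonating call is typically made through `UserGroupInformation.createProxyUser(...)` followed by `doAs(...)`.

```java
import java.util.Map;
import java.util.Set;

/** Toy model of proxy-user authorization; hypothetical, not the Hadoop or Alluxio API. */
final class ProxyUserSketch {
    // Hypothetical rule table: service user -> end users it may impersonate.
    // "*" means the service user may impersonate anyone.
    private final Map<String, Set<String>> allowedEndUsers;

    ProxyUserSketch(Map<String, Set<String>> allowedEndUsers) {
        this.allowedEndUsers = allowedEndUsers;
    }

    /**
     * Returns the user whose permissions should apply to a request.
     * serviceUser is the authenticated (e.g. Kerberos) identity of the server;
     * endUser is the client-side user the server wants to act on behalf of.
     */
    String effectiveUser(String serviceUser, String endUser) {
        if (endUser == null || endUser.equals(serviceUser)) {
            // No impersonation requested: the request runs as the service user.
            return serviceUser;
        }
        Set<String> allowed = allowedEndUsers.getOrDefault(serviceUser, Set.of());
        if (allowed.contains("*") || allowed.contains(endUser)) {
            // Impersonation permitted: file permissions are checked against the end user.
            return endUser;
        }
        throw new SecurityException(
            "User " + serviceUser + " is not allowed to impersonate " + endUser);
    }
}
```

With a rule table of `Map.of("alluxio", Set.of("foo"))`, a request from `alluxio` on behalf of `foo` is evaluated with `foo`'s permissions, while a request on behalf of `bar` is rejected. The point of the pattern is that the service principal no longer needs to be a superuser for every operation; it only needs permission to impersonate.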

I can see the benefits of this UFS impersonation behavior. As we discussed offline, I'm not sure whether it should be an enterprise-edition-only feature. We might need to check with our product manager.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.