Closed zhouyifan279 closed 1 year ago
This would be helpful for applying {OWNER} to policies.
But there are two problems to consider:
@bowenliang123, thanks for your comment. Here are my thoughts on your questions:
1. Is it investigating the owner for all tables, even those with no {OWNER} rules on them? This would cost significant CPU/RTT time to fetch this information, and the additional cache would add to the memory footprint.
For SQL queries (SELECT & DML), we can always get the owner of the table from CatalogTable#owner or org.apache.spark.sql.connector.catalog.Table#properties.get("owner"). No extra fetch is introduced.
For most SQL commands (DDL), table metadata is not fetched during SQL compilation, so we need to fetch it ourselves.
In most cases only one table's metadata is fetched, so there should not be much CPU/RTT overhead.
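The two lookup paths described above could be sketched roughly as follows. This is a minimal, self-contained model rather than Kyuubi or Spark code: `TableOwnerLookup` and its methods are hypothetical names, the V1 owner is passed in as a plain string standing in for CatalogTable#owner, and the V2 owner is read from a plain map standing in for Table#properties, assuming the key is "owner" as mentioned above.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of Kyuubi or Spark. */
final class TableOwnerLookup {
    // Assumed property key, matching properties.get("owner") in the comment above.
    static final String OWNER_KEY = "owner";

    /** V1 path: the owner string is already present on the resolved table metadata. */
    static Optional<String> fromV1Owner(String catalogTableOwner) {
        return Optional.ofNullable(catalogTableOwner).filter(o -> !o.isEmpty());
    }

    /** V2 path: the owner is read from the table's properties map. */
    static Optional<String> fromV2Properties(Map<String, String> properties) {
        if (properties == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(properties.get(OWNER_KEY)).filter(o -> !o.isEmpty());
    }
}
```

Both methods return an empty Optional when no owner is recorded, so a caller can decide whether a missing owner should simply mean that {OWNER} rules do not apply.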
2. What is the proper caching and eviction strategy for table owners? A TTL or a maximum cache size raises concerns about cache misses: would missed entries be fetched again, causing more work and load as described in 1?
Originally, I intended to cache table metadata because I wanted to fetch the table metadata of DataSourceV2Relation.
After deeper investigation, I found that DataSourceV2Relation carries the table metadata in its table field, so a cache is no longer needed.
Thanks for the investigation and explanation. Since the owner name is carried in CatalogTable for V1 and in the table field of DataSourceV2Relation for V2, it's fine to use them without any extra fetch. And since we don't have to cache them at runtime for now, that's good enough for a concrete implementation.
@zhouyifan279
Describe the proposal
Currently, if a user does not have insert permission on all tables in a database, an AccessControlException is thrown when the user inserts into a newly created table.
RangerBasePlugin uses the {OWNER} variable to remove this limitation:
1. On the Ranger Admin side, set the {OWNER} variable in the Ranger Policy Users field.
2. On the Ranger Plugin side, due to the above policy, the current user gets the specified permissions on any table they create (own). Ranger Plugin handles the {OWNER} variable in org.apache.ranger.plugin.policyevaluator.RangerDefaultPolicyItemEvaluator#matchUserGroupAndOwner.

Some work needs to be done to support this feature in Kyuubi RangerSparkExtension.
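To illustrate the idea behind the {OWNER} variable, here is a deliberately simplified model of the check. It is not Ranger's actual implementation of matchUserGroupAndOwner: the class `OwnerMacroMatcher` and the method `matchesUser` are hypothetical, and groups, roles, and other macros are ignored.

```java
import java.util.List;

/** Simplified model of {OWNER} handling; not Ranger's actual code. */
final class OwnerMacroMatcher {
    static final String OWNER_MACRO = "{OWNER}";

    /**
     * Returns true if the policy item's user list applies to the requesting user,
     * either because the user is listed by name, or because the list contains
     * {OWNER} and the requesting user owns the resource.
     */
    static boolean matchesUser(List<String> policyUsers, String requestUser, String resourceOwner) {
        if (requestUser == null) {
            return false;
        }
        if (policyUsers.contains(requestUser)) {
            return true;
        }
        return policyUsers.contains(OWNER_MACRO) && requestUser.equals(resourceOwner);
    }
}
```

Under this model, a policy whose Users field contains only {OWNER} grants its permissions to whoever owns the table being accessed, which is why the plugin needs to know the table owner at authorization time.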
Task list
- Get table owner by TableIdentifier
- Cache table info to reduce Catalog method invocations
Are you willing to submit PR?