Open shohamyamin opened 3 months ago
cc @nastra
@shohamyamin You might want to take a look at https://iceberg.apache.org/contribute/#what-is-an-improvement-proposal, write up a proposal, and then open a DISCUSS thread on this topic on the mailing list.
Hi @shohamyamin, if you want, I am willing to work with you on writing the proposal.
@amitgilad3 That would be great
Hi @shohamyamin, @amitgilad3. We have been looking for a similar feature to enforce consistent data access policies for row filters and column masks across query engines. I'd like to collaborate with you on this feature.
No problem, always happy to have more people help. This is the initial proposal; please review it and comment, @hereisharish @shohamyamin.
@amitgilad3, it would be nice to have dynamic column masking, similar to what's offered in Trino, Hive, and Spark. This would allow any UDF or function to be applied based on the user or role.
For example, a query like SELECT ssn, name FROM tab1
could be dynamically rewritten to SELECT encrypt(ssn), name FROM tab1
based on the user's permissions, enabling more flexible and secure data access.
While column masking is primarily aimed at unified data governance, the same mechanism could also be used to rewrite a column with any UDF or function.
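To make the rewrite above concrete, here is a minimal sketch of role-based column masking at the select-list level. The role names, the `MASKS_BY_ROLE` mapping, and `rewrite_select` are all hypothetical and are not part of Iceberg or any engine; `encrypt` is simply the function name used in the example query.

```python
# Hypothetical mapping: role -> {column name: mask expression template}.
# Nothing here is an Iceberg or engine API; it only illustrates the idea.
MASKS_BY_ROLE = {
    "analyst": {"ssn": "encrypt({col})"},
    "admin": {},  # admins see every column unmasked
}


def rewrite_select(columns, table, role):
    """Build a SELECT in which masked columns are wrapped in their mask expression."""
    # Unknown roles raise KeyError, so the sketch fails closed rather than open.
    masks = MASKS_BY_ROLE[role]
    projected = []
    for col in columns:
        template = masks.get(col)
        projected.append(template.format(col=col) if template else col)
    return f"SELECT {', '.join(projected)} FROM {table}"


print(rewrite_select(["ssn", "name"], "tab1", "analyst"))
# SELECT encrypt(ssn), name FROM tab1
```

A real engine would apply the mask on the parsed query plan rather than on SQL strings; string building is used here only to keep the sketch short.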
Feature Request / Improvement
Summary:
We would like to request the addition of a new feature in the Iceberg REST catalog that would allow the catalog to return a row filter expression for a table and a column mask expression for each column.
Rationale:
This feature would enable query engines, such as Trino and Spark, to obtain crucial information from the catalog regarding how to handle requested resources. Specifically, it would inform the engines if any filtering or masking is required when accessing the data, ensuring that sensitive information is appropriately protected and that data access policies are consistently enforced.
Proposed Implementation:
Row Filter Expression: For each table, the REST catalog should be able to return an expression that defines the rows that should be visible to the querying entity.
Column Mask Expression: For each column, the REST catalog should return an expression that defines how the column's data should be masked before it is made available to the query engine.
Benefits:
Consistency Across Engines: By centralizing the row filtering and column masking logic in the catalog, all supported query engines (Trino, Spark, etc.) will handle data access uniformly, reducing the risk of inconsistencies.
Security: This feature enhances data security by ensuring that sensitive data is filtered or masked before being accessed by different query engines.
Simplified Data Governance: It simplifies the enforcement of data governance policies by allowing them to be defined once in the catalog and applied consistently across all query engines.
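As a rough illustration of the row filter and column mask expressions described under Proposed Implementation, the sketch below models what an engine might do with such a policy once it has fetched it. The `TablePolicy` shape, its field names, and `apply_policy` are assumptions made for this example; the REST catalog spec does not define them, and the actual proposal may represent expressions differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TablePolicy:
    """Hypothetical policy a catalog might return for one table and one caller."""
    row_filter: Optional[str] = None  # predicate limiting the visible rows
    column_masks: Dict[str, str] = field(default_factory=dict)  # column -> mask expression


def apply_policy(table: str, columns: List[str], policy: TablePolicy) -> str:
    """Return the query an engine could run after applying masks and the row filter."""
    select_list = ", ".join(
        # Alias the mask back to the column name so the result schema is unchanged.
        f"{policy.column_masks[c]} AS {c}" if c in policy.column_masks else c
        for c in columns
    )
    sql = f"SELECT {select_list} FROM {table}"
    if policy.row_filter:
        sql += f" WHERE {policy.row_filter}"
    return sql


policy = TablePolicy(
    row_filter="region = 'EU'",
    column_masks={"ssn": "'***-**-****'"},
)
print(apply_policy("tab1", ["ssn", "name"], policy))
# SELECT '***-**-****' AS ssn, name FROM tab1 WHERE region = 'EU'
```

Because the policy is data rather than engine-specific code, the same response could in principle be consumed by Trino, Spark, or any other engine, which is the consistency benefit listed above.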
Conclusion:
Implementing this feature would greatly improve the integration of Iceberg with various query engines by providing a standardized way to enforce data access policies. We believe this would be a valuable addition to the Iceberg ecosystem and would help drive broader adoption of Iceberg as a unified data platform.
Query engine
None
Willingness to contribute