Open MLikeWater opened 1 year ago
cc @bowenliang123 @yaooqinn
I don't have a clue how to exclude the metadata tables like history/snapshots
from the table identifier.
As shown above, the table identifier from select * from iceberg_ns.owner_variable.history is Some(iceberg_ns.owner_variable.history).
Is there a way to check whether the table is in an Iceberg catalog and then skip the metadata tables?
The metadata tables are enumerable, so maybe we can hardcode the conversion of the metadata tables' permission check into a check on the data table?
The metadata tables are enumerable, so maybe we can hardcode the conversion of the metadata tables' permission check into a check on the data table?
Yes, but first, how do we check that the real table is an Iceberg one?
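The hardcoded conversion proposed above could be sketched roughly as follows. This is only an illustration in Python, not Kyuubi code: the function name `to_data_table` is hypothetical, the set of metadata table names is an illustrative subset rather than a complete list, and the sketch assumes the caller has already established that the table lives in an Iceberg catalog.

```python
# Illustrative subset of Iceberg metadata table names (assumption: not exhaustive).
ICEBERG_METADATA_TABLES = frozenset({
    "history", "snapshots", "files", "manifests", "partitions", "entries",
    "all_data_files", "all_manifests", "all_entries",
})


def to_data_table(identifier: str) -> str:
    """Map a metadata table identifier to the data table it belongs to.

    Hypothetical helper: assumes the identifier is already known to refer
    to a table in an Iceberg catalog.
    """
    parts = identifier.split(".")
    # A metadata table identifier needs at least <table>.<metadata table name>.
    if len(parts) >= 2 and parts[-1].lower() in ICEBERG_METADATA_TABLES:
        return ".".join(parts[:-1])
    # Not a metadata table: check permissions on the identifier as given.
    return identifier
```

Note that name matching alone is ambiguous: an ordinary table that happens to be called history would also be stripped, which is why the question of detecting a real Iceberg table has to be answered first.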
@pan3793 @bowenliang123 Thanks for your support. Different data lake technologies may have different metadata tables. It is possible to tell whether a table is an Iceberg or Hudi table from its CREATE TABLE statement:
use testdb;
show create table iceberg_tbl;
+----------------------------------------------------+
| createtab_stmt |
+----------------------------------------------------+
| CREATE TABLE spark_catalog.testdb.iceberg_tbl (
`id` BIGINT,
`data` STRING)
USING iceberg
LOCATION 'hdfs://cluster1/tgwarehouse/testdb.db/iceberg_tbl'
TBLPROPERTIES(
'current-snapshot-id' = '4900628243476923676',
'format' = 'iceberg/parquet',
'format-version' = '1')
|
+----------------------------------------------------+
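As a rough illustration of this idea, the provider can be read from the USING clause of the statement returned by show create table. The sketch below is in Python and is an assumption about how such a check might look, not how Kyuubi implements it; `table_provider` and `is_iceberg` are hypothetical names, and the regular expression only handles a USING clause that starts its own line, as in the output above.

```python
import re

# Matches a USING clause at the start of a line, e.g. "USING iceberg".
_USING_CLAUSE = re.compile(r"^\s*USING\s+(\w+)", re.IGNORECASE | re.MULTILINE)


def table_provider(create_stmt: str):
    """Return the lowercased provider named in the USING clause, or None."""
    match = _USING_CLAUSE.search(create_stmt)
    return match.group(1).lower() if match else None


def is_iceberg(create_stmt: str) -> bool:
    """True when the CREATE TABLE statement declares the iceberg provider."""
    return table_provider(create_stmt) == "iceberg"
```

Parsing the statement text is fragile compared with inspecting the resolved table object, so this is best read as a way to make the check concrete rather than as a recommended approach.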
Why not just grant the select privilege to the user who accesses testdb.iceberg_tbl.history?
Is this case equivalent to visiting a Hive table when you don't have permission to access the HMS table or record that stores its metadata?
In other words, if we have the ALTER privilege on the raw table and perform an ALTER operation on it, the metadata changes accordingly. This does not mean we need the ALTER privilege on the metadata directly, which would give us the ability to falsify critical information.
Why not just grant the select privilege to the user who accesses testdb.iceberg_tbl.history?
@yaooqinn The Iceberg metadata tables, such as history or snapshots, are not stored in the Hive metastore, so they cannot be authorized by Ranger.
Why not just grant the select privilege to the user who accesses testdb.iceberg_tbl.history?
This could be a workaround.
But these tables are more like meta tables rather than metadata tables. For querying situations, these derived tables of source tables could be treated as part of table itself, just like the columns.
With further investigation, I think we could resolve this by detecting that the table is a HistoryTable of an Iceberg table. SparkTable and HistoryTable are classes from the Iceberg Spark plugin.
For querying situations, these derived tables of source tables could be treated as part of table itself, just like the columns.
Yes, this happens when you query the raw table, just like the role that metadata plays when you query a Hive one, or indexes, snapshots, etc., which other databases may have.
Personally, I think that for the Iceberg and Hudi storage formats, permission checks on a table's metadata should be simplified: the permissions on the table metadata should depend on the permissions on the table. If a user has access to the table, they should also have access to its metadata. In addition, Ranger does not support the metadata of data lake storage technologies.
What's the behavior of Trino/Snowflake (or other popular products)?
Personally, I think that for the Iceberg and Hudi storage formats, permission checks on a table's metadata should be simplified: the permissions on the table metadata should depend on the permissions on the table. If a user has access to the table, they should also have access to its metadata. In addition, Ranger does not support the metadata of data lake storage technologies.
Agreed, and we are facing this issue too. Maybe we can set up a configuration to decide whether to convert the metadata tables' permission check to the data table or not, that is, introduce this as a feature instead of fixing a bug. cc @yaooqinn @pan3793 @bowenliang123
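A configuration-gated version of the conversion might look like the sketch below. It is written in Python purely for illustration: `AuthzConfig`, its field `map_metadata_tables_to_data_table`, and `resource_to_check` are hypothetical names and do not correspond to any existing Kyuubi setting. The switch defaults to off so that the current behavior is preserved unless the feature is explicitly enabled.

```python
from dataclasses import dataclass

# Only the two metadata tables named in this thread; a real list would be longer.
METADATA_TABLE_NAMES = frozenset({"history", "snapshots"})


@dataclass(frozen=True)
class AuthzConfig:
    # Hypothetical switch: when True, metadata tables are authorized
    # through the data table they belong to.
    map_metadata_tables_to_data_table: bool = False


def resource_to_check(identifier: str, config: AuthzConfig) -> str:
    """Return the identifier that the permission check should run against."""
    parts = identifier.split(".")
    is_metadata = len(parts) >= 2 and parts[-1].lower() in METADATA_TABLE_NAMES
    if config.map_metadata_tables_to_data_table and is_metadata:
        return ".".join(parts[:-1])
    # Feature disabled or not a metadata table: keep the identifier unchanged.
    return identifier
```

With the switch off, a query on testdb.iceberg_tbl.history is still checked against that identifier, so administrators who prefer granting access to the metadata tables explicitly can keep doing so.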
Describe the bug
Environment
Spark version: 3.2.2
Kyuubi version: apache-kyuubi-1.7.0-SNAPSHOT-bin (master)
Iceberg version: 0.14.1
Perform SQL operations
For the Iceberg table, it is normal to query some metadata information, such as:
Affects Version(s)
1.7.0(master branch)